This document consolidates identified vulnerabilities from multiple reports into a single list, removing duplicates and providing detailed descriptions, impacts, mitigations, and testing procedures for each.
Vulnerability: CSV Injection (Formula Injection) in Excel Export
- Description:
  - An attacker can inject malicious formulas into data served by Django REST Pandas through its Excel renderers (`PandasExcelRenderer`, `PandasOldExcelRenderer`). When a user opens the exported file, the spreadsheet application may evaluate the injected formulas, potentially leading to arbitrary command execution.
  - Step-by-step trigger:
    - An attacker identifies an API endpoint in a Django REST Pandas application that exports data in Excel format (e.g., `/api/data.xlsx`).
    - The attacker crafts a request so that the data returned by the API, and therefore written to the exported Excel file, contains a CSV injection payload. Any input parameter that influences the returned data can serve this purpose; for example, if the endpoint echoes a search query, the payload can be placed in that query. A common payload for Excel is `=cmd|' /C calc'!A0`, which attempts to launch the calculator application.
    - The server processes the request and generates an Excel file containing the injected payload.
    - The attacker tricks a user into downloading and opening the malicious Excel file.
    - When the user opens the file, Excel interprets the injected string as a formula. With the example payload `=cmd|' /C calc'!A0`, this runs the `calc` command and opens the calculator on the user's system (recent Excel versions typically show a security prompt first). More dangerous commands could be injected the same way.
- Impact:
  - Arbitrary command execution on the victim's machine when they open the exported Excel file.
  - Depending on the injected formula, this could lead to:
    - Information disclosure: the attacker could read local files or system information.
    - Data exfiltration: the attacker could send sensitive data to an external server.
    - System compromise: in more advanced scenarios, the attacker might gain persistent access to the user's system.
- Vulnerability Rank: High
- Currently implemented mitigations:
  - None. Django REST Pandas does not sanitize or encode data to prevent CSV injection in its Excel renderers. Data from the Django application is passed directly to the pandas `to_excel` function, which writes it to the Excel file without any built-in protection against formula injection.
- Missing mitigations:
  - Input sanitization: Validate and sanitize input so that users cannot inject formula prefixes (such as `=`, `@`, `+`, `-`) that spreadsheet applications interpret as formulas. Apply this to any user-controlled data that ends up in the exported Excel file.
  - Contextual encoding: Django REST Pandas calls `to_excel` with default settings, which do not escape strings. Encode the data before it is written, for instance by prepending a single quote (`'`) to strings that start with a formula character so that they are treated as text. Writer-level options may also help; for example, the XlsxWriter engine has a `strings_to_formulas` workbook option that controls whether strings are converted to formulas.
  - Documentation: Document the CSV injection risk of Excel exports and advise developers to sanitize data before serving it through Django REST Pandas, especially when using the Excel renderers.
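The single-quote mitigation described above can be sketched in plain Python. This is a minimal illustration, not code from Django REST Pandas: the names `FORMULA_PREFIXES`, `neutralize_cell`, and `neutralize_rows` are hypothetical, and the sketch assumes the data is available as a list of dictionaries before it is turned into a DataFrame.

```python
# Characters that spreadsheet applications treat as the start of a formula.
FORMULA_PREFIXES = ("=", "+", "-", "@")


def neutralize_cell(value):
    """Prefix formula-like strings with a single quote; leave other values alone."""
    if isinstance(value, str) and value.startswith(FORMULA_PREFIXES):
        return "'" + value
    return value


def neutralize_rows(rows):
    """Apply neutralize_cell to every value in a list of row dictionaries."""
    return [{key: neutralize_cell(val) for key, val in row.items()} for row in rows]


rows = [{"value": "=cmd|' /C calc'!A0"}, {"value": "plain text"}, {"value": -5}]
safe_rows = neutralize_rows(rows)
print(safe_rows)
```

Only strings are rewritten, so genuine numbers such as `-5` keep their type. A string like `"-5"` would be prefixed as well, which is the usual trade-off of this approach.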
- Preconditions:
  - The application must use Django REST Pandas to serve data in Excel format (using `PandasExcelRenderer` or `PandasOldExcelRenderer`).
  - User-controlled data must be included in the exported Excel file without proper sanitization.
  - The attacker needs to trick a user into downloading and opening the exported Excel file.
- Source code analysis:
  - File: `/code/rest_pandas/renderers.py`
  - Class `PandasFileRenderer` and its subclasses `PandasExcelRenderer` and `PandasOldExcelRenderer` are responsible for rendering data in Excel formats.
  - The `render_dataframe` method in `PandasBaseRenderer` (parent of `PandasFileRenderer`) calls `function = getattr(data, name)`, where `name` is `'to_excel'` and `data` is the pandas DataFrame. It then executes this function: `function(*args, **kwargs)`.
  - The `get_pandas_args` method in `PandasFileRenderer` returns a list containing the filename: `return [self.filename]`.
  - The `get_pandas_kwargs` method in `PandasBaseRenderer` returns an empty dictionary by default: `return {}`.
  - As a result, `dataframe.to_excel(filename)` is called with default options and without any sanitization of the DataFrame content before it is written to the Excel file.
  - The pandas `to_excel()` function does not sanitize data against formula injection by default. If the DataFrame contains strings starting with characters such as `=`, `@`, `+`, or `-`, Excel and other spreadsheet software may interpret them as formulas, leading to CSV injection.
  - Simplified excerpt of the vulnerable code:

        # Vulnerable code snippet from /code/rest_pandas/renderers.py (simplified)
        from rest_framework.renderers import BaseRenderer


        class PandasBaseRenderer(BaseRenderer):
            def render_dataframe(self, data, name, *args, **kwargs):
                function = getattr(data, name)  # name is 'to_excel'
                function(*args, **kwargs)  # Calls dataframe.to_excel(filename) without sanitization


        class PandasFileRenderer(PandasBaseRenderer):
            def get_pandas_args(self, data):
                return [self.filename]  # filename created using mkstemp
- Security test case:
  - Step 1: Set up a Django REST Pandas view that serves data in Excel format and includes user-controlled input. For example, modify `tests/testapp/views.py` to add a view that takes an `injection` GET parameter and includes it in the DataFrame.

        # Add to tests/testapp/views.py
        from rest_pandas import PandasSimpleView


        class ExcelInjectionView(PandasSimpleView):
            def get_data(self, request, *args, **kwargs):
                # Echo the attacker-controlled query parameter into the exported data
                injection = request.GET.get('injection', '')
                data = [{'value': injection}]
                return data

  - Step 2: Add a corresponding URL pattern in `tests/testapp/urls.py`. The `.xlsx` suffix used in Step 4 resolves only if the URL configuration applies format suffixes to its patterns.

        # Add to tests/testapp/urls.py (inside urlpatterns)
        path("excel_injection", ExcelInjectionView.as_view()),

  - Step 3: Run the Django development server.
  - Step 4: Craft a malicious URL that includes a CSV injection payload in the `injection` parameter, for example `http://127.0.0.1:8000/excel_injection.xlsx?injection==cmd|' /C calc'!A0` (URL-encode the payload if the client does not do so automatically).
  - Step 5: Access the URL in a browser to download the `excel_injection.xlsx` file.
  - Step 6: Open the downloaded `excel_injection.xlsx` file using Microsoft Excel or LibreOffice Calc.
  - Step 7: Observe whether, upon opening the file (and accepting any security prompt), the calculator application is launched or a similar command execution occurs, depending on the payload and the spreadsheet software. If so, the CSV injection vulnerability is confirmed.
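The payload in Step 4 contains characters (`=`, `|`, `'`, spaces, `/`, `!`) that are not safe in a query string. A minimal sketch of building the test URL with the standard library follows; `BASE_URL` is the endpoint assumed by the test case, and the round trip through `parse_qs` only confirms that the server would receive the original payload.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://127.0.0.1:8000/excel_injection.xlsx"  # endpoint from the test case
PAYLOAD = "=cmd|' /C calc'!A0"

# urlencode percent-encodes reserved characters and turns spaces into "+"
test_url = BASE_URL + "?" + urlencode({"injection": PAYLOAD})

# Decode the query again to confirm the payload survives the round trip
decoded = parse_qs(urlsplit(test_url).query)["injection"][0]
print(test_url)
```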
Vulnerability: Server-Side Request Forgery (SSRF) via Unsanitized `Station.code` in `load_weather`
- Description:
  - An attacker who can control the value stored in the Station model’s `code` field may supply a malicious value. In the method `load_weather` (located in `tests/weather/models.py`), the URL for downloading weather data is generated by string substitution without any sanitization, which lets the attacker influence the URL the server requests. Note that the URL template fixes the scheme and host, so the injected value is placed in the path; how far the request can be redirected depends on characters such as `../`, `?`, and `#` and on how the HTTP client and target host handle them.
  - Step-by-step trigger:
    - An attacker identifies a way to control the `Station.code` field, for instance through a user-facing form or API endpoint that allows creating or updating Station records.
    - The attacker crafts a malicious `code` value. This value could be an internal IP address, a hostname of an internal service, or a URL pointing to an attacker-controlled server.
    - The attacker triggers the `load_weather` method, either directly or indirectly through application functionality that calls this method.
    - When executing `load_weather`, the server constructs a URL using the attacker-controlled `code` and makes an HTTP request to that URL.
- Impact:
  - If exploited, the server will send HTTP requests to arbitrary destinations. This may allow an attacker to:
    - Probe internal network resources (bypassing firewall restrictions).
    - Exfiltrate sensitive information from internal endpoints.
    - Possibly leverage the server as a proxy in further attacks.
- Vulnerability Rank: High
- Currently Implemented Mitigations:
  - None. The code simply uses string formatting to build the URL without any validation or sanitization of the `code` field.
- Missing Mitigations:
  - Validate and restrict the allowed values for the station `code` (for example, by using a whitelist or a fixed regular expression).
  - Use URL-parsing or encoding libraries to ensure that any injected special characters are neutralized.
  - Add request timeouts and network egress filtering to prevent abuse.
  - Consider preventing external input from reaching this processing function in a public API.
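The first two mitigations can be combined in a small helper. The sketch below is an assumption-laden illustration: `STATION_CODE_RE` and `build_weather_url` are hypothetical names, the allowed pattern (1 to 20 ASCII letters or digits) is a guess that should be adjusted to the real station code scheme, and the station code used in the example is illustrative only.

```python
import re
from urllib.parse import quote

DATA_URL = "https://www.ncei.noaa.gov/access/past-weather/{code}/data.csv"

# Assumed format: 1-20 ASCII letters or digits (adjust to the real code scheme)
STATION_CODE_RE = re.compile(r"[A-Za-z0-9]{1,20}")


def build_weather_url(code):
    """Return the download URL, rejecting any code outside the allowed pattern."""
    if not isinstance(code, str) or not STATION_CODE_RE.fullmatch(code):
        raise ValueError("invalid station code")
    # quote() is defence in depth; valid codes contain nothing to escape
    return DATA_URL.format(code=quote(code, safe=""))


print(build_weather_url("USW00014922"))
```

`fullmatch` is used instead of `match` so that trailing characters such as `\r\n` cannot slip past the pattern. When the request is made, passing an explicit `timeout` to `requests.get` covers the timeout recommendation.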
- Preconditions:
  - The attacker must be able to control or set the `Station.code` field. This may occur if the application exposes station creation or update functionality to untrusted users, or if an administrator fails to validate input.
- Source Code Analysis:
  - File: `tests/weather/models.py`
  - The constant `DATA_URL = "https://www.ncei.noaa.gov/access/past-weather/{code}/data.csv"` is defined.
  - The method `load_weather` calls `response = requests.get(DATA_URL.format(code=self.code))` with no sanitization or verification of `self.code`.
  - The fetched CSV data is parsed and inserted into the weather database.
  - Simplified excerpt of the vulnerable code:

        # Vulnerable code snippet from tests/weather/models.py (simplified)
        import requests

        DATA_URL = "https://www.ncei.noaa.gov/access/past-weather/{code}/data.csv"


        def load_weather(self):  # method of the Station model
            response = requests.get(DATA_URL.format(code=self.code))  # No sanitization of self.code
            # ... process response ...
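To see what the string substitution actually produces, the resulting URL can be split into its components. The sketch below uses only the standard library; `describe` is a hypothetical helper, and the example codes are illustrative. It shows how the URL string parses, not what a particular HTTP client or server does with it (some clients normalize `..` segments before sending).

```python
from urllib.parse import urlsplit

DATA_URL = "https://www.ncei.noaa.gov/access/past-weather/{code}/data.csv"


def describe(code):
    """Return (host, path, query, fragment) of the URL built for a station code."""
    parts = urlsplit(DATA_URL.format(code=code))
    return parts.netloc, parts.path, parts.query, parts.fragment


normal = describe("USW00014922")
# "?" starts a query and "#" pushes the fixed "/data.csv" suffix into the fragment
crafted = describe("../../search?q=x#")
# A full URL as the code still ends up in the path of the fixed host
embedded = describe("http://localhost:8000/internal-admin-page")

for label, result in [("normal", normal), ("crafted", crafted), ("embedded", embedded)]:
    print(label, result)
```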
- Security test case:
  - Step 1: Set up a test instance using the same code (or a similar derived application) that exposes the Station creation endpoint.
  - Step 2: Create (or update) a Station record with a malicious `code` value (for example, a value containing a domain name or IP address known to be internal or under attacker control, or one with additional CRLF characters to attempt URL manipulation). For example, set `code` to `http://localhost:8000/internal-admin-page` if such a page exists.
  - Step 3: Trigger the `load_weather` method (either via an API call or by directly calling the method in a test).
  - Step 4: Monitor outbound HTTP requests from the server. If a request is made to the attacker-specified endpoint (or to any unexpected destination or path), the vulnerability is confirmed. In the example of `http://localhost:8000/internal-admin-page`, check the server logs to see whether a request to `/internal-admin-page` was made from within the server.
Vulnerability: HTTP Header Injection via Unsanitized Filename in Content-Disposition
- Description:
  - When a pandas-based view is rendered, the helper method `get_pandas_headers` in the `PandasMixin` (located in `rest_pandas/views.py`) constructs a `Content-Disposition` header by directly embedding the filename obtained from `get_pandas_filename`. If `get_pandas_filename` returns a string containing newline characters or CRLF sequences, an attacker may be able to inject arbitrary HTTP headers.
  - Step-by-step trigger:
    - An attacker identifies a way to influence the filename returned by the `get_pandas_filename` method. This could be through URL parameters, database values, or any other input that a developer might use to dynamically generate filenames.
    - The attacker crafts a malicious filename string that includes CRLF sequences followed by the headers they wish to inject. For example: `report.csv\r\nInjected-Header: malicious-value`.
    - The attacker makes a request to the vulnerable endpoint, ensuring that their malicious filename (or the input that leads to it) is used by the application.
    - The server processes the request, and the `get_pandas_headers` method constructs the `Content-Disposition` header using the malicious filename.
    - The server sends the HTTP response. If the CRLF sequence reaches the response unfiltered, the attacker's injected headers are included in it.
- Impact:
  - The attacker could use HTTP header injection (or response splitting) to:
    - Manipulate HTTP responses.
    - Poison caches.
    - In some cases, combine with subsequent XSS attacks to inject scripts (though less directly via `Content-Disposition`).
- Vulnerability Rank: High
- Currently Implemented Mitigations:
  - The default implementations provided by the sample views return fixed, hardcoded filenames. However, there is no sanitization at the library level in `get_pandas_headers` to ensure that malicious characters are stripped from the filename. Note that Django's `HttpResponse` raises `BadHeaderError` when a header value containing CR or LF is assigned, so where the header is set through the standard response object, the attempt is expected to fail with an error rather than split the response.
- Missing Mitigations:
  - Sanitize the filename by stripping any newline (`\n`/`\r`) characters or other control sequences before including it in the header.
  - Use defensive coding or built-in libraries (or even Django’s own utilities) to safely quote header values.
  - Optionally ignore or override any user-supplied filename if it is not from a trusted source.
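A minimal sketch of the first mitigation, stripping control characters before the filename is placed in the header. The helper name `safe_content_disposition` and the fallback name `download` are hypothetical; the sketch also removes double quotes and backslashes, an extra assumption made here so the value cannot break out of the quoted string.

```python
import re

# ASCII control characters, including CR (\x0d) and LF (\x0a)
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def safe_content_disposition(filename, default="download"):
    """Build a Content-Disposition value from a filename with unsafe characters removed."""
    cleaned = _CONTROL_CHARS.sub("", filename)
    # Drop characters that could terminate or escape the quoted filename
    cleaned = cleaned.replace("\\", "").replace('"', "")
    cleaned = cleaned.strip() or default
    return 'attachment; filename="{}"'.format(cleaned)


header = safe_content_disposition("report.csv\r\nInjected-Header: malicious-value")
print(header)
```

The injected text remains part of the filename but can no longer start a new header line. Rejecting such filenames outright is a stricter alternative.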
- Preconditions:
  - An attacker must be able to influence the return value of `get_pandas_filename`. This is possible if the application (or a developer’s override) uses unsanitized external input (such as GET parameters or database values) to form the filename.
- Source Code Analysis:
  - File: `rest_pandas/views.py`
  - In the base class `PandasMixin`, the method `get_pandas_headers` is defined.
  - It directly uses the return value of `get_pandas_filename` (which is not sanitized in the project code) to construct the header via Python’s string formatting: `'attachment; filename="{}"'.format(filename)`.
  - No extra checks (such as filtering CR or LF characters) are performed.
  - Simplified excerpt of the vulnerable code:

        # Vulnerable code snippet from rest_pandas/views.py (simplified)
        class PandasMixin:
            def get_pandas_headers(self, request):
                # `format` is the requested output format; how it is obtained is elided in this excerpt
                filename = self.get_pandas_filename(request, format)
                if filename:
                    return {
                        # No filename sanitization
                        "Content-Disposition": 'attachment; filename="{}"'.format(filename)
                    }
                return {}
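The effect of the formatting expression quoted in the analysis can be reproduced without Django. In this sketch, `build_header` is a hypothetical stand-in that applies the same `.format()` call to an example filename; it shows only what the header string contains, not how a web server would transmit it.

```python
def build_header(filename):
    # Same formatting expression as the one quoted in the source code analysis
    return 'attachment; filename="{}"'.format(filename)


value = build_header("report.csv\r\nInjected-Header: malicious-value")

# Splitting on CRLF shows that the value now spans two header lines
lines = value.split("\r\n")
for line in lines:
    print(repr(line))
```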
- Security test case:
  - Step 1: Create or override a view so that its `get_pandas_filename` method returns a string containing CRLF sequences (for example, `filename = 'report.csv\r\nInjected-Header: malicious-value'`).
  - Step 2: Invoke the vulnerable endpoint with an HTTP client (e.g., `curl -v http://127.0.0.1:8000/your-view`).
  - Step 3: Examine the raw HTTP response headers to see if extra headers are injected. With the example filename, look for an additional `Injected-Header: malicious-value` header in the response.
  - Step 4: If the header is split or additional header content appears, the vulnerability is confirmed.
Vulnerability: Lack of Integrity Verification for External JavaScript Dependencies in GitHub Pages Workflow
- Description:
  - The project’s GitHub Pages build workflow (in `.github/workflows/pages.yml`) downloads several JavaScript files directly from unpkg.com using `curl` without any integrity verification. If unpkg.com or the delivery path is compromised, malicious JavaScript code could be injected into the project's documentation site.
  - Step-by-step trigger:
    - An attacker compromises the unpkg.com CDN, or performs a man-in-the-middle attack during the download process.
    - When the GitHub Pages workflow runs, the `curl` commands fetch the attacker's malicious JavaScript code instead of the legitimate libraries.
    - The workflow proceeds to build and deploy the documentation site using the compromised JavaScript files.
    - Users visiting the GitHub Pages documentation site will execute the malicious JavaScript code in their browsers.
- Impact:
  - An attacker who can compromise the external host could modify the JavaScript files. This could lead to:
    - Injection of malicious code into the client-side assets.
    - Cross-site scripting (XSS) attacks on users who visit the GitHub Pages site.
    - Broad supply-chain compromise of the web asset.
- Vulnerability Rank: Critical
- Currently Implemented Mitigations:
  - None. The workflow downloads the scripts over HTTPS but does not perform any additional integrity checks (such as verifying expected checksums or using hard-coded SRI hashes).
- Missing Mitigations:
  - Pin the external dependencies to explicit versions and verify them using SHA256 or SRI checksums. For example, instead of `@latest` or `@next`, use a specific version such as `@wq/markdown@<version>`, and obtain SRI hashes for these versions from reputable sources.
  - Implement a post-download checksum comparison in the workflow before committing the files to the site.
  - Alternatively, vendor and maintain the JavaScript dependencies in a secure repository, committing them directly into the project instead of downloading them during build time.
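The post-download checksum comparison could look roughly like the following. This is a sketch under stated assumptions: the helper names are hypothetical, the `trusted` bytes stand in for a file that was reviewed once, and in a real workflow the pinned digest would be stored in the repository rather than computed at build time.

```python
import base64
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def sri_sha384(data: bytes) -> str:
    """Subresource Integrity string ("sha384-" + base64 digest) for the given bytes."""
    digest = hashlib.sha384(data).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


def verify_download(data: bytes, expected_sha256: str) -> bool:
    """Compare the digest of downloaded bytes with a pinned value."""
    return hmac.compare_digest(sha256_hex(data), expected_sha256.lower())


trusted = b"export const answer = 42;\n"   # stand-in for a reviewed file
pinned_digest = sha256_hex(trusted)        # would be committed to the repository

print(verify_download(trusted, pinned_digest))
print(verify_download(trusted + b"evil()", pinned_digest))
```

A build step would read each downloaded file, call `verify_download` against the committed digest, and abort the deployment on a mismatch.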
- Preconditions:
  - An attacker must be able to compromise the unpkg.com-hosted versions of the dependencies or intercept the connection even over HTTPS (for example, by exploiting weaknesses in the CDN or through DNS hijacking).
- Source Code Analysis:
  - File: `.github/workflows/pages.yml`
  - The workflow file downloads JS files using `curl` with no integrity validation:

        curl -L -s https://unpkg.com/wq > docs/js/wq.js
        curl -L -s https://unpkg.com/@wq/markdown@latest > docs/js/markdown.js
        curl -L -s https://unpkg.com/@wq/analyst@next > docs/js/analyst.js
        curl -L -s https://unpkg.com/@wq/chart@next > docs/js/chart.js

  - Subsequent `sed` commands rewrite module import paths but do not otherwise alter the contents or perform any security checks.
  - The downloaded files are placed under `docs/js/` and are then served to visitors.
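A small check can flag which of the download URLs are not pinned to an exact version. The sketch below is illustrative: `package_spec` and `is_pinned` are hypothetical helpers, the version pattern is a simplified semver-style assumption, and `1.2.3` is a made-up version used only to show a pinned specifier.

```python
import re

# Simplified exact-version pattern: MAJOR.MINOR.PATCH with an optional suffix
EXACT_VERSION = re.compile(r"\d+\.\d+\.\d+([-+][0-9A-Za-z.-]+)?")

WORKFLOW_URLS = [
    "https://unpkg.com/wq",
    "https://unpkg.com/@wq/markdown@latest",
    "https://unpkg.com/@wq/analyst@next",
    "https://unpkg.com/@wq/chart@next",
]


def package_spec(url):
    """Split an unpkg URL into (package name, version specifier or None)."""
    spec = url.split("unpkg.com/", 1)[1]
    parts = spec.split("/")
    # Scoped packages span two path segments, e.g. "@wq/markdown@latest"
    name_and_version = "/".join(parts[:2]) if spec.startswith("@") else parts[0]
    name, sep, version = name_and_version.rpartition("@")
    if not sep or not name:  # no "@version" suffix (only the scope "@", if any)
        return name_and_version, None
    return name, version


def is_pinned(url):
    """True only when the URL names an exact version rather than a tag or nothing."""
    _, version = package_spec(url)
    return bool(version and EXACT_VERSION.fullmatch(version))


for url in WORKFLOW_URLS:
    print(url, "pinned" if is_pinned(url) else "NOT pinned")
```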
- Security test case:
  - Step 1: In a controlled test environment, create a simple HTTP server that mimics unpkg.com and serves a malicious JavaScript file when requested for one of the dependencies (e.g., `wq.js`).
  - Step 2: Modify the `.github/workflows/pages.yml` file to point the `curl` command to your malicious server instead of unpkg.com.
  - Step 3: Run the Pages workflow locally or simulate its steps so that the modified file is "downloaded" from your server.
  - Step 4: Visit the generated documentation site (served locally or from the workflow output) and check whether the malicious JavaScript executes (for example, by triggering an alert or logging a known token to the console).
  - Step 5: Validate that, without an integrity check, the site's asset pipeline accepts altered remote files. To further confirm, revert the workflow, manually replace the downloaded `wq.js` in `docs/js/` with your malicious version, and check whether it executes when you open `docs/index.html`.