-
-
Notifications
You must be signed in to change notification settings - Fork 442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(remote): download and extract zip file if the remote is a GitHub URL #357
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #357 +/- ##
==========================================
+ Coverage 90.12% 90.61% +0.49%
==========================================
Files 49 50 +1
Lines 2724 2813 +89
Branches 562 596 +34
==========================================
+ Hits 2455 2549 +94
+ Misses 269 264 -5 ☔ View full report in Codecov by Sentry. |
* Check if URL is a GitHub repository URL or shorthand | ||
*/ | ||
export const isGitHubUrlOrShorthand = (value: string): boolean => { | ||
return value.includes('github.com') || isGitHubShorthand(value); |
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High
github.com
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix AI 6 days ago
To fix the problem, we need to parse the URL and check the host component specifically to ensure it matches 'github.com' or its subdomains. This can be achieved by using the URL
class available in modern JavaScript environments.
- Parse the URL using the
URL
class. - Extract the host component and check if it matches 'github.com' or its subdomains.
- Update the
isGitHubUrlOrShorthand
function to use this new check.
-
Copy modified lines R27-R32
@@ -26,3 +26,8 @@ | ||
export const isGitHubUrlOrShorthand = (value: string): boolean => { | ||
return value.includes('github.com') || isGitHubShorthand(value); | ||
try { | ||
const url = new URL(value); | ||
return url.hostname === 'github.com' || url.hostname.endsWith('.github.com') || isGitHubShorthand(value); | ||
} catch (e) { | ||
return isGitHubShorthand(value); | ||
} | ||
}; |
📝 WalkthroughWalkthroughThe pull request adds functionality to download GitHub repositories as ZIP files. A new dependency, Sequence Diagram(s)sequenceDiagram
participant User as CLI User
participant RA as RemoteAction
participant GHU as GitHubDownloadUtil
participant GC as GitClone Utility
User->>RA: Invoke runRemoteAction(repoUrl)
alt GitHub URL or Shorthand
RA->>RA: Check with isGitHubUrlOrShorthand(repoUrl)
RA->>GHU: parseGitHubUrl(repoUrl)
GHU-->>RA: Return owner, repo, [branch]
RA->>GHU: downloadGitHubZip(repoUrl, directory, branch)
alt ZIP Download Succeeds
GHU-->>RA: Extraction Completed
RA-->>User: Return Success (via ZIP)
else ZIP Download Fails
RA->>GC: execGitShallowClone(repoUrl)
GC-->>RA: Clone Outcome (Success/Failure)
RA-->>User: Return Fallback Outcome
end
else
RA->>GC: execGitShallowClone(repoUrl)
GC-->>RA: Clone Outcome
RA-->>User: Return Clone Success
end
Possibly related PRs
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/core/file/githubZipDownload.ts (1)
74-122
: Be mindful of handling large repositories or memory constraints.The current approach downloads the entire ZIP into memory before extracting. For especially large repositories, this may be insufficient or slow. A streaming-based extraction approach (e.g., leveraging shrinkwrap streams or a pipeline) could make the memory usage more efficient.
tests/cli/actions/remoteAction.test.ts (1)
22-33
: Test the error scenario leading to the catch block.The changes cover the happy paths and fallback logic, but consider adding a test that forces an error after cloning or downloading begins (e.g., disk space issues, invalid perms) to confirm the spinner and error paths behave as expected in the catch block.
tests/core/file/githubZipDownload.test.ts (1)
19-59
: Consider adding edge cases for shorthand validation.The test cases are comprehensive but could benefit from additional edge cases:
- Unicode characters in repository names
- Maximum length validation
- Numbers-only repository names
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
package-lock.json
is excluded by!**/package-lock.json
📒 Files selected for processing (5)
package.json
(1 hunks)src/cli/actions/remoteAction.ts
(2 hunks)src/core/file/githubZipDownload.ts
(1 hunks)tests/cli/actions/remoteAction.test.ts
(3 hunks)tests/core/file/githubZipDownload.test.ts
(1 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/cli/actions/remoteAction.ts
[warning] 73-73: src/cli/actions/remoteAction.ts#L73
Added line #L73 was not covered by tests
🪛 GitHub Check: CodeQL
src/core/file/githubZipDownload.ts
[failure] 27-27: Incomplete URL substring sanitization
'github.com' can be anywhere in the URL, and arbitrary hosts may come before or after it.
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Cloudflare Pages
🔇 Additional comments (5)
src/core/file/githubZipDownload.ts (2)
26-28
: Consider more robust URL validation.Currently, checking
value.includes('github.com')
can be bypassed by URLs that merely contain "github.com" as a substring (e.g., a malicious domain). This is the same concern raised previously regarding incomplete URL sanitization.🧰 Tools
🪛 GitHub Check: CodeQL
[failure] 27-27: Incomplete URL substring sanitization
'github.com' can be anywhere in the URL, and arbitrary hosts may come before or after it.
101-105
: Check for empty ZIP edge cases.Accessing
Object.keys(zip.files)[0]
assumes there's at least one file in the ZIP. If an empty ZIP or an unexpected structure is encountered, this might fail. Adding a fallback or early check would help prevent potential runtime errors.src/cli/actions/remoteAction.ts (1)
73-73
: Add test coverage for fallback errors.Line 73 handles the error scenario during repository download/clone, but it’s not covered by tests. Please add a test case that triggers an error at this stage to improve coverage and ensure proper error handling.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 73-73: src/cli/actions/remoteAction.ts#L73
Added line #L73 was not covered by teststests/core/file/githubZipDownload.test.ts (1)
1-17
: LGTM! Well-structured imports and mocking setup.The file has proper imports and mocking setup for fs operations and logger.
package.json (1)
79-79
:✅ Verification successful
Verify jszip version for latest security patches.
Let's check if version 3.10.1 is the latest stable version and if there are any known security vulnerabilities:
🌐 Web query:
What is the latest stable version of jszip? Are there any known security vulnerabilities in version 3.10.1?
💡 Result:
The latest stable version of JSZip is 3.10.1, released on August 2, 2022[1][4][7]. This version includes updates such as sponsorship file additions, metadata type consolidation, and dependency optimizations[4].
Regarding security vulnerabilities in version 3.10.1:
- No CVEs (Common Vulnerabilities and Exposures) have been reported for this version in the last ten years[5].
- Security scans from platforms like Snyk show no direct vulnerabilities for v3.10.1[10].
- A Telerik security analysis confirmed that while older versions had concerns (e.g.,
Math.random()
usage), these do not pose real threats in v3.10.1 due to non-cryptographic use cases[8].Notably:
- Earlier versions (pre-3.8.0) had a "zip slip" vulnerability addressed by filename sanitization in v3.8.0[9].
- Version 3.x introduced compatibility improvements and security patches for dependencies like
node-fetch
[3][8].For projects requiring strict security compliance, server-side alternatives like Telerik’s SpreadStreamProcessing are suggested to avoid client-side JSZip dependencies entirely[8].
Citations:
- 1: https://www.cdnpkg.com/jszip
- 2: https://www.cvedetails.com/product/98772/Jszip-Project-Jszip.html?vendor_id=25021
- 3: https://firebase.google.com/support/release-notes/js
- 4: https://nodejs.libhunt.com/jszip-latest-version
- 5: https://www.cvedetails.com/version/1145782/Jszip-Project-Jszip-3.10.1.html
- 6: https://cve.mitre.org/cgi-bin/cvekey.cgi
- 7: https://stuk.github.io/jszip/
- 8: https://www.telerik.com/forums/security-for-excel-exports
- 9: https://github.com/Stuk/jszip/blob/main/CHANGES.md
- 10: https://security.snyk.io/package/npm/jszip
Security Verification: JSZip Version is Up-to-Date
The dependency on JSZip in
package.json
(line 79) is correctly specified as"jszip": "^3.10.1"
, which is confirmed to be the latest stable version. No known security vulnerabilities have been reported for version 3.10.1. No changes are needed.
}); | ||
const mockJSZip = { | ||
loadAsync: vi.fn().mockRejectedValue(new Error('Invalid zip file')), | ||
files: {}, | ||
file: vi.fn(), | ||
folder: vi.fn(), | ||
forEach: vi.fn(), | ||
filter: vi.fn(), | ||
remove: vi.fn(), | ||
generateAsync: vi.fn(), | ||
generateNodeStream: vi.fn(), | ||
generateInternalStream: vi.fn(), | ||
loadFile: vi.fn(), | ||
support: {}, | ||
} as unknown as typeof JSZip; | ||
|
||
await expect( | ||
downloadGitHubZip('yamadashy/repomix', mockDirectory, undefined, { | ||
fetch: mockFetch, | ||
JSZip: mockJSZip, | ||
}), | ||
).rejects.toThrow(RepomixError); | ||
}); | ||
|
||
test('should handle file system errors', async () => { | ||
const mockFetch = vi.fn().mockResolvedValue({ | ||
ok: true, | ||
arrayBuffer: vi.fn().mockResolvedValue(new ArrayBuffer(8)), | ||
}); | ||
const mockJSZip = { | ||
loadAsync: vi.fn().mockResolvedValue({ | ||
files: mockZipContent, | ||
}), | ||
files: {}, | ||
file: vi.fn(), | ||
folder: vi.fn(), | ||
forEach: vi.fn(), | ||
filter: vi.fn(), | ||
remove: vi.fn(), | ||
generateAsync: vi.fn(), | ||
generateNodeStream: vi.fn(), | ||
generateInternalStream: vi.fn(), | ||
loadFile: vi.fn(), | ||
support: {}, | ||
} as unknown as typeof JSZip; | ||
|
||
const nonDirFiles = Object.values(mockZipContent).filter( | ||
(file): file is MockZipFile & { async: Mock } => !file.dir && file.async !== undefined, | ||
); | ||
for (const file of nonDirFiles) { | ||
file.async.mockResolvedValue(Buffer.from('test content')); | ||
} | ||
|
||
vi.mocked(fs.mkdir).mockRejectedValue(new Error('Permission denied')); | ||
|
||
await expect( | ||
downloadGitHubZip('yamadashy/repomix', mockDirectory, undefined, { | ||
fetch: mockFetch, | ||
JSZip: mockJSZip, | ||
}), | ||
).rejects.toThrow(RepomixError); | ||
}); | ||
|
||
test('should use specified branch for download', async () => { | ||
const mockFetch = vi.fn().mockResolvedValue({ | ||
ok: true, | ||
arrayBuffer: vi.fn().mockResolvedValue(new ArrayBuffer(8)), | ||
}); | ||
const mockJSZip = { | ||
loadAsync: vi.fn().mockResolvedValue({ | ||
files: mockZipContent, | ||
}), | ||
files: {}, | ||
file: vi.fn(), | ||
folder: vi.fn(), | ||
forEach: vi.fn(), | ||
filter: vi.fn(), | ||
remove: vi.fn(), | ||
generateAsync: vi.fn(), | ||
generateNodeStream: vi.fn(), | ||
generateInternalStream: vi.fn(), | ||
loadFile: vi.fn(), | ||
support: {}, | ||
} as unknown as typeof JSZip; | ||
|
||
const nonDirFiles = Object.values(mockZipContent).filter( | ||
(file): file is MockZipFile & { async: Mock } => !file.dir && file.async !== undefined, | ||
); | ||
for (const file of nonDirFiles) { | ||
file.async.mockResolvedValue(Buffer.from('test content')); | ||
} | ||
|
||
await downloadGitHubZip('yamadashy/repomix', mockDirectory, 'develop', { | ||
fetch: mockFetch, | ||
JSZip: mockJSZip, | ||
}); | ||
|
||
expect(mockFetch).toHaveBeenCalledWith('https://github.com/yamadashy/repomix/archive/refs/heads/develop.zip'); | ||
}); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Enhance error handling test coverage.
While the error handling tests are good, consider adding tests for:
- Partial downloads (interrupted connections)
- Rate limiting responses from GitHub
- Large ZIP files that could cause memory issues
Also, verify the current GitHub API rate limits to ensure the implementation handles them correctly:
🌐 Web query:
What are the current GitHub API rate limits for unauthenticated ZIP downloads?
💡 Result:
GitHub enforces strict rate limits on unauthenticated ZIP downloads through its API. For anonymous users or scripts not providing authentication credentials:
- Primary rate limit: 60 requests per hour per IP address[4][5][8][10]
- Applies to: Any ZIP/tar.gz file downloads from GitHub's repositories via direct URLs or API endpoints without authentication headers[2][5]
This limit includes both API calls and direct asset downloads through URLs like browser_download_url
[2][5]. When exceeded, users receive HTTP 403 Forbidden errors until the hourly window resets[2][5].
Authenticated requests using personal access tokens or OAuth apps have significantly higher limits (5,000-15,000 requests/hour)[1][4][6]. Some endpoints like repository file operations may have additional secondary limits on request frequency and concurrency[1][6].
Citations:
- 1: https://docs.github.com/en/graphql/overview/rate-limits-and-node-limits-for-the-graphql-api
- 2: CKAN's GitHub downloads are breaking the rules KSP-CKAN/CKAN#2210
- 3: https://docs.gitlab.com/ee/administration/settings/files_api_rate_limits.html
- 4: https://www.endorlabs.com/learn/how-to-get-the-most-out-of-github-api-rate-limits
- 5: Github API rate limit response preventing automatic upgrades syncthing/syncthing#9033
- 6: https://www.warpbuild.com/blog/rate-limits-self-hosted-runners
- 7: https://docs.github.com/en/apps/creating-github-apps/registering-a-github-app/rate-limits-for-github-apps
- 8: Cannot download binary due to api rate limit rust-lang/rust-analyzer#3094
- 9: https://support.atlassian.com/bitbucket-cloud/docs/api-request-limits/
- 10: https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api
Enhance error handling tests to cover additional edge cases.
The current tests handle common error scenarios well, but please consider adding tests for the following:
- Partial downloads (interrupted connections): Simulate scenarios where the connection drops mid-transfer.
- Rate limiting responses: Add tests to simulate GitHub’s rate limit being hit. Note that unauthenticated ZIP downloads are currently capped at 60 requests per hour per IP and will result in HTTP 403 errors when exceeded.
- Large ZIP files: Test how the implementation handles the extraction of very large ZIP files, particularly with respect to memory constraints.
This additional coverage will ensure robust error handling for a wider range of failure modes.
This is an experimental PR.
Since GitHub allows downloading in zip format, it may be faster, so I will try it.
Also, this may make the remote function of repomix easier to use in the browser, even if only partially.
Checklist
npm run test
npm run lint