Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add test and fix python dependencies #304

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

leehuwuj
Copy link
Collaborator

@leehuwuj leehuwuj commented Sep 22, 2024

Summary by CodeRabbit

  • New Features

    • Introduced end-to-end tests for resolving Python dependencies across various vector databases and tools.
    • Added an optional parameter for specifying tools in the project setup function.
    • Enhanced GitHub Actions workflow with a new job for resolving Python dependencies.
    • Created separate script commands for running Python dependency tests.
    • Added new command-line options for specifying web and database sources.
  • Bug Fixes

    • Updated versions of several dependencies to ensure compatibility and improved functionality for various databases.
    • Broadened Python version specifications in project configuration files.

Copy link

changeset-bot bot commented Sep 22, 2024

⚠️ No Changeset found

Latest commit: 471aca2

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Copy link

coderabbitai bot commented Sep 22, 2024

Walkthrough

The pull request introduces a new file containing end-to-end tests for resolving Python dependencies in projects that utilize various vector databases and tools. It modifies an existing utility function to accept an additional parameter for tools and updates dependency versions in another file. Additionally, a new job is added to the GitHub Actions workflow to automate the testing of Python dependencies across different Python versions.

Changes

Files Change Summary
e2e/resolve_python_dependencies.spec.ts Introduces end-to-end tests for resolving Python dependencies, verifying project setup, and checking the correctness of pyproject.toml and poetry.lock files for various vector databases and tools.
e2e/utils.ts Modifies runCreateLlama function to include an optional tools parameter, changing the default value for the --tools option to use this parameter.
helpers/python.ts Updates version numbers for various dependencies in the getAdditionalDependencies function across multiple cases, reflecting systematic version management.
.github/workflows/e2e.yml Adds a new job named resolve-python-dependencies to the GitHub Actions workflow for testing Python dependencies across multiple Python versions.
package.json Modifies the e2e script to exclude tests for Python dependencies and introduces a new script for running those specific tests.
templates/types/multiagent/fastapi/pyproject.toml Updates Python version specification to >=3.11,<3.13 and sets version constraints for llama-index-agent-openai.
templates/types/streaming/fastapi/pyproject.toml Broadens Python version constraint from ^3.11,<4.0 to >=3.11,<3.13.

Possibly related PRs

  • feat: implement interpreter tool #94: The main PR introduces end-to-end tests for resolving Python dependencies, which may relate to the interpreter tool added in this PR that could be used in the testing scenarios.
  • feat: support uploading pdf, docx, txt #140: The feature for uploading files in various formats could relate to the main PR's focus on validating Python dependencies, as the correct dependencies may be necessary for handling these file types.
  • feat: add e2e testing for llamacloud datasource #181: The addition of end-to-end testing for the LlamaCloud datasource aligns with the main PR's testing focus, as both aim to ensure proper functionality and integration of components.
  • chore: fixed [email protected] #296: The update to the llama-index and introduction of llama-index-core could be relevant as they may impact the dependencies being tested in the main PR, ensuring that the tests reflect the latest changes in the codebase.

Poem

In a world of code, a rabbit hops,
With tests for Python, the fun never stops!
Dependencies updated, tools now in line,
A dance of versions, oh how they shine!
With every change, our projects grow,
Hooray for the code, let the good times flow! 🐇✨


Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 680c0fd and 471aca2.

Files selected for processing (1)
  • .github/workflows/e2e.yml (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/e2e.yml

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    -- I pushed a fix in commit <commit_id>, please review it.
    -- Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    -- @coderabbitai generate unit testing code for this file.
    -- @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    -- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    -- @coderabbitai read src/utils.ts and generate unit testing code.
    -- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    -- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Outside diff range and nitpick comments (1)
e2e/resolve_python_dependencies.spec.ts (1)

27-27: Address the TODO: Add data sources to the test

The test currently lacks data sources, as indicated by the TODO comment. Incorporating data sources will enhance test coverage and ensure that dependencies related to data sources are properly tested.

Would you like assistance in adding data sources to the tests or should I open a new GitHub issue to track this task?

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 6e9184d and 364d221.

Files selected for processing (3)
  • e2e/resolve_python_dependencies.spec.ts (1 hunks)
  • e2e/utils.ts (2 hunks)
  • helpers/python.ts (5 hunks)
Additional comments not posted (9)
e2e/utils.ts (2)

35-35: LGTM!

The addition of the optional tools parameter to the runCreateLlama function is a good change. It provides flexibility to specify tools when needed, without breaking existing calls to the function. The type of the parameter is also appropriate and consistent with the intended usage.


69-69: LGTM!

The change to the --tools option in the command-line arguments is consistent with the addition of the optional tools parameter to the runCreateLlama function. It allows the option to be dynamically set based on the provided value, or defaults to "none" if not provided, using the nullish coalescing operator. This change maintains the expected behavior and is implemented correctly.

helpers/python.ts (7)

39-39: LGTM!

The minor version update of llama-index-vector-stores-mongodb from ^0.1.3 to ^0.3.1 should introduce new features and bug fixes while maintaining backward compatibility.


46-46: LGTM!

The minor version update of llama-index-vector-stores-postgres from ^0.1.1 to ^0.2.5 should introduce new features and bug fixes while maintaining backward compatibility.


60-60: LGTM!

The minor version update of llama-index-vector-stores-milvus from ^0.1.20 to ^0.2.0 should introduce new features and bug fixes while maintaining backward compatibility.


71-71: LGTM!

The minor version update of llama-index-vector-stores-astra-db from ^0.1.5 to ^0.2.0 should introduce new features and bug fixes while maintaining backward compatibility.


85-85: LGTM!

The minor version update of llama-index-vector-stores-chroma from ^0.1.8 to ^0.2.0 should introduce new features and bug fixes while maintaining backward compatibility.


92-92: LGTM!

The minor version update of llama-index-vector-stores-weaviate from ^1.0.2 to ^1.1.1 should introduce new features and bug fixes while maintaining backward compatibility.


133-133: LGTM!

The patch version update of llama-index-indices-managed-llama-cloud from ^0.3.0 to ^0.3.1 should introduce bug fixes and performance improvements while maintaining backward compatibility.

Comment on lines 88 to 98
if (vectorDb !== "none") {
if (vectorDb === "pg") {
expect(pyprojectContent).toContain(
"llama-index-vector-stores-postgres",
);
} else {
expect(pyprojectContent).toContain(
`llama-index-vector-stores-${vectorDb}`,
);
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactor to use a mapping for vectorDb dependencies

To enhance maintainability and scalability, consider using a mapping object to relate vectorDb values to their corresponding package names. This approach makes it easier to handle special cases and add new vector databases in the future.

Refactor the code as follows:

const vectorDbDependencies: { [key: string]: string } = {
  pg: "llama-index-vector-stores-postgres",
};

const expectedDependency =
  vectorDbDependencies[vectorDb] || `llama-index-vector-stores-${vectorDb}`;

expect(pyprojectContent).toContain(expectedDependency);

Remove unnecessary check for vectorDb !== "none"

The vectorDbs array does not contain "none", so the condition if (vectorDb !== "none") is unnecessary. Removing this condition simplifies the code and improves readability.

Apply this diff to remove the unnecessary condition:

 const pyprojectContent = fs.readFileSync(pyprojectPath, "utf-8");
-if (vectorDb !== "none") {
   if (vectorDb === "pg") {
     expect(pyprojectContent).toContain(
       "llama-index-vector-stores-postgres",
     );
   } else {
     expect(pyprojectContent).toContain(
       `llama-index-vector-stores-${vectorDb}`,
     );
   }
-}
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (vectorDb !== "none") {
if (vectorDb === "pg") {
expect(pyprojectContent).toContain(
"llama-index-vector-stores-postgres",
);
} else {
expect(pyprojectContent).toContain(
`llama-index-vector-stores-${vectorDb}`,
);
}
}
if (vectorDb === "pg") {
expect(pyprojectContent).toContain(
"llama-index-vector-stores-postgres",
);
} else {
expect(pyprojectContent).toContain(
`llama-index-vector-stores-${vectorDb}`,
);
}

Comment on lines 99 to 106
if (tool !== "none") {
if (tool === "wikipedia.WikipediaToolSpec") {
expect(pyprojectContent).toContain("wikipedia");
}
if (tool === "google.GoogleSearchToolSpec") {
expect(pyprojectContent).toContain("google");
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactor to use a mapping for tool dependencies

Using a mapping object for tools and their required dependencies improves code clarity and makes it easier to manage additional tools in the future.

Refactor the code as follows:

const toolDependencies: { [key: string]: string } = {
  "wikipedia.WikipediaToolSpec": "wikipedia",
  "google.GoogleSearchToolSpec": "google",
};

const expectedDependency = toolDependencies[tool];

if (expectedDependency) {
  expect(pyprojectContent).toContain(expectedDependency);
}

Remove unnecessary check for tool !== "none"

The toolOptions array does not contain "none", so the condition if (tool !== "none") is unnecessary. Eliminating this condition streamlines the code.

Apply this diff to remove the unnecessary condition:

 if (tool === "wikipedia.WikipediaToolSpec") {
   expect(pyprojectContent).toContain("wikipedia");
 }
 if (tool === "google.GoogleSearchToolSpec") {
   expect(pyprojectContent).toContain("google");
 }
-}

Committable suggestion was skipped due to low confidence.

Comment on lines 65 to 78
const { stdout, stderr } = await execAsync(
// Config poetry to create virtualenv in project directory.
// so that we can easily prune the e2e cache to avoid overloading the storage.
"poetry config virtualenvs.in-project true && poetry lock --no-update",
{
cwd: path.join(cwd, name),
},
);
console.log("poetry lock stdout:", stdout);
console.error("poetry lock stderr:", stderr);
} catch (error) {
console.error("Error running poetry lock:", error);
throw error;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle errors from execAsync more explicitly

Currently, the catch block logs the error but rethrows it without additional context. Providing more detailed error handling can aid in debugging when poetry lock fails.

Consider enhancing the error handling as follows:

 } catch (error) {
   console.error("Error running poetry lock:", error);
+  throw new Error(`Poetry lock failed for ${optionDescription}: ${error.message}`);
 }

This change adds context to the error message, indicating which combination of vectorDb and tool caused the failure.

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const { stdout, stderr } = await execAsync(
// Config poetry to create virtualenv in project directory.
// so that we can easily prune the e2e cache to avoid overloading the storage.
"poetry config virtualenvs.in-project true && poetry lock --no-update",
{
cwd: path.join(cwd, name),
},
);
console.log("poetry lock stdout:", stdout);
console.error("poetry lock stderr:", stderr);
} catch (error) {
console.error("Error running poetry lock:", error);
throw error;
}
const { stdout, stderr } = await execAsync(
// Config poetry to create virtualenv in project directory.
// so that we can easily prune the e2e cache to avoid overloading the storage.
"poetry config virtualenvs.in-project true && poetry lock --no-update",
{
cwd: path.join(cwd, name),
},
);
console.log("poetry lock stdout:", stdout);
console.error("poetry lock stderr:", stderr);
} catch (error) {
console.error("Error running poetry lock:", error);
throw new Error(`Poetry lock failed for ${optionDescription}: ${error.message}`);
}

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Outside diff range and nitpick comments (1)
.github/workflows/e2e.yml (1)

106-110: Consider using the same Node.js versions as the e2e job for consistency.

The Node.js version is hardcoded to 18, which may not be consistent with the versions used in the e2e job. Consider using a matrix strategy or environment variable to specify the Node.js version for consistency across the workflow.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 364d221 and 3f2b434.

Files selected for processing (1)
  • .github/workflows/e2e.yml (1 hunks)
Additional comments not posted (9)
.github/workflows/e2e.yml (9)

92-92: LGTM!

Using actions/checkout@v4 is a standard practice to check out the code in GitHub Actions workflows.


94-97: LGTM!

Using actions/setup-python@v5 is a standard practice to set up Python in GitHub Actions workflows. The Python version is correctly specified from the matrix.


99-102: LGTM!

Using snok/install-poetry@v1 is a good practice to install Poetry in GitHub Actions workflows. The Poetry version is correctly specified from the environment variable.


112-113: LGTM!

Using pnpm install is a standard practice to install dependencies in GitHub Actions workflows.


115-117: LGTM!

Using pnpm exec playwright install --with-deps is a standard practice to install Playwright browsers in GitHub Actions workflows.


119-121: LGTM!

Using pnpm run build is a standard practice to build the project in GitHub Actions workflows.


123-125: LGTM!

Using pnpm run pack-install is a project-specific command to install the project in GitHub Actions workflows.


127-132: LGTM!

Using pnpm exec playwright test is a standard practice to run Playwright tests in GitHub Actions workflows. The test file e2e/resolve_python_dependencies.spec.ts is specific to testing Python dependencies, and the necessary environment variables for API keys are correctly specified.


134-139: LGTM!

Using actions/upload-artifact@v3 is a standard practice to upload artifacts in GitHub Actions workflows. The Playwright report is correctly specified as the artifact to upload, and the retention period of 30 days is a reasonable default value.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Outside diff range and nitpick comments (1)
templates/types/streaming/fastapi/pyproject.toml (1)

12-12: Consider the impact on future Python compatibility.

The change in the Python version constraint from ^3.11,<4.0 to >=3.11,<3.13 aligns the project with the current stable Python versions (3.11 and 3.12). This is a good practice to ensure compatibility with the latest Python releases.

However, please consider the project's future Python compatibility goals. If the project intends to support future Python versions (3.13 and above), it might be beneficial to keep a wider version range. This can be achieved by using a constraint like >=3.11,<4.0, which allows for Python 3.11 and above, but not 4.0 and above.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 3f2b434 and 89c1114.

Files selected for processing (4)
  • helpers/python.ts (3 hunks)
  • package.json (1 hunks)
  • templates/types/multiagent/fastapi/pyproject.toml (1 hunks)
  • templates/types/streaming/fastapi/pyproject.toml (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • helpers/python.ts
Additional comments not posted (3)
templates/types/multiagent/fastapi/pyproject.toml (1)

13-13: LGTM! Verify compatibility with the updated Python and dependency versions.

The changes to the Python version and dependency specifications look good. They allow for more flexibility in the runtime environment and ensure compatibility with the project's requirements.

Please ensure that the project has been thoroughly tested with the updated Python versions (3.11 and 3.12) and the specified version range of the llama-index-agent-openai dependency (0.3.0 to 0.3.x) to avoid any potential compatibility issues.

package.json (2)

27-27: LGTM!

The change to the e2e script command is valid and aligns with the PR objective of fixing Python dependencies. The use of the --grep-invert option effectively excludes tests matching the pattern 'resolve_python_dependencies' from the e2e test run, allowing for better isolation and targeting of specific test scenarios.


28-28: LGTM!

The introduction of the e2e:python-deps script command is a good addition and aligns with the PR objective of fixing Python dependencies. The use of the --grep option allows for targeted execution of tests that match the 'resolve_python_dependencies' pattern, enabling better focus and isolation of Python dependency-related tests.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 89c1114 and 680c0fd.

Files selected for processing (3)
  • e2e/resolve_python_dependencies.spec.ts (1 hunks)
  • e2e/utils.ts (3 hunks)
  • index.ts (2 hunks)
Files skipped from review as they are similar to previous changes (1)
  • e2e/resolve_python_dependencies.spec.ts
Additional comments not posted (6)
e2e/utils.ts (4)

45-45: LGTM!

The change is a part of a larger refactor to handle different data source types more effectively. The change itself does not introduce any apparent issues.


50-60: LGTM!

The new logic aligns with the PR objective to handle different data source types more effectively. The code segment correctly handles the three possible cases and does not introduce any apparent issues.


69-69: LGTM!

The change aligns with the refactor to handle different data source types more effectively and allows for more flexible input handling. The change should improve the handling of different data source types.


82-82: LGTM!

The change aligns with the addition of the new optional tools parameter to the runCreateLlama function and improves the flexibility by allowing the caller to specify the tools to be used. The default value of "none" ensures backward compatibility.

index.ts (2)

93-106: LGTM!

The addition of the --web-source and --db-source command-line options is a great enhancement to support different types of data sources. The descriptions provided for these options are clear and informative.


232-252: Looks good!

The code segment correctly handles the logic for the newly added --web-source and --db-source options. It populates the program.dataSources array with the appropriate data source configuration based on the detected option.

The configuration objects include relevant details such as the data source type, connection details, and default query (for the database option), ensuring that the selected data source is properly configured and ready to be used by the program.

@@ -32,6 +32,7 @@ export async function runCreateLlama(
appType?: AppType,
llamaCloudProjectName?: string,
llamaCloudIndexName?: string,
tools?: string,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tip

Codebase Verification

Function runCreateLlama is still called without the tools argument.

Please update all calls to runCreateLlama to include the tools parameter to maintain consistency and prevent potential issues.

  • e2e/resolve_python_dependencies.spec.ts
  • e2e/streaming_template.spec.ts
  • e2e/multiagent_template.spec.ts
  • e2e/extractor_template.spec.ts
Analysis chain

LGTM! Verify the function usage across the codebase.

The addition of the optional tools parameter aligns with the PR objective and maintains backward compatibility. Please ensure that all existing calls to this function have been updated to provide the tools argument where necessary.

Run the following script to verify the function usage:

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all function calls to `runCreateLlama` provide the `tools` argument where necessary.

# Test: Search for the function usage. Expect: No occurrences of the old signature.
rg --type typescript -A 5 $'runCreateLlama\('

Length of output: 81


Script:

#!/bin/bash
# Description: Verify all function calls to `runCreateLlama` provide the `tools` argument where necessary.

# Test: Search for the function usage. Expect: No occurrences of the old signature.
rg --type ts -A 5 'runCreateLlama\('

Length of output: 1664

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant