Issues with Importing GraphML File in AWS Neptune Engine 1.2.1.0 #335

Open
javatask opened this issue Nov 16, 2023 · 2 comments

@javatask

Hello!

I am encountering issues with importing a GraphML file into AWS Neptune using both the provided tooling and direct Gremlin queries. Below are the steps I followed and the errors I encountered:

Environment:

  • AWS Neptune Engine Version: 1.2.1.0

Issue 1: Tooling Error

  • Procedure:
    1. Cloned the esig/dss project.
    2. Navigated to dss-cookbook.
    3. Ran mvn dependency:tree -DoutputType=graphml -DoutputFile=dep.xml to generate a GraphML file (dep.xml).
    4. Executed the command ./graphml2csv.py -i dep.xml.
  • Error Encountered:
    • KeyError('d0') in graphml2csv.py.
  • Result:
    • Blocked from using the loader endpoint.
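
One possible cause worth checking (an assumption, not confirmed in this thread): a `KeyError('d0')` could arise if the file contains `<key>` declarations without an `attr.name`, or `<data>` elements that reference a key that was never declared. The sketch below only inspects a GraphML document for those two conditions; it does not reproduce the logic of graphml2csv.py. The `SAMPLE` document and the `summarize_keys` helper are hypothetical and written for illustration.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace, in ElementTree's "{uri}tag" notation.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"

# Hand-written sample for illustration only; it is NOT the actual dep.xml.
SAMPLE = """\
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key for="node" id="d0" yfiles.type="nodegraphics"/>
  <key for="node" id="d1" attr.name="label" attr.type="string"/>
  <graph id="g" edgedefault="directed">
    <node id="n0">
      <data key="d0"/>
      <data key="d1">a</data>
      <data key="d9">x</data>
    </node>
  </graph>
</graphml>
"""


def summarize_keys(graphml_text):
    """Return (unnamed, undeclared) key ids found in a GraphML document.

    unnamed:    ids of <key> elements that have no attr.name attribute.
    undeclared: ids used by <data key="..."> with no matching <key id="...">.
    """
    root = ET.fromstring(graphml_text)
    declared = {}
    for key in root.iter(GRAPHML_NS + "key"):
        declared[key.get("id")] = key.get("attr.name")
    referenced = {
        data.get("key")
        for data in root.iter(GRAPHML_NS + "data")
        if data.get("key") is not None
    }
    unnamed = sorted(k for k, name in declared.items() if name is None)
    undeclared = sorted(referenced - set(declared))
    return unnamed, undeclared


if __name__ == "__main__":
    print(summarize_keys(SAMPLE))
```

To check the real file, pass its contents instead of `SAMPLE`, for example `summarize_keys(open("dep.xml").read())`. If `d0` shows up in either list, that would be a reasonable place to start debugging.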

Issue 2: Direct Query Timeout Error

  • Procedure:
    1. Uploaded dep.xml file to Amazon S3.
    2. Generated a presigned URL using AWS CLI.
    3. Attempted to run a Gremlin I/O query directly using the command:
      curl -X POST -d '{"gremlin":" g.io(\"https://bucket.s3.eu-central-1.amazonaws.com/dep.xml?X-Amz-Algorithm=AWS4-HMAC\").read().iterate()"}' http://endpoint:8182/gremlin
  • Error Encountered:
    • Received a 504 Gateway Time-out response, indicating that the server did not respond in time.
    • Response Content:
      <html><body><h1>504 Gateway Time-out</h1>
      The server didn't respond in time.
      </body></html>
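
For reference, the same request can be sketched in Python, which avoids hand-escaping the quotes around the URL inside the JSON body. This is a minimal sketch that only builds the request; `build_gremlin_request` is a hypothetical helper name, and `endpoint` is the same placeholder host used in the curl command above.

```python
import json
import urllib.request


def build_gremlin_request(endpoint, source_url):
    """Build a POST request asking Gremlin to read a GraphML file from source_url.

    Assumes source_url contains no double quotes (presigned URLs are
    percent-encoded, so this normally holds).
    """
    query = 'g.io("{}").read().iterate()'.format(source_url)
    # json.dumps takes care of escaping the quotes inside the query string.
    body = json.dumps({"gremlin": query}).encode("utf-8")
    return urllib.request.Request(
        "http://{}:8182/gremlin".format(endpoint),
        data=body,
        method="POST",
    )


if __name__ == "__main__":
    req = build_gremlin_request(
        "endpoint",
        "https://bucket.s3.eu-central-1.amazonaws.com/dep.xml?X-Amz-Algorithm=AWS4-HMAC",
    )
    print(req.full_url)
    print(req.data.decode("utf-8"))
```

The request would be sent with `urllib.request.urlopen(req)`; that step is left out here because it needs network access to the cluster.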
      

I would appreciate your assistance in resolving these issues. Please let me know if you need any further information or clarification.

Best regards,
Andrii

@triggan
Contributor

triggan commented Nov 16, 2023

I'm unsure about Issue 1. I would have to recreate that GraphML file and debug the graphml2csv tool to determine why that is happening. Issue 2 is likely happening because g.io requires an S3 VPC Endpoint on the VPC where your Neptune cluster is hosted. Neptune does not have a public IP address attached to it, so it has no way to route requests through an IGW to the Internet (and S3 is hosted in public IP space). For Neptune to have access to S3, the VPC needs an S3 VPC Endpoint (recommended) or a NAT Gateway.
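
As a rough sketch of the recommended option, an S3 gateway endpoint can be created with the EC2 `create_vpc_endpoint` call in boto3. The helper names below are hypothetical, and the VPC ID and route table ID in the example are placeholders that must be replaced with the values for the VPC hosting the Neptune cluster. The region `eu-central-1` is taken from the URL in the original report.

```python
def s3_gateway_endpoint_params(vpc_id, region, route_table_ids):
    """Build keyword arguments for EC2 create_vpc_endpoint (S3 gateway endpoint)."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        # S3 service names follow the pattern com.amazonaws.<region>.s3
        "ServiceName": "com.amazonaws.{}.s3".format(region),
        "RouteTableIds": list(route_table_ids),
    }


def create_s3_gateway_endpoint(vpc_id, region, route_table_ids):
    """Create the endpoint. Requires boto3 and AWS credentials."""
    # Imported here so the parameter helper above runs without boto3 installed.
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.create_vpc_endpoint(
        **s3_gateway_endpoint_params(vpc_id, region, route_table_ids)
    )


if __name__ == "__main__":
    # Placeholder IDs for illustration only; nothing is created here.
    print(s3_gateway_endpoint_params("vpc-0123", "eu-central-1", ["rtb-0abc"]))
```

The route tables passed in should be the ones associated with the subnets the Neptune instances run in, so that traffic to S3 is routed through the endpoint.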

Also ensure that the pre-signed URLs you are creating for S3 are correctly generated using the S3 presign API. You may have abbreviated the URL for example purposes, but the one you provided is missing parameters that you would normally see on a presigned URL (for example, the signature).
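
A quick way to sanity-check a URL before handing it to g.io is to look for the query parameters that a Signature Version 4 presigned URL normally carries. This is a minimal sketch; `missing_presign_params` is a hypothetical helper, and it only checks that the parameters are present, not that the signature is valid or unexpired.

```python
from urllib.parse import parse_qs, urlsplit

# Query parameters normally present on a SigV4 presigned S3 URL.
# (X-Amz-Security-Token also appears when temporary credentials are used.)
REQUIRED_SIGV4_PARAMS = (
    "X-Amz-Algorithm",
    "X-Amz-Credential",
    "X-Amz-Date",
    "X-Amz-Expires",
    "X-Amz-SignedHeaders",
    "X-Amz-Signature",
)


def missing_presign_params(url):
    """Return the SigV4 query parameters that are absent from url."""
    query = parse_qs(urlsplit(url).query)
    return [name for name in REQUIRED_SIGV4_PARAMS if name not in query]


if __name__ == "__main__":
    # The abbreviated URL from the original report.
    url = "https://bucket.s3.eu-central-1.amazonaws.com/dep.xml?X-Amz-Algorithm=AWS4-HMAC"
    print(missing_presign_params(url))
```

An empty list means all the expected parameters are there; anything else suggests the URL was truncated or not produced by a presign operation.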

@javatask
Author

Thank you, @triggan. About the second use case: it was my mistake not to configure the VPC Endpoint. It would be great if g.io supported "s3://" natively.

For the first use case please use the attached file dep.txt
