bug/Execution speed is very slow in AWS LAMBDA environment #2916
Comments
@cds-code, can you describe how you are running …?
I'm running a Docker image in AWS Lambda.
Have you accounted for the spin-up (cold-start) time of the Lambda instance? For example, do you start timing only after receiving the first response? Also, can you provide some specific timings?
And how much memory is allocated to the Lambda instance?
I have the same problem in AWS Batch running on Fargate. I allocated 2 vCPUs and 4 GB of memory.
The timing does not include the Lambda instance cold-start time; it covers only `partition(filename="XXXXX.pdf")`. When the program runs that call three times in a loop, only the first run is very slow.
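To report timings that separate the one-off first-call cost from the steady-state cost, a small harness along these lines could be used. It is only a sketch: `time_calls` and `summarize` are hypothetical helpers, not part of unstructured; the only call taken from this thread is `partition(filename=...)`, shown in a comment.

```python
import time
from typing import Any, Callable, Dict, List


def time_calls(fn: Callable[[], Any], runs: int = 3) -> List[float]:
    """Call fn() `runs` times and return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Split timings into the first (cold) call and the average of the rest (warm)."""
    first, rest = durations[0], durations[1:]
    warm_avg = sum(rest) / len(rest) if rest else float("nan")
    return {"first_call_s": first, "warm_avg_s": warm_avg}


# Example usage inside the function (requires unstructured to be installed):
# from unstructured.partition.auto import partition
# print(summarize(time_calls(lambda: partition(filename="XXXXX.pdf"), runs=3)))
```

If `first_call_s` is much larger than `warm_avg_s`, the cost is one-time setup (imports, model or data loading) rather than per-document processing.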
Also, just out of curiosity, can you give me a sense of the cold-start times you've seen?
We are able to run Unstructured in a container on AWS Lambda without issue (or, well, there are issues, but we can work around them). Things to consider (sorry that these points are a bit... unstructured):
If you're already accounting for the ECR download/caching time, one other thing you can try is to run a "fake" partition script during the build of your container image. This will help "warm up" any libraries/dependencies which may want to run some initial first-time setup tasks (like building/caching fonts, or downloading models). For example, in the same way you "warm up" the NLTK libraries, you could add a RUN step:
However, this may exacerbate the first point about container image size.
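As a rough sketch of that build-time warm-up, a script run from a `RUN` step could push one throwaway document through the pipeline. The helper name `warm_up`, the sample text, and passing the partition function in as an argument are all assumptions for illustration; in the image you would pass `unstructured.partition.auto.partition`.

```python
import os
import tempfile
from typing import Any, Callable, Optional


def warm_up(partition_fn: Callable[..., list], sample_path: Optional[str] = None, **kwargs: Any) -> int:
    """Run one throwaway partition so first-time setup happens at build time.

    Returns the number of elements produced, which is only useful as a sanity check.
    """
    if sample_path is not None:
        return len(partition_fn(filename=sample_path, **kwargs))
    # No sample supplied: write a tiny plain-text document to a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "warmup.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("Warm-up title\n\nA short paragraph used only to exercise the pipeline.\n")
        return len(partition_fn(filename=path, **kwargs))
```

A plain-text sample only exercises part of the pipeline; to trigger PDF-specific setup, pass a small representative file (for example `warm_up(partition, "sample.pdf", strategy="fast")`, where `sample.pdf` is a placeholder). Anything cached this way must also land somewhere the function can find at runtime, since Lambda may run the code as a different user than the one that built the image.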
This works for me, thanks.
Does unstructured itself provide an initialization (warm-up) method that does the above?
@adieuadieu, could you please explain how you are able to run the unstructured package in a Lambda function? I am having trouble doing this; please help me solve this issue.
Hi @sanketsanjaypote29. I suspect this thread is not the right place for that sort of request, nor am I available to offer general support. But, briefly, here's some high-level guidance that should start you in the right direction: you'll want to deploy your Lambda function using a container image. Add … Alternatively, unless you're just noodling around with stuff for funsies, consider using Unstructured's hosted API service to save yourself the time of trying to run it on Lambda: https://docs.unstructured.io/api-reference/api-services/free-api
@cds-code, @adieuadieu, here is my Dockerfile:

```dockerfile
FROM public.ecr.aws/lambda/python:3.11
COPY requirements.txt .
RUN pip3 install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"
RUN python3 -m spacy download en_core_web_sm --target "${LAMBDA_TASK_ROOT}"
CMD [ "main.lambda_handler" ]
```

requirements.txt: …

I am getting the following error while downloading `punkt` for pptx and pdf file types using `strategy="fast"`. Environment: AWS Lambda.

```
[nltk_data] Downloading package punkt to /home/sbx_user1051/nltk_data...
```
It looks like NLTK isn't able to find the pre-downloaded models at runtime. You'll want to set the base of …
Then you also need to make sure to set the …
You cannot put it into …
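The reply above is truncated, but the general idea of making NLTK look in a directory populated at build time can be sketched as follows. Assumptions: the NLTK data was downloaded during the image build into an `nltk_data` directory under `LAMBDA_TASK_ROOT` (`/var/task` in the AWS Python base images), and `configure_nltk_data` is a hypothetical helper. NLTK reads the `NLTK_DATA` environment variable when `nltk` is first imported, so this has to run before `nltk` or `unstructured` is imported.

```python
import os
from typing import Optional


def configure_nltk_data(task_root: Optional[str] = None) -> str:
    """Point NLTK at data baked into the image; call before importing nltk or unstructured."""
    root = task_root or os.environ.get("LAMBDA_TASK_ROOT", "/var/task")
    data_dir = os.path.join(root, "nltk_data")
    # setdefault: an NLTK_DATA value already set (e.g. in the function's
    # environment configuration) takes precedence over the computed default.
    return os.environ.setdefault("NLTK_DATA", data_dir)


# At the very top of main.py, for example:
# configure_nltk_data()
# from unstructured.partition.auto import partition
```

Setting the same variable with an `ENV` line in the Dockerfile, or in the Lambda function's environment configuration, achieves the same thing without code. Either way the data has to be in the image, since the Lambda filesystem is read-only apart from `/tmp`.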
While I think @adieuadieu is right that this may not be the best place for this, it's where I ended up when investigating running on Lambda, so I'm going to add some details here on how to run on Lambda at all. Credit also goes to @adieuadieu's other comment. For me, the main problems with running unstructured on Lambda came down to two issues: 1) onnxruntime and 2) image size.
```dockerfile
COPY patch.txt /sys/devices/system/cpu/possible
COPY patch.txt /sys/devices/system/cpu/present
```
Hope this helps. EDIT: I'm still having issues running in the Lambda environment, but I've moved that to its own issue.
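The two `COPY` lines target the sysfs files that list the machine's CPUs, presumably because onnxruntime's CPU detection reads them at startup. To check from inside the running function whether those files are present and what they contain, a small diagnostic like this could be logged at the start of the handler. `read_cpu_ranges` is a hypothetical helper, not part of onnxruntime, and the contents of `patch.txt` are not known from the thread.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional

CPU_FILES = ("/sys/devices/system/cpu/possible", "/sys/devices/system/cpu/present")


def read_cpu_ranges(paths: Iterable[str] = CPU_FILES) -> Dict[str, Optional[str]]:
    """Return each file's stripped contents, or None if it is missing or unreadable."""
    result: Dict[str, Optional[str]] = {}
    for p in paths:
        try:
            result[p] = Path(p).read_text().strip()
        except OSError:
            result[p] = None
    return result


# In the handler, e.g.: print(read_cpu_ranges())  # output goes to CloudWatch Logs
```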
Describe the bug
Extracting text from txt, pdf, docx, etc. files is very slow in the AWS Lambda environment, but very fast in a local Windows environment.