
🐛 [firestore-bigquery-export] Task size too large errors occurring even with EXCLUDE_OLD_DATA set to yes/true #2111

Open
747project opened this issue Jun 3, 2024 · 3 comments
Labels
extension: firestore-bigquery-export Related to firestore-bigquery-export extension type: bug Something isn't working

Comments

@747project

[READ] Step 1: Are you in the right place?

Issues filed here should be about bugs for a specific extension in this repository.
If you have a general question, need help debugging, or fall into some
other category, use one of these other channels:

  • For general technical questions, post a question on StackOverflow
    with the firebase tag.
  • For general Firebase discussion, use the firebase-talk
    google group.
  • To file a bug against the Firebase Extensions platform, or for an issue affecting multiple extensions, please reach out to
    Firebase support directly.

[REQUIRED] Step 2: Describe your configuration

  • Extension name: firestore-bigquery-export
  • Extension version: 0.1.50
  • Configuration values (redact info where appropriate):
    • BigQuery Dataset location: us
    • BigQuery Project ID: xxx
    • Database ID: (default)
    • Collection path: xxx
    • Enable Wildcard Column field with Parent Firestore Document IDs (Optional): false
    • Dataset ID: xxx
    • Table ID: xxx
    • BigQuery SQL table Time Partitioning option type (Optional): DAY
    • BigQuery Time Partitioning column name (Optional): timestamp
    • Firestore Document field name for BigQuery SQL Time Partitioning field option (Optional): Parameter not set
    • BigQuery SQL Time Partitioning table schema field(column) type (Optional): omit
    • BigQuery SQL table clustering (Optional): document_id
    • Maximum number of synced documents per second (Optional): 100
    • Backup Collection Name (Optional): Parameter not set
    • Transform function URL (Optional): Parameter not set
    • Use new query syntax for snapshots: no
    • Exclude old data payloads (Optional): yes
    • Use Collection Group query (Optional): no
    • Cloud KMS key name (Optional): Parameter not set

[REQUIRED] Step 3: Describe the problem

Even when using the EXCLUDE_OLD_DATA setting to prevent old_data from being populated, we are still seeing Task size too large errors on many messages. This might mean that Firestore payloads close to 1 MB are being padded in a way that makes the resulting Task exceed 1 MB.

Steps to reproduce:

  1. Install extension version 0.1.50 and ensure that Exclude old data payloads is set to yes.
  2. Write a large document (close to 1 MB) to Firestore.
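The padding hypothesis above can be checked with a size estimate. A minimal sketch (the event field names here are illustrative assumptions, not the extension's actual task schema): even with old_data excluded, the task body wraps the document in an event envelope, and that overhead can push a near-limit document over the 1 MiB Cloud Tasks request-body cap.

```javascript
// Sketch: estimate the serialized size of a task payload wrapping a
// near-limit Firestore document. Envelope field names are illustrative.
const TASK_LIMIT_BYTES = 1024 * 1024; // Cloud Tasks caps request bodies at 1 MiB

function payloadBytes(obj) {
  return Buffer.byteLength(JSON.stringify(obj), "utf8");
}

// A document just under Firestore's ~1 MiB limit:
const docData = { blob: "x".repeat(1048500) };

// Even with old_data set to null, the envelope pads the document:
const event = {
  timestamp: "2024-06-03T00:00:00.000Z",
  eventId: "evt-0001",
  documentName: "projects/p/databases/(default)/documents/xxx/large-doc",
  operation: "CREATE",
  data: docData,
  oldData: null,
};

console.log(payloadBytes(docData) <= TASK_LIMIT_BYTES); // true: doc alone fits
console.log(payloadBytes(event) > TASK_LIMIT_BYTES);    // true: wrapped body exceeds the cap
```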
Expected result

No Task size too large errors should appear at all.

Actual result

Observing many Task size too large errors in logs.

@747project 747project added the type: bug Something isn't working label Jun 3, 2024
@cabljac
Contributor

cabljac commented Jul 2, 2024

It occurs to me that we probably shouldn't send the payload in a Cloud Task at all if the maximum task payload size is 1 MB, since a Firestore document can technically be up to 1 MB in size (even if that is rare).

Perhaps we should just send the document references, and then fetch the documents in the task handler function. This would add reads to the extension, but would eliminate this issue entirely.
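A hedged sketch of that reference-only idea (the field names and helper are hypothetical, not the extension's actual implementation): the enqueued body carries only the document path and event metadata, so its size is independent of document size, and the handler re-reads the document before streaming to BigQuery.

```javascript
// Sketch of a reference-only task body. Field names are hypothetical.
function buildTaskBody(documentPath, eventId) {
  return JSON.stringify({ documentPath, eventId });
}

// The body stays tiny no matter how large the document is:
const body = buildTaskBody("xxx/large-doc", "evt-0001");
console.log(Buffer.byteLength(body, "utf8") < 200); // true

// Handler side (illustrative, assumes an initialized firebase-admin app);
// the extra get() is the added read cost mentioned above:
// async function handleTask({ documentPath }) {
//   const snap = await admin.firestore().doc(documentPath).get();
//   // ...write snap.data() to BigQuery...
// }
```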

@larstbone

Using version 0.1.51 and having the same problem. We know our documents are between 500 KB and 900 KB, so we were hoping EXCLUDE_OLD_DATA would help. @747project, I'm curious whether you know the size of your large documents. We're hoping that if a doc is slightly less than 1 MB, the EXCLUDE_OLD_DATA flag will work.

We're also having trouble importing: the DO_BACKFILL flag appears to be disabled, and the fs-bq-import-collection script is throwing a Request Entity Too Large error. Our current suspicion is that the script does not honor the EXCLUDE_OLD_DATA flag.

@pr-Mais pr-Mais added the extension: firestore-bigquery-export Related to firestore-bigquery-export extension label Jul 18, 2024
@747project
Author

747project commented Aug 23, 2024

@larstbone sorry for the delayed response. Unfortunately I don't have a specific size for the documents causing the problem, but I do know that we have multiple collections containing many documents that are very close to, or just about, 1 MB in size. Either way, if there is a 1 MB cap on Cloud Tasks and a 1 MB cap on Firestore documents, then the current task-based solution cannot inherently support all Firestore document scenarios.
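The arithmetic behind that last point, as a sketch (limit values as documented for Cloud Tasks and Firestore, both roughly 1 MiB): a maximum-size document plus any nonzero envelope overhead must exceed the task body cap.

```javascript
// Both caps are ~1 MiB (1,048,576 bytes):
const FIRESTORE_DOC_LIMIT = 1024 * 1024; // max Firestore document size
const TASK_BODY_LIMIT = 1024 * 1024;     // max Cloud Tasks request body

// Even one byte of envelope overhead pushes a max-size document over the cap:
const envelopeOverhead = 1;
console.log(FIRESTORE_DOC_LIMIT + envelopeOverhead > TASK_BODY_LIMIT); // true
```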
