Auto Retry Toggle #2770

Open
JuchangGit opened this issue Aug 10, 2024 · 3 comments
Labels
enhancement, needs more info

Comments

@JuchangGit

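Could you add a switch, or some other way, to control automatic retries? Bento currently keeps retrying indefinitely whenever an input or output error occurs, which is unsuitable for some scenarios.
For example, when the input is a database and the upstream changes the table schema, Bento will keep retrying, and the upstream may treat this as a malicious attack.

Could you add a retry limit, or a switch that turns automatic retries on or off, to control Bento's default behavior?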

@mihaitodor
Collaborator

Via Google Translate:

Can you add an automatic retry switch or control method? Because the current working method of Bento is that it will keep retrying when input and output errors occur, which is not suitable for some scenarios.
For example: when the input is a database, if the upstream changes the table structure, then Bento will keep retrying, and the upstream may think this is a malicious attack

Can you add a switch to control the number of retries or whether to retry automatically to control the default behavior of bento?

Hey @JuchangGit 👋 Thank you for reaching out!

> it will keep retrying when input and output errors occur

Not sure what you mean by input error. Connect will keep trying to connect to an input until it succeeds or until the process is terminated. That is by design. It's up to the users to leverage either the /ready HTTP endpoint or metrics (or, worst case, logs) and take the appropriate action when this situation occurs.

For outputs, if Connect is able to establish a connection to the output, it may still get an error back. In such cases, you have various meta outputs such as drop_on, fallback, reject_errored, or retry to control what should happen to the current message. Additionally, with fallback for example, you could have something like this:

output:
  switch:
    cases:
      - check: metadata("status") == "OK"
        output:
          fallback:
            - your_actual_output: ...
            - drop: {} # Feel free to replace this with a dead letter queue output (e.g. `kafka_franz`)
              processors:
                - cache: # Set a key called `status` in an in-memory cache indicating that the above output is busted
      - output: # This is the catch-all output which is used when `metadata("status") != "OK"`
          drop: {} # Feel free to replace this with a dead letter queue output (e.g. `kafka_franz`)

  processors:
    - cache: # Fetch the `status` key from an in-memory cache and set it in a metadata field called `status`
             # You can use a TTL when setting the key so it expires after a while and allows `your_actual_output` to be attempted again after this period lapses.

> Can you add a switch to control the number of retries or whether to retry automatically to control the default behavior of bento?

You have full flexibility as described above. Another approach is to use the retry output I mentioned and configure exponential backoff (https://docs.redpanda.com/redpanda-connect/components/outputs/retry/#backoff) together with max_retries. You can have this as a child within a fallback output so you can redirect the messages to a dead letter queue (or just drop them) once max_retries is exhausted.
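As a rough sketch of the retry-inside-fallback approach described above: the `http_client` output and its URL are hypothetical stand-ins for whatever output is actually in use, and the retry and backoff values are illustrative rather than recommended.

```yaml
output:
  fallback:
    # First choice: the real output, wrapped in `retry` so attempts are bounded.
    - retry:
        max_retries: 3            # 0 (the default) means no limit on retries
        backoff:
          initial_interval: 500ms
          max_interval: 5s
          max_elapsed_time: 0s    # 0s means no overall time limit
        output:
          http_client:            # hypothetical stand-in for your actual output
            url: http://localhost:8080/ingest
            verb: POST
    # Used only once the retries above are exhausted.
    # Swap `drop` for a dead letter queue output if the messages must be kept.
    - drop: {}
```

Note that this bounds retries per message on the output side only; it does not change how inputs reconnect.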

@mihaitodor added the enhancement and needs more info labels Aug 21, 2024
@JuchangGit
Author

JuchangGit commented Aug 22, 2024

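Could input and output each offer a configuration option for the maximum number of retries, max_retry_num, with a default of -1 meaning retry forever (the same as the current behavior), so that users can control the number of retries? The configuration would look like this: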

input:
  max_retry_num: 2
  stdin:
    scanner:
      lines: {}
    auto_replay_nacks: true
buffer:
  none: {}
pipeline:
  threads: -1
  processors: []
output:
  max_retry_num: 3
  stdout:
    codec: lines

@mihaitodor
Collaborator

Unfortunately, no, that's not currently possible, as I mentioned above:

> Connect will keep trying to connect to an input until it succeeds or until the process is terminated. That is by design. It's up to the users to leverage either the /ready HTTP endpoint or metrics (or, worst case, logs) and take the appropriate action when this situation occurs.

You can, however, use Streams Mode to have a separate watchdog stream which uses the generate input combined with the http processor to query the /ready HTTP endpoint and take the appropriate action when this indicates that the input isn't connected. You can even have this http processor retry several times to be sure that the connectivity issue isn't transient.
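A minimal sketch of such a watchdog stream might look like the following. It assumes the HTTP server listens on the default address (`localhost:4195`); the polling interval, retry settings, and the `log` action are placeholders, since the appropriate action depends on your setup.

```yaml
# Hypothetical watchdog stream, loaded alongside the main stream in Streams Mode.
input:
  generate:
    interval: 30s          # emit one empty message every 30 seconds
    mapping: root = {}

pipeline:
  processors:
    # Query the readiness endpoint, retrying in case the outage is transient.
    - http:
        url: http://localhost:4195/ready   # assumes the default HTTP address
        verb: GET
        retries: 3
        retry_period: 5s
    # Runs only for messages on which the `http` processor above failed.
    - catch:
        - log:
            level: ERROR
            message: "Readiness check failed: ${! error() }"
        # Placeholder: replace the log with whatever action suits your case.

output:
  drop: {}
```

The stream itself produces nothing useful downstream, which is why the output is `drop`; all the work happens in the `catch` block.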
