Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Patch] BUG: Metadata API Request Error Handling / https://github.com/alephdata/aleph/issues/3812 / #3813

Open
wants to merge 11 commits into
base: develop
Choose a base branch
from

Conversation

dimitardanailov
Copy link

I'd like to share my proposal about the following issue: #3812

The patch uses two different protection techniques.

#1
state. metadataRequestAttempts tracks how many attempts are requested to axios REST API layer. The current restrictions to const MAX_METADATA_REQUEST_ATTEMPTS = 5;

#2
executeRetryPattern uses setTimeout. setTimeout is going to be increased each time if the request has any error.

Accomplishments

#1
The previous version executed many request without any delay between requests.
Screenshot 2024-07-16 at 10 07 03 AM

#2
The gap between UI pop dialog is increased from a few ms to 2000+ ms.

Screenshot 2024-07-16 at 10 11 57 AM

@dimitardanailov dimitardanailov marked this pull request as draft July 25, 2024 08:20
@dimitardanailov
Copy link
Author

dimitardanailov commented Jul 25, 2024

Why the current pull request has a draft status ?

The main reason for that is: I've been working on different task. I've a plan to restructure the source code to add a state machine and code splitting between Happy and Unhappy path.

cc: @richardjennings-occrp

PageState is a state machine. The Router has three options: Loading,
SomethingIsWrong, Success.
@dimitardanailov dimitardanailov marked this pull request as ready for review July 31, 2024 04:26
@dimitardanailov
Copy link
Author

The current pull request migrates router structure from single "flat" file to folder structure with sub components and utils. The router component supports three different stats

  • Loading
  • SomethingIsWrong
  • Success

cc: @tillprochaska and @richardjennings-occrp

Copy link
Contributor

@tillprochaska tillprochaska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks a lot for fixing this long-standing bug! I will try to take a closer look soon, but at first glance I noticed two small things (see comments).

Comment on lines -6 to +11
const response = await endpoint.get('metadata');
return { metadata: response.data };
try {
const response = await endpoint.get('metadata');
return { metadata: response.data };
} catch (e) {
throw e;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What’s the reason for catching and then immediately re-throwing exceptions here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tillprochaska

a)
The current structure (try and catch) doesn't have a listener in the Router component. I've tried various approaches to address this in the Router, but the request has several layers of execution. The Router component currently lacks the capability to add a listener if something goes wrong*

* - My assumption might be incorrect.

b)
On the other hand, the current structure is a good starting point for discussion. I use the design pattern of eventual consistency. What do I mean by this?

Each interaction with a third-party/external source (filesystem, backend, external APIs, or providers) should be wrapped in a try and catch block. The try and catch listener should be accessible from the UI layer (jsx/tsx components).

c)
From my perspective, the following implementation:

const response = await endpoint.get('metadata');
return { metadata: response.data };

is oriented only towards successful (200) requests. From a practical standpoint, this is insufficient, as eventually, the source will encounter requests with different response codes: 4xx or 500*.

* - I don't want the current changes to become a blocker or escalate into a larger issue. In my opinion, the implementation should be wrapped in a try and catch block, but this is just my assumption.


Please review my suggestions. If there are any concerns or alternative approaches you recommend, I am open to discussing them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dimitardanailov I have come back to this question and your answer a few times. I have not been able to understand your answer.

What you have written:

try {
      const response = await endpoint.get('metadata');
      return { metadata: response.data };
    } catch (e) {
      throw e;
    }

is logically the same as:

const response = await endpoint.get('metadata');
return { metadata: response.data };

Why did you chose to add the try catch if it is not doing anything ?

ui/src/app/Router/Components/SomethingIsWrong.jsx Outdated Show resolved Hide resolved
Copy link
Contributor

@tillprochaska tillprochaska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry that it took me so long to reply. Thanks for making changes! While testing, I noticed that this breaks unrelated functionality, but that should be easy to avoid (see comment below).

}

setMessagesInterval() {
const id = setInterval(() => this.executeRetryPattern(), MESSAGES_INTERVAL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setMessagesInterval sets up a periodic function that re-fetches the message banner data. The message banner data is different from the metadata. For example, at the moment, the following message banner is displayed on aleph.occrp.org:

image

Changing this line to also use executeRetryPattern means that the message banenr data isn’t refreshed periodically anymore.

I assume that the intent of this change was to apply the same error handling that is applied for metadata to the message banner data, too. However, it’s not necessary in this case: An error that occurs while fetching message banner data is handled gracefully (the message banner simply isn’t displayed, as it is not critical for Aleph to function).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tillprochaska

Is it possible to add any specific time interval about the following word / term: periodically ?

delay,
this.state.metadataRequestAttempts
);
this.metadataTimeoutIntervals.push(intervalId);
Copy link
Contributor

@tillprochaska tillprochaska Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn’t super important, but I was curious to understand if it is necessary to keep track of the timeout IDs? My understanding is that the timeouts are cleared as part of the callback, but at that point the timeout has already fired, so clearing the timeouts wouldn’t be necessary anymore. Removing this line doesn’t seem to affect the functionality of the changes in this PR. Are there any edge cases where this is relevant?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tillprochaska

  • The purpose of storing timeout IDs is to allow for the clearing of all pending retries if any request in the list of attempts returns a successful status (HTTP 200).
  • This ensures that redundant retries do not continue once the desired outcome (successful metadata load) is achieved.

Additional Context:

  • The implementation is designed to work within a Redux-based application where actions (fetchMetadata, fetchMessages, dismissMessage) are dispatched to handle asynchronous data fetching and updating the application state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants