Add exponential backoff #54

sylane · 2025-02-19T08:21:58Z

I don't really know how to add a test for this, any ideas?

matlaj · 2025-02-19T09:15:58Z

Maybe by connecting to some non-existent server?

sylane · 2025-02-19T09:32:50Z

> Maybe by connecting to some non-existent server?

I already kind of run this code when testing the bad protocol version, what I don't know is how to validate that it is actually doing exponential backoff...

matlaj · 2025-02-19T09:35:05Z

Can you observe the state of the process, for example by calling sys:get_state/1?

sylane · 2025-02-19T10:51:52Z

> Can you observe the state of the process, for example by calling sys:get_state/1?

I could, but I am wondering whether that would be a bad idea. And to test it properly I would have to first try without the server, check that it retries with a timing that roughly matches exponential backoff (it is random), then start the server, wait until it connects, then shut down the server and check that the delay resets... Seems like a lot of trouble...

src/grisp_connect_client.erl

ziopio · 2025-02-21T08:23:58Z

rebar.lock

@@ -1,28 +1,28 @@
 {"1.2.0",
 [{<<"certifi">>,{pkg,<<"certifi">>,<<"2.13.0">>},0},
 {<<"cowlib">>,{pkg,<<"cowlib">>,<<"2.13.0">>},2},
- {<<"grisp">>,{pkg,<<"grisp">>,<<"2.7.0">>},0},
+ {<<"grisp">>,{pkg,<<"grisp">>,<<"2.8.0">>},0},


The bump in grisp version should appear in the commit history

@ziopio I added it to the changelog, but do you want me to remove it from the PR altogether?

It is fine, it just needs to appear in the commit history once it is on main.

Basically, what I am asking is just that the commit that increases the version says that it is doing so. Ideally, this should be a commit by itself.

src/grisp_connect_client.erl

ziopio · 2025-02-21T11:46:54Z

@GwendalLaurent bsl removes the need of rounding the result of math:pow
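To illustrate the point about bsl, here is a minimal sketch; the module and function names are hypothetical and not taken from the PR. math:pow/2 always returns a float, so the result has to be rounded before it can be passed to a function that requires an integer, such as rand:uniform/1, whereas a left shift stays in integer arithmetic.

```erlang
-module(pow_sketch).
-export([pow2_float/1, pow2_int/1]).

%% math:pow/2 returns a float (math:pow(2, 3) is 8.0), so it has to be
%% rounded before being used where an integer is required.
pow2_float(N) when is_integer(N), N >= 0 ->
    round(math:pow(2, N)).

%% A left shift stays in integer arithmetic: 1 bsl N equals 2^N.
pow2_int(N) when is_integer(N), N >= 0 ->
    1 bsl N.
```

Both functions return the same integer for small N; the shift version simply avoids the float round trip.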

sylane · 2025-02-21T12:07:16Z

@ziopio You are right, the RetryCount - 1 is wrong, I haven't really validated it yet.

GwendalLaurent · 2025-02-24T08:43:02Z

src/grisp_connect_client.erl

+    %% Calculate the connection delay with exponential backoff.
+    %% The delay is selected randomly between `1000' and
+    %% `2 ^ RETRY_COUNT - 1000' with a maximum value of `64000'.
+    Delay = 1000 + rand:uniform(min(64000, (1 bsl RetryCount) * 1000) - 1000),


I would just add the link to the Amazon blog post and mention that we decided to go with the FullJitter algorithm. That way, anyone coming back to this code in a couple of weeks/months/years will know where the calculation comes from.

Why first add 1000 and then subtract it again? With the subtraction, one always has to double-check that the calculation can never lead to a negative delay.

How about

Delay = 1000 * rand:uniform(min(64, (1 bsl RetryCount)))

It's less expensive for rand:uniform/1, since it's a smaller interval. About code readability: it's clearly a positive value. If you want to start at 1s instead of 2s (assuming RetryCount >= 1; I haven't checked the code), you can also do 1 bsl (RetryCount - 1).

If RetryCount is always > 0, (1 bsl RetryCount) * 1000 is always >= 2000, so the - 1000 cannot make it negative. The 1000 + random() is there to always have a minimum delay.

I want the value to be between 1000 and (2^RetryCount * 1000)
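A minimal sketch of a delay function with that range, assuming milliseconds as the unit, RetryCount >= 1, and the 64000 cap from the diff above. The module name, function name, and macros are hypothetical, and this is not necessarily the formula that was merged.

```erlang
-module(backoff_sketch).
-export([delay_ms/1]).

%% Hypothetical constants; the values come from the discussion above.
-define(MIN_DELAY_MS, 1000).
-define(MAX_DELAY_MS, 64000).

%% Returns a random delay in milliseconds in the inclusive range
%% MIN_DELAY_MS .. min(MAX_DELAY_MS, 2^RetryCount * 1000).
delay_ms(RetryCount) when is_integer(RetryCount), RetryCount >= 1 ->
    UpperMs = min(?MAX_DELAY_MS, (1 bsl RetryCount) * 1000),
    %% rand:uniform(N) returns an integer in 1..N, so subtract 1 to make
    %% the lower bound exactly MIN_DELAY_MS. Since UpperMs >= 2000 for
    %% RetryCount >= 1, the argument to rand:uniform/1 is always positive.
    ?MIN_DELAY_MS + rand:uniform(UpperMs - ?MIN_DELAY_MS + 1) - 1.
```

With RetryCount = 1 the result lies between 1000 and 2000 ms; from RetryCount = 6 onward the upper bound stays at 64000 ms.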

Never mind the timer stuff. The first one was of course much slower.

So looking at these times (timer:tc reports microseconds): you need nearly half a millisecond just to have a random number in steps of milliseconds your way. That does not make sense at all.

YES it is bad. The whole point of exponential backoff is to distribute the load of the clients reconnecting to the server. The jitter is there because without it the reconnections happen at fixed times and overload the server every N seconds (see your own link for more details). Are we actually trying to optimise a call to random:uniform/1 that happens at most once per second on the grisp board while it is not connected, at the cost of the goal of the exponential backoff itself?

The new version of the formula looks good.

The whole point is to spread the delay over the whole time range. If you do the random like this, the delays will all be clustered at multiples of 1000, which wouldn't make sense.
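The clustering can be seen in a small sketch that compares the two ways of drawing the random value. The module and function names are hypothetical; only the two expressions are taken from the thread, with the minimum-delay offset left out for brevity.

```erlang
-module(jitter_granularity_sketch).
-export([coarse_ms/1, fine_ms/1, all_multiples_of_1000/1]).

%% Draws a whole number of seconds and scales it to milliseconds:
%% every result is a multiple of 1000.
coarse_ms(RetryCount) when is_integer(RetryCount), RetryCount >= 1 ->
    1000 * rand:uniform(min(64, 1 bsl RetryCount)).

%% Draws directly in milliseconds: any value in the range can occur.
fine_ms(RetryCount) when is_integer(RetryCount), RetryCount >= 1 ->
    rand:uniform(min(64000, (1 bsl RetryCount) * 1000)).

%% True when every delay in the list falls on a whole second.
all_multiples_of_1000(Delays) ->
    lists:all(fun(D) -> D rem 1000 =:= 0 end, Delays).
```

Calling all_multiples_of_1000([coarse_ms(6) || _ <- lists:seq(1, 100)]) always returns true, so clients that lose the connection at the same moment can only retry on whole-second boundaries. The same check on fine_ms/1 samples will almost always return false.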

Why? What is so bad about doing retries in steps of seconds?

BTW:

1> timer:tc(random, uniform, [64000]).
{407,28390}
2> timer:tc(random, uniform, [64]).
{3,47}

Not sure where you get that from. On the board:

Eshell V15.2.2 (press Ctrl+G to abort, type help(). for help)
(grisp_demo@wasp)1> timer:tc(random, uniform, [64000]).
{33,28390}
(grisp_demo@wasp)2> timer:tc(random, uniform, [64000]).
{31,46275}
(grisp_demo@wasp)3> timer:tc(random, uniform, [64000]).
{31,60533}
(grisp_demo@wasp)4> timer:tc(random, uniform, [64000]).
{32,32096}

As I said:

> Never mind the timer stuff.

peerst

Neither the comments nor anything else mention units for the time delays. Is it ms, µs or something else? I recommend always being clear about units everywhere (possibly including variable name and function name suffixes), but certainly for all values that have a unit in comments. Otherwise I have to assume it's a count of potatoes ;-)

sylane requested review from maehjam and matlaj February 19, 2025 08:21

maehjam reviewed Feb 19, 2025

Base automatically changed from sylane/add-proto-version to main February 20, 2025 10:45

sylane force-pushed the sylane/add-exp-backoff branch from 4bca5c8 to ab2ab04 Compare February 20, 2025 11:03

ziopio self-requested a review February 21, 2025 08:23

ziopio requested changes Feb 21, 2025

GwendalLaurent requested changes Feb 21, 2025

ziopio self-requested a review February 24, 2025 08:18

ziopio approved these changes Feb 24, 2025

GwendalLaurent requested changes Feb 24, 2025

peerst reviewed Feb 24, 2025

GwendalLaurent self-requested a review February 24, 2025 16:44

GwendalLaurent approved these changes Feb 24, 2025

sylane added 9 commits February 24, 2025 17:46

Add exponential backoff

75c9f86

Add exponential backoff test case

93505de

Fix typo

27e1539

Upgrade jarl dependency

1454c0f

Changed to use full jittered exponential backoff with minimum delay

ccb757f

Make the backoff test more reliable

49beb45

Make the exponential backoff formula clearer.

37eaac2

Add delay unit in comment

a6f99e2

Add link to blogpost

ee158fc

sylane force-pushed the sylane/add-exp-backoff branch from ba0f06f to ee158fc Compare February 24, 2025 16:49

Makes reconnect test more reliable

5c5cfad

sylane merged commit b4e28c8 into main Feb 24, 2025
1 check passed

sylane deleted the sylane/add-exp-backoff branch February 24, 2025 16:59
