Need to retry on 502? #138

Closed
yarikoptic opened this issue Nov 28, 2021 · 8 comments · Fixed by #139

@yarikoptic
Member

Might be a reincarnation of #129.

In the past few days there were a good number of failed runs, e.g.:

   1     Nov 28 Cron Daemon      Cron <datalad@smaug> chronic flock -n -E 0 /home/datalad/.run/tinuous-datalad-extensions.lock /mnt/datasets/datalad/ci/datalad-extensions/tools/cron_job
   2     Nov 27 Cron Daemon      Cron <dandi@drogon> cd ~/cronlib/tinuous-logs/dandischema && chronic flock -n -E 0 /home/dandi/.run/tinuous-dandischema.lock ~/cronlib/tinuous-logs/venv/bin/tinuous fetch
   3     Nov 27 Cron Daemon      Cron <datalad@smaug> chronic flock -n -E 0 /home/datalad/.run/tinuous-datalad.lock /mnt/datasets/datalad/ci/logs/tools/cron_job
   4     Nov 27 Cron Daemon      Cron <dandi@drogon> cd ~/cronlib/tinuous-logs/dandi-api && chronic flock -n -E 0 /home/dandi/.run/tinuous-dandi-api.lock ~/cronlib/tinuous-logs/venv/bin/tinuous fetch
   5     Nov 27 Cron Daemon      Cron <dandi@drogon> cd ~/cronlib/tinuous-logs/dandi-cli && chronic flock -n -E 0 /home/dandi/.run/tinuous-dandi-cli.lock ~/cronlib/tinuous-logs/venv/bin/tinuous fetch
   6     Nov 27 Cron Daemon      Cron <datalad@smaug> chronic flock -n -E 0 /home/datalad/.run/tinuous-datalad.lock /mnt/datasets/datalad/ci/logs/tools/cron_job
   7     Nov 27 Cron Daemon      Cron <datalad@smaug> chronic flock -n -E 0 /home/datalad/.run/tinuous-git-annex.lock /mnt/datasets/datalad/ci/git-annex/tools/cron_job
   8     Nov 27 Cron Daemon      Cron <datalad@smaug> chronic flock -n -E 0 /home/datalad/.run/tinuous-datalad.lock /mnt/datasets/datalad/ci/logs/tools/cron_job

All of them looked similar, if not identical, to:

2021-11-27T16:20:02-0500 [INFO    ] tinuous tinuous 0.5.0+6.gba81776
2021-11-27T16:20:02-0500 [INFO    ] tinuous Fetching resources from github
2021-11-27T16:20:02-0500 [INFO    ] tinuous Fetching runs newer than 2021-11-26 18:41:44+00:00
Traceback (most recent call last):
  File "/home/datalad/miniconda3/envs/tinuous-dev/bin/tinuous", line 33, in <module>
    sys.exit(load_entry_point('tinuous', 'console_scripts', 'tinuous')())
  File "/home/datalad/miniconda3/envs/tinuous-dev/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/datalad/miniconda3/envs/tinuous-dev/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/datalad/miniconda3/envs/tinuous-dev/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/datalad/miniconda3/envs/tinuous-dev/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/datalad/miniconda3/envs/tinuous-dev/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/datalad/miniconda3/envs/tinuous-dev/lib/python3.9/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/mnt/datasets/datalad/ci/tinuous/src/tinuous/__main__.py", line 112, in fetch
    for obj in ci.get_build_assets(
  File "/mnt/datasets/datalad/ci/tinuous/src/tinuous/github.py", line 108, in get_build_assets
    for wf in self.get_workflows():
  File "/mnt/datasets/datalad/ci/tinuous/src/tinuous/github.py", line 42, in wrapped
    return func(gha, *args, **kwargs)
  File "/mnt/datasets/datalad/ci/tinuous/src/tinuous/github.py", line 85, in get_workflows
    for wf in self.ghrepo.get_workflows():
  File "/home/datalad/miniconda3/envs/tinuous-dev/lib/python3.9/functools.py", line 969, in __get__
    val = self.func(instance)
  File "/mnt/datasets/datalad/ci/tinuous/src/tinuous/github.py", line 42, in wrapped
    return func(gha, *args, **kwargs)
  File "/mnt/datasets/datalad/ci/tinuous/src/tinuous/github.py", line 80, in ghrepo
    return self.client.get_repo(self.repo)
  File "/home/datalad/miniconda3/envs/tinuous-dev/lib/python3.9/site-packages/github/MainClass.py", line 330, in get_repo
    headers, data = self.__requester.requestJsonAndCheck("GET", url)
  File "/home/datalad/miniconda3/envs/tinuous-dev/lib/python3.9/site-packages/github/Requester.py", line 353, in requestJsonAndCheck
    return self.__check(
  File "/home/datalad/miniconda3/envs/tinuous-dev/lib/python3.9/site-packages/github/Requester.py", line 378, in __check
    raise self.__createException(status, responseHeaders, output)
github.GithubException.GithubException: 502 {"message": "Server Error"}

Do we need more/longer sleeps/retries?

@jwodder
Member

jwodder commented Nov 29, 2021

@yarikoptic

  • Unlike the 502s seen in #129 ("Sleep/retry on 502 Bad Gateway"), this 502 occurred in response to a request made via PyGithub. The only failed PyGithub requests we currently retry are "rate limit exceeded" errors, not 5xx's.
  • GitHub was experiencing numerous service disruptions on Saturday. I doubt retrying failed requests at the time would have made much of a difference.

@yarikoptic
Member Author

  • GitHub was experiencing numerous service disruptions on Saturday. I doubt retrying failed requests at the time would have made much of a difference.

Given that not all runs of con/tinuous experienced this issue, I would tend to disagree -- sleep/retry could well have mitigated it and avoided completely failed runs.

@jwodder
Member

jwodder commented Nov 29, 2021

@yarikoptic So exactly what changes do you want?

@yarikoptic
Member Author

retry (whatever request it needs to retry) with sleeps on receiving 502 in that code path

@jwodder
Member

jwodder commented Nov 29, 2021

@yarikoptic The specific request that failed was this line. Is that the only place where you want 502 retries added?

@yarikoptic
Member Author

in the traceback above it is

  File "/mnt/datasets/datalad/ci/tinuous/src/tinuous/__main__.py", line 112, in fetch
    for obj in ci.get_build_assets(

seems to be a different line.

@jwodder
Member

jwodder commented Nov 29, 2021

@yarikoptic If you follow the traceback, the actual call to PyGithub is the line that I pointed out.

@yarikoptic
Member Author

doh right -- somehow I took that github.py as an outside package. I guess that is the only location for now.

yarikoptic added a commit that referenced this issue Nov 29, 2021
Retry Github.get_repo() requests that fail with 502
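
The referenced commit itself is not shown in this thread. As a rough illustration only of the kind of "retry with sleeps on 502" discussed above, here is a minimal standard-library sketch. The names `retry_on_status`, `FakeServerError`, and `flaky_get_repo` are hypothetical and are not part of tinuous or PyGithub; the one assumption carried over from PyGithub is that the raised exception exposes the HTTP code as a `status` attribute, as `GithubException` does.

```python
import time
from functools import wraps


def retry_on_status(statuses=(502,), attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry the wrapped call when it raises an exception whose ``status``
    attribute is in ``statuses``, sleeping longer after each failure."""

    def decorator(func):
        @wraps(func)
        def wrapped(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    retryable = getattr(exc, "status", None) in statuses
                    # Re-raise non-retryable errors, and give up after the last attempt.
                    if not retryable or attempt == attempts - 1:
                        raise
                    # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                    sleep(base_delay * 2 ** attempt)

        return wrapped

    return decorator


class FakeServerError(Exception):
    """Hypothetical stand-in for an API error that carries an HTTP ``status``."""

    def __init__(self, status):
        super().__init__(f"{status} Server Error")
        self.status = status


calls = []


# The sleep is stubbed out here so the demonstration runs instantly.
@retry_on_status(statuses=(502,), attempts=4, sleep=lambda seconds: None)
def flaky_get_repo(name):
    """Fail with a 502 twice, then succeed, to mimic a transient outage."""
    calls.append(name)
    if len(calls) < 3:
        raise FakeServerError(502)
    return f"repo:{name}"
```

In real use the decorator would wrap only the call that talks to the API, and the number of attempts and the base delay would need tuning against how long GitHub outages typically last; during a prolonged disruption the final attempt still raises.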