run started to error out #150
@yarikoptic The error does not appear to have anything to do with specific PRs; rather, it's tied to specific (broken?) workflows which have an empty path.
@yarikoptic For filling out the GitHub ticket, I need to know when this started.
hm, fun. Well - the dates are in the logs above pretty much since we run those cron jobs quite regularly. In both cases it started on Oct 01. First email failing for dandi-cli:
for datalad:
Ticket filed with GitHub. I associated it with the "con" organization, so you might(?) be able to see it: https://support.github.com/ticket/personal/0/1818408
Unfortunately I can't see it, but that's ok I guess. Is there anything we need to fix in those workflows triggering this issue and/or any way tinuous could work around it ATM?
@yarikoptic We could make tinuous skip workflows with an empty path, but I'm not entirely sure whether that can legitimately happen.
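For reference, a minimal sketch of what such a skip could look like when walking the raw REST listing (the function name and logging are illustrative, not the actual tinuous code):

```python
import logging
import requests

log = logging.getLogger(__name__)


def iter_usable_workflows(owner: str, repo: str, session: requests.Session):
    """Yield workflow records, skipping (with a warning) any whose "path" is empty."""
    r = session.get(
        f"https://api.github.com/repos/{owner}/{repo}/actions/workflows?per_page=100"
    )
    r.raise_for_status()
    for wf in r.json()["workflows"]:
        if not wf.get("path"):
            log.warning("Skipping workflow %r with empty path", wf.get("name"))
            continue
        yield wf
```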
ah gotcha - so you suspect that the workflow information returned by github is buggy... ok, let's wait a day for a possible response to that github support ticket you filed.
FTR: it seems it has started to error out also for datalad-extensions. Let's hope the github folks figure it out and solve it soonish. edit: and also for heudiconv, for which I tried to add con/tinuous
if the issue is due to github duplicating workflow records in its listing, where the 2nd copy doesn't have a path, couldn't we detect duplicate incomplete entries and just skip them? (after some
It's not OK to assume that. The "name" is just whatever's filled in for the "name" key at the top of the workflow file, and it's perfectly possible for two or more workflows to have the same name.
Using the below script, I get a 500 when trying to list the runs for the first datalad/datalad-extensions workflow (which has a unique name and a

```python
#!/usr/bin/env python3
import click
from ghrepo import GHRepo
import requests


@click.command()
@click.argument("repo", type=GHRepo.parse)
def main(repo):
    with requests.Session() as s:
        r = s.get(f"{repo.api_url}/actions/workflows?per_page=100")
        r.raise_for_status()
        for workflow in r.json()["workflows"]:
            print("Name:", repr(workflow["name"]))
            print("Path:", repr(workflow["path"]))
            r = s.get(workflow["url"] + "/runs")
            if not r.ok:
                print("ERROR:", r.status_code)
            else:
                print("Runs:", r.json()["total_count"])
            print()


if __name__ == "__main__":
    main()
```
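For reference: since `GHRepo.parse` accepts the `OWNER/NAME` form, saving the above as e.g. `list_workflow_runs.py` (a hypothetical filename) and running `python3 list_workflow_runs.py datalad/datalad-extensions` prints the name, path, and either the run count or the error status for each listed workflow.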
interesting, so -- it 500s on an extension which no longer exists although it existed a long time ago:
well beyond the date we care about. So, for each one of those 500s we just need to establish (and ideally cache, but we can maybe tolerate querying each time for now) the date when it was removed from the repo. Is there a github API to provide commits for a path? (I couldn't find one quickly)
@yarikoptic I don't see anything obvious. Note that my script is able to retrieve runs for
wouldn't that mean that we potentially could lose some logs? e.g. if
also -- I checked the mail archives - it seems that the "Extensions" workflow was returned before even after its removal, e.g.
or
so it seems that they should be returned.
hey -- we do have all the needed information, kinda:
so we have
@yarikoptic Detecting whether a 500 error occurred isn't straightforward. We've configured PyGithub to retry all 500 errors (as well as some other 5xx errors) up to 12 times, and only if all attempts fail (after about 12 and a half minutes) is an exception raised. Unfortunately, the
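For context, a minimal sketch of the kind of retry configuration being described, assuming PyGithub's `Github()` is given a urllib3 `Retry` object via its `retry=` argument (the parameter values here are illustrative, not necessarily the exact ones tinuous uses):

```python
from github import Github
from urllib3.util.retry import Retry

# Retry 500 (and some other 5xx) responses up to 12 times with backoff;
# only after all attempts fail does an exception propagate to the caller.
retrier = Retry(
    total=12,
    backoff_factor=1.25,
    status_forcelist=[500, 502, 503, 504],
)
gh = Github("<token>", retry=retrier)
```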
We are talking about https://github.com/con/tinuous/blob/master/src/tinuous/github.py#L66 which uses
Just check for 500 or for something else? If we only check for 500s, I don't see how that'd be different from not retrying on 500. If we also check details of the workflow at that point, how would
only for 500. It will be different in that we will perform more checks for that 500 - e.g. that the date is still "of interest", etc.
We will have the request URL - wouldn't it carry the org/repo/workflow path as part of that endpoint which is 500ing now?
Assuming that
We also need the
I envisioned that increment to be overloaded as:

```python
def increment(self, *args, **kwargs):
    ret = super().increment(*args, **kwargs)
    if do_checks_for_the_url_say_it_is_500_to_skip:
        raise SkipThisWorkflow()
    return ret
```

so it would just perform as the usual Retry otherwise, and raise our ad-hoc exception if it runs into our use case
we know what workflow we are talking about, so we could perform a (possibly duplicate) request to get any information we want to get that
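Putting the last few comments together, a hedged sketch of how that could fit into a Retry subclass; the exception name, the subclass, and the "empty path means broken" heuristic are all hypothetical, only `Retry.increment()` itself is real urllib3 API:

```python
import requests
from urllib3.util.retry import Retry


class SkipThisWorkflow(Exception):
    """Ad-hoc signal: stop retrying and skip this workflow's runs."""


def workflow_looks_broken(runs_url: str) -> bool:
    # The "possibly duplicate" request: re-fetch the workflow record itself
    # (the runs URL minus the trailing "/runs") and treat an empty "path"
    # as the marker of a broken listing entry.
    base = "https://api.github.com" if runs_url.startswith("/") else ""
    r = requests.get(base + runs_url[: -len("/runs")])
    return r.ok and not r.json().get("path")


class WorkflowAwareRetry(Retry):
    def increment(self, method=None, url=None, response=None, error=None, **kwargs):
        ret = super().increment(
            method=method, url=url, response=response, error=error, **kwargs
        )
        if (
            response is not None
            and response.status == 500
            and url is not None
            and "/actions/workflows/" in url
            and url.endswith("/runs")
            and workflow_looks_broken(url)
        ):
            raise SkipThisWorkflow(url)
        return ret
```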
The
so it is in our direct power to access it, although possibly breaking the beauty of the design, right? The ugly workaround to achieve that would be to populate some global var or env var accessible to the entire process/all async jobs.
Let me describe the rationale: I am just trying to get all the logs flowing again. I do hope that the github team will fix the underlying issue, but as we have no estimate on when/if that would happen, I do not want us to hold our breath.
@yarikoptic I think I have a way to mostly-robustly identify if a workflow run request failed due to a 500, but it still involves waiting to retry all 12 times for 12.5 minutes. Would that be acceptable?
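For illustration, one way such detection might look, assuming the exhausted retries surface as a `requests.exceptions.RetryError` wrapping urllib3's `MaxRetryError` (a sketch of the general mechanism, not necessarily how tinuous implements it):

```python
import requests
from urllib3.exceptions import MaxRetryError


def is_exhausted_500(exc: Exception) -> bool:
    # After all retries fail on bad statuses, requests raises RetryError whose
    # first argument is urllib3's MaxRetryError; its .reason describes the last
    # failure, e.g. "too many 500 error responses".
    if isinstance(exc, requests.exceptions.RetryError) and exc.args:
        inner = exc.args[0]
        if isinstance(inner, MaxRetryError):
            return "500" in str(inner.reason)
    return False
```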
I think so, unless it would bring us into some other rate limiting or similar problem.
Warn on & skip workflow runs for certain "broken" GitHub workflows
@yarikoptic GitHub support has replied to my ticket, saying they've fixed the underlying issue. The script I posted above no longer shows any problematic workflows for dandi/dandi-cli, datalad/datalad, or datalad/datalad-extensions.
cool -- great to see things fixed up! Shouldn't we then revert the skipping added in https://github.com/con/tinuous/pull/151/files#diff-ae556ebe88a885b67c7348cdd53576ce30426c59a5d439d58919251d8865fbb0R116 ?
@yarikoptic I don't see a need to remove it. |
Then, if github gets broken again, wouldn't we -- as the only force in the universe capable of detecting such cases -- miss that and not report it to github to get it fixed again?
It is ok to doubt, but there is no empirical support for that, and we did have to report an issue (so there is some indirect evidence that the issue might have been unknown)
con/tinuous is typically run by a cron job with
true, but we would trigger GitHub to fix it, hopefully faster this time. Without this churn we might just keep accumulating missing logs until someone mentions the warning or does some archaeological expedition. In the ideal case -- this error does not re-emerge and we just have assurance that we are not missing any logs. So I would prefer to revert the
happens consistently on datalad, seemingly always on the same PR
and started to fail on dandi-cli
since it is so consistent on the same PR -- maybe it is indeed some github issue??? Should be investigated and possibly filed with github. I do not think I saw a similar problem from any other cron job -- only from datalad and dandi-cli.