Context
While running the distributed/FSDP/T5_training.py example, I encountered an error when loading the wikihow dataset. I would like to know if this is a bug or if there is a way to resolve it.
Your Environment
PyTorch version: 2.7.0a0+git49bdc41
Operating System and version: Ubuntu 20.04
Installed using source? [yes/no]: yes
Are you planning to deploy it using docker container? [yes/no]: no
Is it a CPU or GPU environment?: GPU (CUDA)
Which example are you using: distributed/FSDP/T5_training.py
Current Behavior
The script fails with a ConnectionError, indicating that the dataset could not be downloaded from the specified URL.
Possible Solution
The issue might be related to the URL used in the dataset script: https://raw.githubusercontent.com/mahnazkoupaee/WikiHow-Dataset/master/all_train.txt.
If the file is unavailable, an updated URL or alternative data source could resolve the issue.
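One way to check whether the file is actually unavailable is to request the URL directly, outside the dataset loader. The sketch below uses only the Python standard library. The helper name `url_is_reachable` and its `opener` parameter are illustrative and are not part of the example script or of any library.

```python
import urllib.error
import urllib.request

# URL taken from the error message in this report.
URL = "https://raw.githubusercontent.com/mahnazkoupaee/WikiHow-Dataset/master/all_train.txt"


def url_is_reachable(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if a HEAD request to `url` succeeds, False otherwise.

    `opener` is injectable (hypothetical parameter) so the check can be
    exercised without network access.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

Calling `url_is_reachable(URL)` from the affected machine would help separate a missing file (the check fails everywhere) from a local network or proxy restriction (the check fails only on that machine).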
Steps to Reproduce
Download the dataset and install the requirements:
sh download_dataset.sh
pip install -r requirements.txt
    dl_path = dl_manager.download_and_extract(_URLS)
  File "/home/appuser/.local/lib/python3.10/site-packages/nlp/utils/download_manager.py", line 220, in download_and_extract
    return self.extract(self.download(url_or_urls))
  File "/home/appuser/.local/lib/python3.10/site-packages/nlp/utils/download_manager.py", line 155, in download
    downloaded_path_or_paths = map_nested(
  File "/home/appuser/.local/lib/python3.10/site-packages/nlp/utils/py_utils.py", line 163, in map_nested
    return {
  File "/home/appuser/.local/lib/python3.10/site-packages/nlp/utils/py_utils.py", line 164, in <dictcomp>
    k: map_nested(
  File "/home/appuser/.local/lib/python3.10/site-packages/nlp/utils/py_utils.py", line 191, in map_nested
    return function(data_struct)
  File "/home/appuser/.local/lib/python3.10/site-packages/nlp/utils/download_manager.py", line 156, in <lambda>
    lambda url: cached_path(url, download_config=self._download_config,), url_or_urls,
  File "/home/appuser/.local/lib/python3.10/site-packages/nlp/utils/file_utils.py", line 191, in cached_path
    output_path = get_from_cache(
  File "/home/appuser/.local/lib/python3.10/site-packages/nlp/utils/file_utils.py", line 356, in get_from_cache
    raise ConnectionError("Couldn't reach {}".format(url))
ConnectionError: Couldn't reach https://raw.githubusercontent.com/mahnazkoupaee/WikiHow-Dataset/master/all_train.txt
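The traceback shows the failure occurring inside the dataset loader's download step, before any training code runs. If download_dataset.sh has already left a local CSV copy of the data on disk, one possible workaround is to read that file directly instead of going through the remote split file. The sketch below is only an illustration under several assumptions that this report does not confirm: that such a CSV exists, that it has `text` and `headline` columns, and that the pairs are what the training script needs. The helper name `load_wikihow_pairs` is hypothetical.

```python
import csv


def load_wikihow_pairs(csv_path, text_col="text", summary_col="headline"):
    """Read (article, summary) pairs from a local CSV, skipping incomplete rows.

    The column names are assumptions; check them against the actual file header.
    """
    # Article bodies can be long, so raise the csv module's per-field size limit.
    csv.field_size_limit(2**31 - 1)
    pairs = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            text = (row.get(text_col) or "").strip()
            summary = (row.get(summary_col) or "").strip()
            if text and summary:
                pairs.append({"text": text, "headline": summary})
    return pairs
```

The returned list could then be split into training and validation subsets locally. Whether that matches the splits the original dataset script would have produced is not something this sketch guarantees.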
Expected Behavior
The wikihow dataset should load successfully when the example runs.
Failure Logs [if any]
full log is here.
Could you provide guidance on how to resolve this issue? Alternatively, if this is a bug, are there any workarounds or fixes available?
Thank you for your help in advance!