.. currentmodule:: torchdata.datapipes.iter

An iterable-style dataset is an instance of a subclass of ``IterableDataset`` that implements the ``__iter__()``
protocol and represents an iterable over data samples. This type of dataset is particularly suitable for cases where
random reads are expensive or even improbable, and where the batch size depends on the fetched data.

For example, such a dataset, when called with ``iter(iterdatapipe)``, could return a stream of data read from a
database, a remote server, or even logs generated in real time.

This is an updated version of ``IterableDataset`` in ``torch``.

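The iterable-style pattern described above can be sketched in plain Python. The class below is a hypothetical illustration that does not depend on ``torch`` or ``torchdata``; the name ``LogStream`` and its sample data are assumptions made for this example. It shows only the ``__iter__()`` protocol: samples are produced sequentially, with no random access by index.

```python
from typing import Iterator, List


class LogStream:
    """Hypothetical iterable-style dataset: yields samples one at a time."""

    def __init__(self, lines: List[str]) -> None:
        # Stand-in for a database cursor, a remote server, or a live log source.
        self._lines = lines

    def __iter__(self) -> Iterator[str]:
        # Each call to iter() starts a fresh sequential pass over the stream.
        for line in self._lines:
            yield line.strip()


stream = LogStream(["start \n", "step 1\n", "done\n"])
samples = list(iter(stream))
```

In real code, such a class would subclass ``IterDataPipe`` so that it can be chained with the DataPipes listed on this page.
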

.. autoclass:: IterDataPipe

We have different types of Iterable DataPipes:

- Archive - open and decompress archive files of different formats.
- Augmenting - augment your samples (e.g. adding an index or cycling through them indefinitely).
- Combinatorial - perform combinatorial operations (e.g. sampling, shuffling).
- Combining/Splitting - interact with multiple DataPipes by combining them or splitting one into many.
- Grouping - group samples within a DataPipe.
- IO - interact with file systems or remote servers (e.g. downloading, opening, saving files, and listing the files in directories).
- Mapping - apply a given function to each element in the DataPipe.
- Others - perform a miscellaneous set of operations.
- Selecting - select specific samples within a DataPipe.
- Text - parse, read, and transform text files and data.

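To illustrate how operations from several of these categories compose, here is a plain-Python sketch built from generators. The helper names ``map_samples``, ``select_samples`` and ``batch_samples`` are hypothetical stand-ins for the Mapping, Selecting and Grouping categories. They are not the torchdata classes and only mimic the general idea of lazily chaining steps.

```python
from typing import Callable, Iterable, Iterator, List


def map_samples(source: Iterable[int], fn: Callable[[int], int]) -> Iterator[int]:
    # Mapping: apply fn to every element as it streams through.
    for sample in source:
        yield fn(sample)


def select_samples(source: Iterable[int], keep: Callable[[int], bool]) -> Iterator[int]:
    # Selecting: pass through only the elements for which keep() is true.
    for sample in source:
        if keep(sample):
            yield sample


def batch_samples(source: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    # Grouping: collect consecutive elements into lists of batch_size.
    batch: List[int] = []
    for sample in source:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # keep the final, possibly shorter, batch


doubled = map_samples(range(8), lambda x: x * 2)
multiples_of_three = select_samples(doubled, lambda x: x % 3 == 0)
batches = list(batch_samples(multiples_of_three, batch_size=2))
```

Nothing is computed until ``list()`` consumes the last stage, which is the property that makes this style suitable for streamed data.
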

These DataPipes help open and decompress archive files of different formats.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: class_template.rst

    Bz2FileLoader
    Decompressor
    RarArchiveLoader
    TarArchiveLoader
    TFRecordLoader
    WebDataset
    XzFileLoader
    ZipArchiveLoader


These DataPipes help to augment your samples.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: class_template.rst

    Cycler
    Enumerator
    IndexAdder
    Repeater


These DataPipes help to perform combinatorial operations.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: class_template.rst

    InBatchShuffler
    Sampler
    Shuffler


These DataPipes tend to involve multiple DataPipes, combining them or splitting one into many.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: class_template.rst

    Concater
    Demultiplexer
    Forker
    IterKeyZipper
    MapKeyZipper
    Multiplexer
    MultiplexerLongest
    RoundRobinDemultiplexer
    SampleMultiplexer
    UnZipper
    Zipper
    ZipperLongest

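The ideas of splitting one stream into many and combining several streams into one can be sketched in plain Python. The helper ``split_by`` is a hypothetical name introduced for this example, and the standard-library ``zip`` and ``itertools.chain`` merely stand in for element-wise and end-to-end combination. This is not how the classes above are implemented; in particular, this sketch collects each output eagerly into a list.

```python
from itertools import chain
from typing import Callable, Iterable, List


def split_by(source: Iterable[int], num_outputs: int,
             classifier: Callable[[int], int]) -> List[List[int]]:
    # Splitting: route each element to the output chosen by classifier().
    outputs: List[List[int]] = [[] for _ in range(num_outputs)]
    for sample in source:
        outputs[classifier(sample)].append(sample)
    return outputs


evens, odds = split_by(range(6), 2, lambda x: x % 2)

# Combining element-wise: pair the i-th element of each stream.
zipped = list(zip(evens, odds))

# Combining end-to-end: yield all of the first stream, then all of the second.
concatenated = list(chain(evens, odds))
```
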

These DataPipes help you group samples within a DataPipe.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: class_template.rst

    Batcher
    BucketBatcher
    Collator
    Grouper
    MaxTokenBucketizer
    UnBatcher


These DataPipes help you interact with file systems or remote servers (e.g. downloading, opening, saving files, and listing the files in directories).

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: class_template.rst

    AISFileLister
    AISFileLoader
    FSSpecFileLister
    FSSpecFileOpener
    FSSpecSaver
    FileLister
    FileOpener
    GDriveReader
    HttpReader
    HuggingFaceHubReader
    IoPathFileLister
    IoPathFileOpener
    IoPathSaver
    OnlineReader
    ParquetDataFrameLoader
    S3FileLister
    S3FileLoader
    Saver


These DataPipes apply a given function to each element in the DataPipe.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: class_template.rst

    BatchAsyncMapper
    BatchMapper
    FlatMapper
    Mapper
    ThreadPoolMapper


A miscellaneous set of DataPipes with different functionalities.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: class_template.rst

    DataFrameMaker
    EndOnDiskCacheHolder
    FullSync
    HashChecker
    InMemoryCacheHolder
    IterableWrapper
    LengthSetter
    MapToIterConverter
    OnDiskCacheHolder
    PinMemory
    Prefetcher
    RandomSplitter
    ShardExpander
    ShardingFilter
    ShardingRoundRobinDispatcher


These DataPipes help you select specific samples within a DataPipe.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: class_template.rst

    Filter
    Header
    Dropper
    Slicer
    Flattener


These DataPipes help you parse, read, and transform text files and data.

.. autosummary::
    :nosignatures:
    :toctree: generated/
    :template: class_template.rst

    CSVDictParser
    CSVParser
    JsonParser
    LineReader
    ParagraphAggregator
    RoutedDecoder
    Rows2Columnar
    StreamReader