Asynchronous CSV reading and writing.
Install with pip install aiocsv. Python 3.8+ is required.
This module contains an extension written in C. Pre-built binaries may not be available for your configuration. You might need a C compiler and Python headers to install aiocsv.
AsyncReader & AsyncDictReader accept any object that has a read(size: int) coroutine, which should return a string.
AsyncWriter & AsyncDictWriter accept any object that has a write(b: str) coroutine.
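As a rough illustration of that requirement, the sketch below wraps an in-memory string in an object exposing read and write coroutines. AsyncStringIO is a hypothetical helper written for this example; it is not part of aiocsv or aiofiles. The demo function assumes aiocsv is installed and imports it lazily, so the helper class itself only needs the standard library.

```python
import asyncio
import io


class AsyncStringIO:
    """Hypothetical in-memory async file; not part of aiocsv or aiofiles."""

    def __init__(self, initial: str = "") -> None:
        # newline="" keeps line endings untouched, as the csv module expects.
        self._buffer = io.StringIO(initial, newline="")

    async def read(self, size: int = -1) -> str:
        # Returns up to `size` characters; an empty string signals end of input.
        return self._buffer.read(size)

    async def write(self, b: str) -> int:
        return self._buffer.write(b)

    def getvalue(self) -> str:
        return self._buffer.getvalue()


async def demo() -> None:
    # Imported here so the class above stays usable without aiocsv installed.
    from aiocsv import AsyncReader, AsyncWriter

    out = AsyncStringIO()
    await AsyncWriter(out).writerow(["name", "age"])

    async for row in AsyncReader(AsyncStringIO(out.getvalue())):
        print(row)
```

With aiocsv installed, asyncio.run(demo()) should write one row to the in-memory buffer and read it back.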
Reading is implemented using a custom CSV parser, which should behave exactly like the CPython parser.
Writing is implemented using the synchronous csv.writer and csv.DictWriter objects - the serializers write data to a StringIO, and that buffer is then rewritten to the underlying asynchronous file.
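The buffering approach described above can be sketched with the standard library alone. This is an illustration of the idea, not aiocsv's actual source: SketchAsyncWriter and ListSink are hypothetical names made up for this example.

```python
import asyncio
import csv
import io
from typing import Any, Iterable, List


class SketchAsyncWriter:
    """Illustrative only: mirrors the buffering idea, not aiocsv's source."""

    def __init__(self, asyncfile: Any, **csv_kwargs: Any) -> None:
        self._file = asyncfile
        self._buffer = io.StringIO(newline="")
        self._writer = csv.writer(self._buffer, **csv_kwargs)

    async def writerow(self, row: Iterable[Any]) -> None:
        # Serialize synchronously into the in-memory buffer.
        self._writer.writerow(row)
        data = self._buffer.getvalue()
        # Empty the buffer so the next row starts from scratch.
        self._buffer.seek(0)
        self._buffer.truncate(0)
        # Hand the serialized text to the asynchronous file.
        await self._file.write(data)


class ListSink:
    """Hypothetical async file that records every chunk written to it."""

    def __init__(self) -> None:
        self.chunks: List[str] = []

    async def write(self, b: str) -> None:
        self.chunks.append(b)


async def main() -> List[str]:
    sink = ListSink()
    writer = SketchAsyncWriter(sink, dialect="unix")
    await writer.writerow(["name", "age"])
    await writer.writerow(["John", 26])
    return sink.chunks
```

Because serialization is delegated to csv.writer, quoting and line terminators follow the chosen dialect without any custom formatting code.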
Example usage with aiofiles.
import asyncio
import csv

import aiofiles
from aiocsv import AsyncReader, AsyncDictReader, AsyncWriter, AsyncDictWriter

async def main():
    # simple reading
    async with aiofiles.open("some_file.csv", mode="r", encoding="utf-8", newline="") as afp:
        async for row in AsyncReader(afp):
            print(row)  # row is a list

    # dict reading, tab-separated
    async with aiofiles.open("some_other_file.tsv", mode="r", encoding="utf-8", newline="") as afp:
        async for row in AsyncDictReader(afp, delimiter="\t"):
            print(row)  # row is a dict

    # simple writing, "unix"-dialect
    async with aiofiles.open("new_file.csv", mode="w", encoding="utf-8", newline="") as afp:
        writer = AsyncWriter(afp, dialect="unix")
        await writer.writerow(["name", "age"])
        await writer.writerows([
            ["John", 26], ["Sasha", 42], ["Hana", 37]
        ])

    # dict writing, all quoted, "NULL" for missing fields
    async with aiofiles.open("new_file2.csv", mode="w", encoding="utf-8", newline="") as afp:
        writer = AsyncDictWriter(afp, ["name", "age"], restval="NULL", quoting=csv.QUOTE_ALL)
        await writer.writeheader()
        await writer.writerow({"name": "John", "age": 26})
        await writer.writerows([
            {"name": "Sasha", "age": 42},
            {"name": "Hana"}
        ])

asyncio.run(main())
aiocsv strives to be a drop-in replacement for Python's builtin csv module. However, there are 3 notable differences:
- Readers accept objects with async read methods, instead of an AsyncIterable over lines from a file.
- AsyncDictReader.fieldnames can be None - use await AsyncDictReader.get_fieldnames() instead.
- Changes to csv.field_size_limit are not picked up by existing Reader instances. The field size limit is cached on Reader instantiation to avoid expensive function calls on each character of the input.
Other, minor, differences include:
- AsyncReader.line_num, AsyncDictReader.line_num and AsyncDictReader.dialect are not settable,
- AsyncDictReader.reader is of AsyncReader type,
- AsyncDictWriter.writer is of AsyncWriter type,
- AsyncDictWriter provides an extra, read-only dialect property.
AsyncReader(
asyncfile: aiocsv.protocols.WithAsyncRead,
dialect: str | csv.Dialect | Type[csv.Dialect] = "excel",
**csv_dialect_kwargs: Unpack[aiocsv.protocols.CsvDialectKwargs],
)
An object that iterates over records in the given asynchronous CSV file. Additional keyword arguments are understood as dialect parameters.
Iterating over this object returns parsed CSV rows (List[str]).
Methods:
- __aiter__(self) -> self
- async __anext__(self) -> List[str]
Read-only properties:
- dialect: The csv.Dialect used when parsing
- line_num: The number of lines read from the source file. This coincides with a 1-based index of the line number of the last line of the recently parsed record.
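To make the line_num definition concrete, here is a small sketch using the synchronous csv.reader. It relies on the statement above that aiocsv's parser should behave like the CPython one, so AsyncReader.line_num is expected to report the same numbers, though that is not verified here.

```python
import csv
import io

# One header line, then a record whose quoted field spans two physical lines.
source = io.StringIO('id,comment\n1,"first line\nsecond line"\n', newline="")
reader = csv.reader(source)

seen = []
for row in reader:
    # line_num counts physical lines consumed so far, not records.
    seen.append((reader.line_num, row))

print(seen)
# [(1, ['id', 'comment']), (3, ['1', 'first line\nsecond line'])]
```

The second record ends on physical line 3, so line_num jumps from 1 to 3 even though only two records were parsed.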
AsyncDictReader(
asyncfile: aiocsv.protocols.WithAsyncRead,
fieldnames: Optional[Sequence[str]] = None,
restkey: Optional[str] = None,
restval: Optional[str] = None,
dialect: str | csv.Dialect | Type[csv.Dialect] = "excel",
**csv_dialect_kwargs: Unpack[aiocsv.protocols.CsvDialectKwargs],
)
An object that iterates over records in the given asynchronous CSV file. All arguments work exactly the same way as in csv.DictReader.
Iterating over this object returns parsed CSV rows (Dict[str, str]).
Methods:
- __aiter__(self) -> self
- async __anext__(self) -> Dict[str, str]
- async get_fieldnames(self) -> List[str]
Properties:
- fieldnames: field names used when converting rows to dictionaries
  ⚠️ Unlike csv.DictReader, this property can't read the fieldnames if they are missing - it's not possible to await on the header row in a property getter. Use await reader.get_fieldnames().
      reader = csv.DictReader(some_file)
      reader.fieldnames  # ["cells", "from", "the", "header"]

      areader = aiocsv.AsyncDictReader(same_file_but_async)
      areader.fieldnames  # ⚠️ None
      await areader.get_fieldnames()  # ["cells", "from", "the", "header"]
- restkey: If a row has more cells than the header, all remaining cells are stored under this key in the returned dictionary. Defaults to None.
- restval: If a row has fewer cells than the header, then missing keys will use this value. Defaults to None.
- reader: Underlying aiocsv.AsyncReader instance
Read-only properties:
- dialect: Link to self.reader.dialect - the current csv.Dialect
- line_num: The number of lines read from the source file. This coincides with a 1-based index of the line number of the last line of the recently parsed record.
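Since the arguments are described as working the same way as in csv.DictReader, the effect of restkey and restval can be sketched with the synchronous class. The key name "overflow" and the placeholder "N/A" are arbitrary values chosen for this example.

```python
import csv
import io

# The second line has two extra cells; the third line is missing "age".
data = "name,age\nJohn,26,extra1,extra2\nHana\n"

reader = csv.DictReader(
    io.StringIO(data, newline=""),
    restkey="overflow",  # surplus cells are collected in a list under this key
    restval="N/A",       # missing cells are filled with this value
)
rows = list(reader)

print(rows[0])  # {'name': 'John', 'age': '26', 'overflow': ['extra1', 'extra2']}
print(rows[1])  # {'name': 'Hana', 'age': 'N/A'}
```

Note that the value stored under restkey is a list, so rows are not strictly string-to-string mappings when surplus cells are present.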
AsyncWriter(
asyncfile: aiocsv.protocols.WithAsyncWrite,
dialect: str | csv.Dialect | Type[csv.Dialect] = "excel",
**csv_dialect_kwargs: Unpack[aiocsv.protocols.CsvDialectKwargs],
)
An object that writes csv rows to the given asynchronous file. In this object "row" is a sequence of values.
Additional keyword arguments are passed to the underlying csv.writer instance.
Methods:
- async writerow(self, row: Iterable[Any]) -> None: Writes one row to the specified file.
- async writerows(self, rows: Iterable[Iterable[Any]]) -> None: Writes multiple rows to the specified file.
Read-only properties:
- dialect: Link to the underlying csv.writer's dialect attribute
AsyncDictWriter(
asyncfile: aiocsv.protocols.WithAsyncWrite,
fieldnames: Sequence[str],
restval: Any = "",
extrasaction: Literal["raise", "ignore"] = "raise",
dialect: str | csv.Dialect | Type[csv.Dialect] = "excel",
**csv_dialect_kwargs: Unpack[aiocsv.protocols.CsvDialectKwargs],
)
An object that writes csv rows to the given asynchronous file. In this object "row" is a mapping from fieldnames to values.
Additional keyword arguments are passed to the underlying csv.DictWriter instance.
Methods:
- async writeheader(self) -> None: Writes header row to the specified file.
- async writerow(self, row: Mapping[str, Any]) -> None: Writes one row to the specified file.
- async writerows(self, rows: Iterable[Mapping[str, Any]]) -> None: Writes multiple rows to the specified file.
Properties:
- fieldnames: Sequence of keys to identify the order of values when writing rows to the underlying file
- restval: Placeholder value used when a key from fieldnames is missing in a row, defaults to ""
- extrasaction: Action to take when a row contains keys that are not present in fieldnames. Defaults to "raise", which causes ValueError to be raised on extra keys; may also be set to "ignore" to ignore any extra keys
- writer: Link to the underlying AsyncWriter
Read-only properties:
- dialect: Link to the underlying csv.writer's dialect attribute
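The restval and extrasaction semantics can be sketched with the synchronous csv.DictWriter, which the text above says does the actual serialization. The field names and values below are invented for this example.

```python
import csv
import io

buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, ["name", "age"], restval="NULL", extrasaction="ignore")
writer.writeheader()
writer.writerow({"name": "Hana"})                      # "age" is missing, so restval is used
writer.writerow({"name": "John", "age": 26, "id": 7})  # "id" is silently dropped
output = buffer.getvalue()

# With the default extrasaction="raise", an unknown key is an error.
strict = csv.DictWriter(io.StringIO(newline=""), ["name", "age"])
try:
    strict.writerow({"name": "John", "id": 7})
    raised = False
except ValueError:
    raised = True

print(repr(output))  # 'name,age\r\nHana,NULL\r\nJohn,26\r\n'
print(raised)        # True
```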
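aiocsv.protocols.WithAsyncRead is a typing.Protocol describing an asynchronous file that can be read.
aiocsv.protocols.WithAsyncWrite is a typing.Protocol describing an asynchronous file that can be written to.
Type of the dialect argument (str | csv.Dialect | Type[csv.Dialect] in the signatures above), as used in the csv module.
aiocsv.protocols.CsvDialectKwargs describes the keyword arguments used by the csv module to override the dialect settings during reader/writer instantiation.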
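The three accepted forms of the dialect argument, and a keyword override, can be sketched with the synchronous csv.reader. The aiocsv classes are documented above as taking the same argument types, but this example only exercises the standard library, and the parse helper is a hypothetical convenience function.

```python
import csv
import io

text = "a\tb\nc\td\n"


def parse(**kwargs):
    # Hypothetical helper: parse the same text with different dialect settings.
    return list(csv.reader(io.StringIO(text, newline=""), **kwargs))


by_name = parse(dialect="excel-tab")                  # registered dialect name
by_class = parse(dialect=csv.excel_tab)               # csv.Dialect subclass
by_instance = parse(dialect=csv.excel_tab())          # csv.Dialect instance
by_override = parse(dialect="excel", delimiter="\t")  # keyword overrides the dialect

print(by_name)  # [['a', 'b'], ['c', 'd']]
```

All four calls produce the same rows, since each one ends up with a tab delimiter.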
Contributions are welcome, however please open an issue beforehand. aiocsv is meant as a replacement for the built-in csv, so any features not present in the latter will be rejected.
To create a wheel (and a source tarball), run python -m build.
For local development, use a virtual environment. pip install --editable . will build the C extension and make it available for the current venv. This is required for running the tests. However, due to the mess of Python packaging this will force an optimized build without debugging symbols. If you need to debug the C part of aiocsv and build the library with e.g. debugging symbols, the only sane way is to run python setup.py build --debug and manually copy the shared object/DLL from build/lib*/aiocsv to aiocsv.
This project uses pytest with pytest-asyncio for testing. Run pytest after installing the library in the manner explained above.
This library uses black and isort for formatting and pyright in strict mode for type checking.
For the C part of the library, please use clang-format for formatting and clang-tidy for linting; however, these are not yet integrated in the CI.
pip install -r requirements.dev.txt will pull all of the development tools mentioned above, however this might not be necessary depending on your setup. For example, if you use VS Code with the Python extension, pyright is already bundled and doesn't need to be installed again.
For VS Code, use the Python, Pylance (should be installed automatically alongside the Python extension), black and isort extensions.
You will need to install all dev dependencies from requirements.dev.txt, except for pyright.
Recommended .vscode/settings.json:
{
    "C_Cpp.codeAnalysis.clangTidy.enabled": true,
    "python.testing.pytestArgs": [
        "."
    ],
    "python.testing.unittestEnabled": false,
    "python.testing.pytestEnabled": true,
    "[python]": {
        "editor.formatOnSave": true,
        "editor.codeActionsOnSave": {
            "source.organizeImports": "always"
        }
    },
    "[c]": {
        "editor.formatOnSave": true
    }
}
For the C part of the library, the C/C++ extension is sufficient. Ensure that your system has Python headers installed. Usually a separate package like python3-dev needs to be installed; consult your system repositories on that. .vscode/c_cpp_properties.json needs to manually include Python headers under includePath. On my particular system this config file looks like this:
{
    "configurations": [
        {
            "name": "Linux",
            "includePath": [
                "${workspaceFolder}/**",
                "/usr/include/python3.11"
            ],
            "defines": [],
            "compilerPath": "/usr/bin/clang",
            "cStandard": "c17",
            "cppStandard": "c++17",
            "intelliSenseMode": "linux-clang-x64"
        }
    ],
    "version": 4
}