Rename functions to use 'dict' terminology rather than 'json'
pfmoore committed Jul 13, 2022
1 parent 07af77e commit 47f4a83
Showing 7 changed files with 65 additions and 62 deletions.
65 changes: 34 additions & 31 deletions README.md
@@ -11,44 +11,47 @@ distribution) the metadata is saved in a format which is based on email
 headers.

 This library transforms that metadata to and from a JSON-compatible
-form, as defined in [PEP 566](https://peps.python.org/pep-0566/#json-compatible-metadata).
-The JSON form is easier to use in a programming context. Two functions
-are provided:
+dictionary form, as defined in [PEP 566](https://peps.python.org/pep-0566/#json-compatible-metadata).
+The dictionary form is easier to use in a programming context. Three
+functions are provided:

-* `msg_to_json(msg)` - convert the email header format to JSON.
-  The `msg` argument is the metadata in email format, as an
-  `email.message.Message` object. Returns a dictionary following
-  the layout in the "json" form.
-* `json_to_msg(json)` - convert the JSON form back to email headers.
-  The `json` argument is a dictionary following the "json" form.
-  Returns a (Unicode) string with the message form.
+* `bytes_to_dict(bytes)` - convert a byte string containing metadata in the
+  standard email format to dictionary format. Returns the metadata in
+  dictionary form.
+* `msg_to_dict(msg)` - convert an `emai.message.Message` object containing
+  metadata to dictionary format. Returns the metadata in dictionary form.
+* `dict_to_bytes(dict)` - convert the dictionary form back to email headers.
+  Returns a byte string with the message form.

-Note the discrepancy between the two `msg` forms: a `Message` object
-and a string. This is something that may change, as it's a bit of
-an awkward discrepancy, but there are reasons for this approach:
+Note that the email header format specifies that metadata must be encoded
+in UTF-8. The `dict_to_bytes` function enforces this by returning a UTF-8
+encoded byte string. The `bytes_to_dict` function, on the other hand, will
+attempt to handle input that is not encoded in UTF-8, as older metadata
+writers did not enforce UTF-8. The encoding detection is relatively primitive,
+and attempting to do anything with non-UTF-8 fields other than write them
+back out unmodified is likely to result in mojibake.

-1. When reading metadata, the file is supposed to be in the UTF-8
-   encoding, but historically this has not always been the case.
-   By using a `Message` as the input, this can be constructed from
-   either text or bytes (`message_from_string` or `message_from_bytes`)
-   which allows the email package to handle encoding issues. If a
-   project uses non-UTF8 metadata, it's likely that this approach will
-   result in mojibake, but at least the data will be usable.
-2. When writing metadata, using a `Message` object results in unwanted
-   header fields, because the object assumes this is a "real" email,
-   and not just data re-using that format. So it is more reliable
-   to simply return the output in string format. It can then be written
-   to a file (in UTF-8) as required.
+Also, while there is a `msg_to_dict` function, there is no corresponding
+`dict_to_msg` function. This is because the `email.message.Message` class
+does not serialise to bytes in a form that conforms to the metadata spec,
+so it is not useful to have metadata converted back to that form.

 An example of using the library:

 ```python
-with open(metadata_file, "r", encoding="utf-8") as f:
+metadata = pkg_metadata.bytes_to_dict(metadata_file.read_bytes())
+metadata["keywords"] = ["example", "artificial"]
+metadata_file.write_bytes(dict_to_bytes(metadata))
+```
+
+or, using an intermediate `Message` object:
+
+```python
+with metadata_file.open(encoding="utf-8") as f:
     msg = email.message_from_file(f)
-metadata = pkg_metadata.msg_to_json(msg)
+metadata = pkg_metadata.msg_to_dict(msg)
 metadata["keywords"] = ["example", "artificial"]
-with open(metadata_file, "w", encoding="utf-8") as f:
-    f.write(json_to_msg(metadata))
+metadata_file.write_bytes(dict_to_bytes(metadata))
 ```

 In addition to the metadata file format, project metadata can
@@ -58,7 +61,7 @@ This library provides a function to read the `[project]` section
 of `pyproject.toml` and convert it into a ("JSON format") metadata
 dictionary.

-* `pyproject_to_json(pyproject)` - convert `pyproject.toml` metadata
+* `pyproject_to_dict(pyproject)` - convert `pyproject.toml` metadata
   into a metadata dictionary. The `pyproject` argument is a dictionary
   representing the data in the `[project]` section of `pyproject.toml`.

@@ -68,5 +71,5 @@ Example:
 with open("pyproject.toml", "rb") as f:
     pyproject_data = tomli.load(f)

-metadata = pkg_metadata.pyproject_to_json(pyproject_data["project"])
+metadata = pkg_metadata.pyproject_to_dict(pyproject_data["project"])
 ```
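The README above says that `pyproject_to_dict` turns the `[project]` table into a metadata dictionary. The sketch below shows that kind of mapping using only the standard library. It is not the library's implementation. The function name `sketch_pyproject_to_dict` is hypothetical, and the sketch makes these further assumptions:

* It handles only a few keys and assumes `version` is static rather than dynamic.
* It ignores `readme`.
* It combines environment markers with plain string handling, whereas the real module imports `packaging`.

```python
from typing import Any, Dict, List


def sketch_pyproject_to_dict(project: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch: map a few [project] keys to metadata-dict keys."""
    # Assumes "name" and "version" are given statically (not listed in "dynamic").
    result: Dict[str, Any] = {"name": project["name"], "version": project["version"]}

    # In pyproject.toml, "description" is the one-line summary; the long
    # description comes from "readme", which this sketch does not handle.
    if "description" in project:
        result["summary"] = project["description"]
    if "requires-python" in project:
        result["requires_python"] = project["requires-python"]
    if "keywords" in project:
        result["keywords"] = list(project["keywords"])

    requires: List[str] = list(project.get("dependencies", []))
    extras: List[str] = []
    for extra, reqs in project.get("optional-dependencies", {}).items():
        extras.append(extra)
        for req in reqs:
            if ";" in req:
                # Naive string handling: AND any existing marker with the extra.
                spec, marker = (part.strip() for part in req.split(";", 1))
                requires.append(f'{spec}; ({marker}) and extra == "{extra}"')
            else:
                requires.append(f'{req}; extra == "{extra}"')

    if requires:
        result["requires_dist"] = requires
    if extras:
        result["provides_extra"] = extras
    return result


example = sketch_pyproject_to_dict(
    {
        "name": "test",
        "version": "0.1",
        "description": "An example project",
        "dependencies": ["requests"],
        "optional-dependencies": {"test": ["pytest", "tomli; python_version < '3.11'"]},
    }
)
```

Each extra is recorded twice. Its name goes into `provides_extra`, and each of its requirements goes into `requires_dist` with an `extra == "..."` marker appended. A production version would parse and combine markers properly rather than splitting on `;`.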
2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -18,7 +18,7 @@ Basic Usage

 .. code-block:: python

-    metadata = bytes_to_json(Path("METADATA").read_bytes())
+    metadata = bytes_to_dict(Path("METADATA").read_bytes())
     print(metadata["name"], metadata["version"])

 Contents
4 changes: 2 additions & 2 deletions docs/source/pkg_metadata.rst
@@ -11,7 +11,7 @@ Usage

 .. doctest::

-    >>> from pkg_metadata import bytes_to_json
+    >>> from pkg_metadata import bytes_to_dict
     >>> metadata_bytes = b"""\
     ... Metadata-Version: 2.1
     ... Name: pkg_metadata
@@ -24,7 +24,7 @@ Usage
     ...
     ... Some description
     ... """
-    >>> metadata = bytes_to_json(metadata_bytes)
+    >>> metadata = bytes_to_dict(metadata_bytes)
     >>> metadata["name"]
     'pkg_metadata'
     >>> metadata["description"]
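The `bytes_to_dict` docstring and the README describe three steps:

* parse email-header bytes;
* tolerate input that is not UTF-8;
* normalise field names, as `json_name` in `metadata.py` does with `field.lower().replace("-", "_")`.

The sketch below shows the same idea using only the standard library. It is not the library's code. The names `sketch_bytes_to_dict` and `MULTIPLE_USE` are hypothetical, the multiple-use field set is a deliberately small subset, and the UTF-8-then-Latin-1 fallback is an assumed decoding strategy.

```python
from email import message_from_string
from typing import Any, Dict

# Illustrative subset of multiple-use fields (normalised names); an assumption,
# not the full list from the core metadata specification.
MULTIPLE_USE = {"platform", "classifier", "requires_dist", "project_url", "provides_extra"}


def json_name(field: str) -> str:
    # Same normalisation as json_name() in metadata.py: "Requires-Dist" -> "requires_dist"
    return field.lower().replace("-", "_")


def sketch_bytes_to_dict(raw: bytes) -> Dict[str, Any]:
    """Hypothetical sketch of converting email-header metadata bytes to a dict."""
    # Metadata should be UTF-8, but older writers did not enforce it; Latin-1
    # can decode any byte sequence, so it serves as a last-resort fallback.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode("latin-1")

    msg = message_from_string(text)
    result: Dict[str, Any] = {}
    for field, value in msg.items():
        key = json_name(field)
        if key == "keywords":
            # PEP 566: convert keywords to a list by splitting on whitespace
            result[key] = value.split()
        elif key in MULTIPLE_USE:
            # Repeated headers are collected into a list
            result.setdefault(key, []).append(value)
        else:
            result[key] = value

    # Metadata 2.1+ allows the description to be carried in the message body
    body = msg.get_payload()
    if body:
        result["description"] = body
    return result


latin1_meta = sketch_bytes_to_dict(
    "Name: test\nVersion: 0.1\nDescription: Un éxample, garçon!".encode("latin1")
)
```

Like the README's warning about primitive encoding detection, this fallback only guesses. Bytes that happen to be valid UTF-8 are always decoded as UTF-8, even if they were written in some other encoding.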
8 changes: 4 additions & 4 deletions manual_test.py
@@ -1,6 +1,6 @@
 import sqlite3
 import zlib
-from email import message_from_bytes, message_from_string
+from email import message_from_bytes

 import pkg_metadata

@@ -24,9 +24,9 @@ def match(d1, d2):
     meta = zlib.decompress(meta)
     msg = message_from_bytes(meta)
     try:
-        j = pkg_metadata.msg_to_json(msg)
-        m = pkg_metadata.json_to_msg(j)
-        j2 = pkg_metadata.msg_to_json(message_from_string(m))
+        j = pkg_metadata.msg_to_dict(msg)
+        m = pkg_metadata.dict_to_bytes(j)
+        j2 = pkg_metadata.msg_to_dict(message_from_bytes(m))
         if j.get("description", "xxx") == "" and "description" not in j2:
             j2["description"] = ""
         match(j, j2)
4 changes: 2 additions & 2 deletions src/pkg_metadata/__init__.py
@@ -3,6 +3,6 @@

 __version__ = "0.3"

-from .metadata import bytes_to_json, json_to_bytes, msg_to_json, pyproject_to_json
+from .metadata import bytes_to_dict, dict_to_bytes, msg_to_dict, pyproject_to_dict

-__all__ = ["bytes_to_json", "msg_to_json", "json_to_bytes", "pyproject_to_json"]
+__all__ = ["bytes_to_dict", "msg_to_dict", "dict_to_bytes", "pyproject_to_dict"]
18 changes: 9 additions & 9 deletions src/pkg_metadata/metadata.py
@@ -8,10 +8,10 @@
 from packaging.requirements import Requirement

 __all__ = [
-    "bytes_to_json",
-    "msg_to_json",
-    "json_to_bytes",
-    "pyproject_to_json",
+    "bytes_to_dict",
+    "msg_to_dict",
+    "dict_to_bytes",
+    "pyproject_to_dict",
 ]

 METADATA_FIELDS = [
@@ -48,7 +48,7 @@ def json_name(field: str) -> str:
     return field.lower().replace("-", "_")


-def bytes_to_json(meta: bytes) -> Dict[str, Any]:
+def bytes_to_dict(meta: bytes) -> Dict[str, Any]:
     """Convert header format into a JSON compatible dictionary.

     The input should be a byte string in the standard "email header" format.
@@ -58,10 +58,10 @@ def bytes_to_json(meta: bytes) -> Dict[str, Any]:
     other than UTF-8 or Latin1.
     """
     msg = message_from_bytes(meta)
-    return msg_to_json(msg)
+    return msg_to_dict(msg)


-def msg_to_json(msg: Message) -> Dict[str, Any]:
+def msg_to_dict(msg: Message) -> Dict[str, Any]:
     """Convert a Message object into a JSON-compatible dictionary."""

     def sanitise_header(h) -> str:
@@ -120,7 +120,7 @@ def rfc822_escape(header: str) -> str:
     return sep.join(lines)


-def json_to_bytes(metadata: Dict[str, Any]) -> str:
+def dict_to_bytes(metadata: Dict[str, Any]) -> bytes:
     """Convert a JSON-compatible dictionary to header format."""
     # Build the output by hand, as the email module adds
     # extra headers, relevant to email, which don't conform
@@ -151,7 +151,7 @@ def json_to_bytes(metadata: Dict[str, Any]) -> str:
     return msg.encode("UTF-8")


-def pyproject_to_json(pyproject: Dict[str, Any]) -> Dict[str, Any]:
+def pyproject_to_dict(pyproject: Dict[str, Any]) -> Dict[str, Any]:
     """Read metadata from the [project] section of pyproject.toml.

     The input should be a dictionary in the format specified for the [project]
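The comment in `metadata.py` explains that `dict_to_bytes` builds its output by hand, because the email module adds extra headers. After the rename it returns UTF-8 `bytes` rather than `str`. The sketch below follows that approach, under these assumptions:

* `sketch_dict_to_bytes` and `header_name` are hypothetical names.
* `header_name` simply title-cases each underscore-separated word. Real field names such as `Home-page` do not all follow this rule, so a complete version needs an explicit field table, as the real module keeps in `METADATA_FIELDS`.
* Every value other than `description` is a single line.
* `keywords` is written as one comma-separated header.
* `description` is written as the message body, which Metadata-Version 2.1 permits.

```python
from typing import Any, Dict, List


def header_name(key: str) -> str:
    # Hypothetical inverse of json_name(): "requires_dist" -> "Requires-Dist".
    # Some real field names (e.g. "Home-page") don't follow this rule.
    return "-".join(part.capitalize() for part in key.split("_"))


def sketch_dict_to_bytes(metadata: Dict[str, Any]) -> bytes:
    """Hypothetical sketch: write a metadata dict as UTF-8 email-style headers."""
    lines: List[str] = []
    description = None
    for key, value in metadata.items():
        if key == "description":
            description = value  # emitted as the message body below
            continue
        name = header_name(key)
        if key == "keywords":
            lines.append(f"{name}: {','.join(value)}")  # one comma-separated header
        elif isinstance(value, list):
            lines.extend(f"{name}: {item}" for item in value)  # one header per item
        else:
            lines.append(f"{name}: {value}")
    # Assembled as plain text, so no email-specific headers are added.
    text = "\n".join(lines) + "\n"
    if description is not None:
        text += "\n" + description  # blank line separates headers from body
    return text.encode("utf-8")  # metadata must be UTF-8


out = sketch_dict_to_bytes(
    {
        "metadata_version": "2.1",
        "name": "test",
        "version": "0.1",
        "classifier": ["A", "B"],
        "keywords": ["example", "artificial"],
        "description": "Un éxample",
    }
)
```

The sketch returns `bytes` and encodes them as UTF-8 itself. This mirrors the return-type change in this commit: the caller can write the result straight to a file with `write_bytes`, and no encoding choice is left to the caller.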
26 changes: 13 additions & 13 deletions tests/test_basic.py
@@ -2,7 +2,7 @@

 import pytest

-from pkg_metadata import bytes_to_json, json_to_bytes, msg_to_json, pyproject_to_json
+from pkg_metadata import bytes_to_dict, dict_to_bytes, msg_to_dict, pyproject_to_dict

 sample = {
     "metadata_version": "2.2",
@@ -15,15 +15,15 @@


 def test_roundtrip():
-    m = json_to_bytes(sample)
-    j = bytes_to_json(m)
+    m = dict_to_bytes(sample)
+    j = bytes_to_dict(m)
     assert j == sample


 def test_binary_utf8_msg():
     txt_msg = ["Name: test", "Version: 0.1", "Description: Un éxample, garçon!"]
     bin_msg = "\n".join(txt_msg).encode("utf-8")
-    j = bytes_to_json(bin_msg)
+    j = bytes_to_dict(bin_msg)
     assert j["name"] == "test"
     assert j["version"] == "0.1"
     assert j["description"] == "Un éxample, garçon!"
@@ -32,15 +32,15 @@ def test_binary_utf8_msg():
 def test_binary_latin1_msg():
     txt_msg = ["Name: test", "Version: 0.1", "Description: Un éxample, garçon!"]
     bin_msg = "\n".join(txt_msg).encode("latin1")
-    j = bytes_to_json(bin_msg)
+    j = bytes_to_dict(bin_msg)
     assert j["name"] == "test"
     assert j["version"] == "0.1"
     assert j["description"] == "Un éxample, garçon!"


 def test_keywords():
     msg = ["Name: test", "Version: 0.1", "Keywords: one two three"]
-    j = msg_to_json(message_from_string("\n".join(msg)))
+    j = msg_to_dict(message_from_string("\n".join(msg)))
     assert j["name"] == "test"
     assert j["version"] == "0.1"
     assert j["keywords"] == ["one", "two", "three"]
@@ -70,7 +70,7 @@ def test_pyproject():
             ],
         },
     }
-    j = pyproject_to_json(pyproject)
+    j = pyproject_to_dict(pyproject)
     assert j["name"] == "test"
     assert j["version"] == "0.1"
     assert j["description"] == "Example readme"
@@ -101,7 +101,7 @@ def test_pyproject_optional_deps_only():
             ],
         },
     }
-    j = pyproject_to_json(pyproject)
+    j = pyproject_to_dict(pyproject)
     assert j["name"] == "test"
     assert j["version"] == "0.1"
     assert j["provides_extra"] == ["test"]
@@ -120,7 +120,7 @@ def test_pyproject_optional_deps_only():
 )
 def test_pyproject_readme_file(name, content_type, tmp_path):
     (tmp_path / name).write_text("Example")
-    j = pyproject_to_json(
+    j = pyproject_to_dict(
         {"name": "foo", "version": "1.0", "readme": str(tmp_path / name)}
     )
     assert j["description"] == "Example"
@@ -137,7 +137,7 @@ def test_pyproject_readme_file(name, content_type, tmp_path):
 )
 def test_pyproject_readme_explicit_file(name, content_type, tmp_path):
     (tmp_path / name).write_text("Example")
-    j = pyproject_to_json(
+    j = pyproject_to_dict(
         {
             "name": "foo",
             "version": "1.0",
@@ -151,7 +151,7 @@ def test_pyproject_readme_explicit_file(name, content_type, tmp_path):
 @pytest.mark.parametrize("encoding", ["utf-8", "latin1"])
 def test_pyproject_readme_encoding(encoding, tmp_path):
     (tmp_path / "README").write_text("Éxample", encoding=encoding)
-    j = pyproject_to_json(
+    j = pyproject_to_dict(
         {
             "name": "foo",
             "version": "1.0",
@@ -168,7 +168,7 @@ def test_pyproject_readme_encoding(encoding, tmp_path):
 def test_pytest_readme_no_type(tmp_path):
     (tmp_path / "README.md").write_text("Example")
     with pytest.raises(ValueError) as exc:
-        pyproject_to_json(
+        pyproject_to_dict(
             {
                 "name": "foo",
                 "version": "1.0",
@@ -181,7 +181,7 @@ def test_pytest_readme_no_type(tmp_path):
 def test_pytest_readme_text_and_file(tmp_path):
     (tmp_path / "README.md").write_text("Example")
     with pytest.raises(ValueError) as exc:
-        pyproject_to_json(
+        pyproject_to_dict(
             {
                 "name": "foo",
                 "version": "1.0",
