diff --git a/CHANGELOG.md b/CHANGELOG.md
index b140c5a..c1c74a5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,7 +1,9 @@
 # Changelog
 
 ## v0.4.0 - unreleased
-
+- Added documentation files for readthedocs ([#134](https://github.com/mpytools/filefisher/pull/134))
+  and extended the documentation with usage ([#135](https://github.com/mpytools/filefisher/pull/135))
+  and installation instructions ([#136](https://github.com/mpytools/filefisher/pull/136)).
 - Added two methods to find _exactly_ one file or path (and raise an error otherwise):
   `FileFinder.find_single_file` and `FileFinder.find_single_path`
   ([#101](https://github.com/mpytools/filefisher/pull/101)).
diff --git a/README.md b/README.md
index 87d6ef2..9acdfa9 100644
--- a/README.md
+++ b/README.md
@@ -2,182 +2,19 @@
 
 _A handy tool to find and parse file and folder names._
 
-Define regular folder and file patterns with the intuitive python syntax:
+Filefisher is a Python package that can create or find file and folder names according
+to a pattern you define. This is handy if you have many folders or files that follow a
+predefined naming structure.
 
-```python
-from filefisher import FileFinder
+## Documentation
 
-path_pattern = "/root/{category}"
-file_pattern = "{category}_file_{number}"
+Learn more about filefisher in its official documentation at https://filefisher.readthedocs.io/
 
-ff = FileFinder(path_pattern, file_pattern)
-```
+## Get in touch
 
-## Create file and path names
+Don't hesitate to ask usage questions, report bugs, suggest features or view the source
+code on GitHub under [mpytools/filefisher](https://github.com/mpytools/filefisher).
 
-Everything enclosed in curly brackets is a placeholder. Thus, you can create file and
-path names like so:
+## License
 
-```python
-ff.create_path_name(category="a")
->>> /root/a/
-
-ff.create_file_name(category="a", number=1)
->>> a_file_1
-
-ff.create_full_name(category="a", number=1)
->>> /root/a/a_file_1
-```
-
-## Find files on disk
-
-However, the strength of filefisher is parsing file names on disk. Assuming you have the
-following folder structure:
-
-```
-/root/a1/a1_file_1
-/root/a1/a1_file_2
-/root/b2/b2_file_1
-/root/b2/b2_file_2
-/root/c3/c3_file_1
-/root/c3/c3_file_2
-```
-
-You can then look for paths:
-
-```python
-ff.find_paths()
->>>
->>>      filename category
->>> 0  /root/a1/*       a1
->>> 1  /root/b2/*       b2
->>> 2  /root/c3/*       c3
-```
-The placeholders (here `{category}`) is parsed and returned. You can also look for
-files:
-
-```python
-ff.find_files()
->>>
->>>              filename category number
->>> 0  /root/a1/a1_file_1       a1      1
->>> 1  /root/a1/a1_file_2       a1      2
->>> 2  /root/b2/b2_file_1       b2      1
->>> 3  /root/b2/b2_file_2       b2      2
->>> 4  /root/c3/c3_file_1       c3      1
->>> 5  /root/c3/c3_file_2       c3      2
-```
-
-It's also possible to filter for certain files:
-```python
-ff.find_files(category=["a1", "b2"], number=1)
->>>
->>>              filename category number
->>> 0  /root/a1/a1_file_1       a1      1
->>> 2  /root/b2/b2_file_1       b2      1
-```
-
-Often we need to be sure to find _exactly one_ file or path. This can be achieved using
-
-```python
-ff.find_single_file(category="a1", number=1)
->>>
->>>              filename category number
->>> 0  /root/a1/a1_file_1       a1      1
-```
-
-If none or more than one file is found a `ValueError` is raised.
-
-## Format syntax
-
-You can pass format specifiers to allow more complex formats, see
-[format-specification](https://github.com/r1chardj0n3s/parse#format-specification) for details.
-Using format specifiers, you can parse names that are not possible otherwise.
-
-### Example
-
-```python
-from filefisher import FileFinder
-
-paths = ["a1_abc", "ab200_abcdef",]
-
-ff = FileFinder("", "{letters:l}{num:d}_{beg:2}{end}", test_paths=paths)
-
-fc = ff.find_files()
-
-fc
-```
-
-which results in the following:
-
-```python
-
-       filename letters  num beg   end
-0        a1_abc       a    1  ab     c
-1  ab200_abcdef      ab  200  ab  cdef
-```
-
-Note that `fc.df.num` has now a data type of `int` while without the `:d` it would be an
-string (or more precisely an object as pandas uses this dtype to represent strings).
-
-
-## Filters
-
-Filters can postprocess the found paths in ``. Currently only a `priority_filter`
-is implemented.
-
-### Example
-
-Assuming you have data for several models with different time resolution, e.g., 1 hourly
-(`"1h"`), 6 hourly (`"6h"`), and daily (`"1d"`), but not all models have all time resolutions:
-
-```
-/root/a/a_1h
-/root/a/a_6h
-/root/a/a_1d
-
-/root/b/b_1h
-/root/b/b_6h
-
-/root/c/c_1h
-```
-
-You now want to get the `"1d"` data if available, and then the `"6h"` etc.. This can be achieved with the `priority filter`. Let's first parse the file names:
-
-```python
-ff = FileFinder("/root/{model}", "{model}_{time_res}")
-
-files = ff.find_files()
-files
-```
-
-which yields:
-
-```
-
-       filename model time_res
-0  /root/a/a_1d     a       1d
-1  /root/a/a_1h     a       1h
-2  /root/a/a_6h     a       6h
-3  /root/b/b_1h     b       1h
-4  /root/b/b_6h     b       6h
-5  /root/c/c_1h     c       1h
-```
-
-We can now apply a `priority_filter` as follows:
-
-```python
-from filefisher.filters import priority_filter
-
-files = priority_filter(files, "time_res", ["1d", "6h", "1h"])
-files
-```
-
-Resulting in the desired selection:
-
-```
-       filename model time_res
-0  /root/a/a_1d     a       1d
-1  /root/b/b_6h     b       6h
-2  /root/c/c_1h     c       1h
-```
+filefisher is published under an MIT license.
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 0f93073..71bf8fe 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -11,9 +11,10 @@
 .. toctree::
    :maxdepth: 2
    :hidden:
-   :caption: For users
+   :caption: For Users
 
    installation
+   usage
 
 .. toctree::
    :maxdepth: 2
diff --git a/docs/source/usage.rst b/docs/source/usage.rst
new file mode 100644
index 0000000..276456a
--- /dev/null
+++ b/docs/source/usage.rst
@@ -0,0 +1,211 @@
+Usage
+=====
+
+Filefisher is a handy tool to find and parse file and folder names. Here, we show
+its basic usage and some examples. For more detailed information on the functionalities
+presented here, please refer to the `API reference`_.
+
+Setup
+-----
+
+Define regular folder and file patterns with the intuitive Python syntax:
+
+.. code-block:: python
+
+    from filefisher import FileFinder
+
+    path_pattern = "/root/{category}"
+    file_pattern = "{category}_file_{number}"
+
+    ff = FileFinder(path_pattern, file_pattern)
+
+
+Create file and path names
+--------------------------
+
+Everything enclosed in curly brackets is a placeholder. Thus, you can create file and
+path names like so:
+
+.. code-block:: python
+
+    ff.create_path_name(category="a")
+    >>> /root/a/
+
+    ff.create_file_name(category="a", number=1)
+    >>> a_file_1
+
+    ff.create_full_name(category="a", number=1)
+    >>> /root/a/a_file_1
+
+Find files on disk
+------------------
+
+However, the strength of filefisher is parsing file names on disk. Assuming you have the
+following folder structure:
+
+.. code-block::
+
+    /root/a/a_file_1
+    /root/a/a_file_2
+    /root/b/b_file_1
+    /root/b/b_file_2
+    /root/c/c_file_1
+    /root/c/c_file_2
+
+You can then look for paths:
+
+.. code-block:: python
+
+    ff.find_paths()
+    >>>
+    >>>           category
+    >>> path
+    >>> /root/a/*        a
+    >>> /root/b/*        b
+    >>> /root/c/*        c
+
+The placeholders (here `{category}`) are parsed and returned. You can also look for
+files:
+
+.. code-block:: python
+
+    ff.find_files()
+    >>>
+    >>>                  category number
+    >>> path
+    >>> /root/a/a_file_1        a      1
+    >>> /root/a/a_file_2        a      2
+    >>> /root/b/b_file_1        b      1
+    >>> /root/b/b_file_2        b      2
+    >>> /root/c/c_file_1        c      1
+    >>> /root/c/c_file_2        c      2
+
+It's also possible to filter for certain files:
+
+.. code-block:: python
+
+    ff.find_files(category=["a", "b"], number=1)
+    >>>
+    >>>                  category number
+    >>> path
+    >>> /root/a/a_file_1        a      1
+    >>> /root/b/b_file_1        b      1
+
+Often we need to be sure to find **exactly one** file or path. This can be achieved using
+
+.. code-block:: python
+
+    ff.find_single_file(category="a", number=1)
+    >>>
+    >>>                  category number
+    >>> path
+    >>> /root/a/a_file_1        a      1
+
+
+If none or more than one file is found, a `ValueError` is raised.
+
+Format syntax
+-------------
+
+You can pass format specifiers to allow more complex formats, see
+`format-specification <https://github.com/r1chardj0n3s/parse#format-specification>`_ for details.
+Using format specifiers, you can parse names that are not possible otherwise.
+
+Example
+*******
+
+.. code-block:: python
+
+    from filefisher import FileFinder
+
+    paths = ["a1_abc", "ab200_abcdef",]
+
+    ff = FileFinder("", "{letters:l}{num:d}_{beg:2}{end}", test_paths=paths)
+
+    fc = ff.find_files()
+
+    fc
+
+which results in the following:
+
+.. code-block:: python
+
+
+                 letters  num beg   end
+    path
+    a1_abc             a    1  ab     c
+    ab200_abcdef      ab  200  ab  cdef
+
+
+Note that `fc.df.num` now has a data type of `int`, while without the `:d` it would be a
+string (or more precisely an object, as pandas uses this dtype to represent strings).
+
+
+Filters
+-------
+
+Filters can postprocess the found paths in ``. Currently only a `priority_filter`
+is implemented.
+
+Example
+*******
+
+Assuming you have data for several models with different time resolutions, e.g., 1-hourly
+(`"1h"`), 6-hourly (`"6h"`), and daily (`"1d"`), but not all models have all time resolutions:
+
+.. code-block::
+
+    /root/a/a_1h
+    /root/a/a_6h
+    /root/a/a_1d
+
+    /root/b/b_1h
+    /root/b/b_6h
+
+    /root/c/c_1h
+
+You now want to get the `"1d"` data if available, and then the `"6h"` etc. This can be achieved with the `priority_filter`. Let's first parse the file names:
+
+.. code-block:: python
+
+    ff = FileFinder("/root/{model}", "{model}_{time_res}")
+
+    files = ff.find_files()
+    files
+
+which yields:
+
+.. code-block::
+
+
+                 model time_res
+    path
+    /root/a/a_1d     a       1d
+    /root/a/a_1h     a       1h
+    /root/a/a_6h     a       6h
+    /root/b/b_1h     b       1h
+    /root/b/b_6h     b       6h
+    /root/c/c_1h     c       1h
+
+We can now apply a `priority_filter` as follows:
+
+.. code-block:: python
+
+    from filefisher.filters import priority_filter
+
+    files = priority_filter(files, "time_res", ["1d", "6h", "1h"])
+    files
+
+Resulting in the desired selection:
+
+.. code-block::
+
+
+                 model time_res
+    path
+    /root/a/a_1d     a       1d
+    /root/b/b_6h     b       6h
+    /root/c/c_1h     c       1h
+
+
+.. _API reference: API.html
\ No newline at end of file
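
The priority selection described in the Filters section of `usage.rst` can be illustrated without filefisher at all. The following is a minimal sketch in plain Python, not filefisher's implementation: the function `select_by_priority`, its `group_by` parameter, and the list-of-dicts record layout are hypothetical names chosen for this illustration only. It assumes that, per model, only the entries with the best-ranked `time_res` should be kept.

```python
def select_by_priority(records, key, priorities, group_by):
    """Keep, per group, only the records whose `key` value ranks highest.

    Hypothetical helper for illustration; it is not part of filefisher.
    `priorities` lists the accepted values of `key`, most preferred first.
    """
    rank = {value: position for position, value in enumerate(priorities)}
    best = {}  # group -> (rank of the best value seen so far, matching records)

    for record in records:
        current = rank.get(record[key])
        if current is None:
            continue  # value is not in the priority list, so skip the record
        group = record[group_by]
        if group not in best or current < best[group][0]:
            best[group] = (current, [record])
        elif current == best[group][0]:
            best[group][1].append(record)

    # dicts preserve insertion order, so groups come back in first-seen order
    return [record for _, kept in best.values() for record in kept]


# the same file names as in the Filters example, written out by hand
records = [
    {"path": "/root/a/a_1d", "model": "a", "time_res": "1d"},
    {"path": "/root/a/a_1h", "model": "a", "time_res": "1h"},
    {"path": "/root/a/a_6h", "model": "a", "time_res": "6h"},
    {"path": "/root/b/b_1h", "model": "b", "time_res": "1h"},
    {"path": "/root/b/b_6h", "model": "b", "time_res": "6h"},
    {"path": "/root/c/c_1h", "model": "c", "time_res": "1h"},
]

selected = select_by_priority(records, "time_res", ["1d", "6h", "1h"], group_by="model")
print([record["path"] for record in selected])
```

For these records the sketch keeps `/root/a/a_1d`, `/root/b/b_6h` and `/root/c/c_1h`, which matches the selection shown in the documentation. The grouping column is passed explicitly here to keep the logic visible; how `priority_filter` determines its groups is not stated in this diff.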