Merge pull request #575 from kahst:extend-docs

Add more guides and audio summaries
kahst · Feb 13, 2025 · 7a37bfa
2 parents 1711f8e + 92307ed
commit 7a37bfa

Showing 8 changed files with 154 additions and 8 deletions.
diff --git a/docs/_static/BirdNET_Guide-Introduction-NotebookLM.mp3 b/docs/_static/BirdNET_Guide-Introduction-NotebookLM.mp3
diff --git a/docs/_static/BirdNET_Guide-Segment_review-NotebookLM.mp3 b/docs/_static/BirdNET_Guide-Segment_review-NotebookLM.mp3
diff --git a/docs/best-practices.rst b/docs/best-practices.rst
@@ -5,4 +5,5 @@ Best practices
    :maxdepth: 2
 
    best-practices/species-lists
+   best-practices/segment-review
    best-practices/training
diff --git a/docs/best-practices/segment-review.rst b/docs/best-practices/segment-review.rst
@@ -0,0 +1,80 @@
+Segment Review
+=================================
+
+Get started by listening to this AI-generated summary of segment review:
+
+.. raw:: html
+
+    <audio controls>
+      <source src="../_static/BirdNET_Guide-Segment_review-NotebookLM.mp3" type="audio/mpeg">
+      Your browser does not support the audio element.
+    </audio>
+
+| 
+| `Source: Google NotebookLM`
+
+1. Prepare Audio and Result Files
+---------------------------------
+
+- | **Collect Audio Recordings and Corresponding BirdNET Result Files**: Organize them into separate folders.
+- | **Result File Formats**: BirdNET-Analyzer typically produces result files with extensions ".BirdNET.txt" or ".BirdNET.csv". It can process various result file formats, including "table", "kaleidoscope", "csv", and "audacity".
+- | **Understanding Confidence Values**: Note that BirdNET confidence values are not probabilities and are not directly transferable between different species or recording conditions.
+
+2. Using the "Segments" Function in the GUI or Command Line
+-----------------------------------------------------------
+
+- | **Segments Function**: BirdNET provides the "segments" function to create a collection of species-specific predictions that exceed a user-defined confidence value. This function is available in the graphical user interface (GUI) under the "segments" tab or via the "segments.py" script in the command line.
+- | **GUI Usage**: In the GUI, you can select audio, result, and output directories. You can also set additional parameters such as the minimum confidence value, the maximum number of segments per species, the audio speed, and the segment length.
+
+3. Setting Parameters
+---------------------
+
+- | **Minimum Confidence (min_conf)**: Set a minimum confidence value for predictions to be considered. Note that this value may vary by species. It is recommended to determine the threshold by reviewing precision and recall.
+- | **Maximum Number of Segments (max_segments)**: Specify the maximum number of segments to extract per species.
+- | **Audio Speed (audio_speed)**: Adjust the playback speed.
+- | **Segment Length (seg_length)**: Define how long, in seconds, each extracted audio segment should be.
+
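The effect of the minimum confidence and the per-species cap can be sketched in a few lines of Python. This is only an illustration, not the code BirdNET-Analyzer runs: the function `select_segments`, its parameter `max_segments`, and the `(species, confidence)` tuple layout are hypothetical, and the sketch keeps the highest-scoring detections, whereas the tool itself may choose which segments to keep differently (for example at random).

```python
from collections import defaultdict


def select_segments(detections, min_conf=0.25, max_segments=100):
    """Keep detections at or above min_conf, at most max_segments per species.

    `detections` is a list of (species, confidence) tuples. This layout is
    hypothetical and only stands in for a parsed result file.
    """
    by_species = defaultdict(list)
    for species, confidence in detections:
        if confidence >= min_conf:
            by_species[species].append(confidence)
    # Illustrative choice: keep the highest confidences first, then truncate.
    return {
        species: sorted(confs, reverse=True)[:max_segments]
        for species, confs in by_species.items()
    }


# Made-up detections for illustration only.
detections = [
    ("Parus major_Great Tit", 0.91),
    ("Parus major_Great Tit", 0.12),
    ("Parus major_Great Tit", 0.55),
    ("Turdus merula_Eurasian Blackbird", 0.40),
]
selected = select_segments(detections, min_conf=0.25, max_segments=1)
print(selected)
```

Raising `min_conf` trades recall for precision, which is why the threshold is best chosen per species after reviewing a sample.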
+4. Extracting Segments
+----------------------
+
+- | **Start the Extraction Process**: After setting all parameters, start the extraction process. BirdNET will create subfolders for each identified species and save audio clips of the corresponding recordings.
+- | **Progress Display**: The progress of the process will be displayed.
+
+5. Reviewing Results
+--------------------
+
+- | **Manual Review of Audio Segments**: The resulting audio segments can be manually reviewed to assess the accuracy of the predictions. It is important to note that BirdNET confidence values are not probabilities but a measure of the algorithm's prediction reliability.
+- | **Systematic Review**: It is recommended to start with the highest confidence scores and work down to the lower scores.
+- | **File Naming**: Files are named with confidence values, allowing for sorting by values.
+
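Because the confidence is part of each file name, a review order from highest to lowest score can be built with a simple sort. The sketch below assumes that the file name starts with the confidence value followed by an underscore; the example file names are invented, so check them against the files your extraction actually produced.

```python
def confidence_from_name(filename):
    """Parse the leading confidence value from a segment file name.

    Assumes the name starts with the confidence followed by "_".
    """
    return float(filename.split("_", 1)[0])


# Invented file names, used only to demonstrate the sort.
files = [
    "0.431_12_rec_3.0s_6.0s.wav",
    "0.982_3_rec_0.0s_3.0s.wav",
    "0.765_8_rec_9.0s_12.0s.wav",
]

# Highest confidence first, matching the recommended review order.
review_order = sorted(files, key=confidence_from_name, reverse=True)
for name in review_order:
    print(name)
```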
+6. Using the Review Tab in the GUI
+----------------------------------
+
+- | **Review Tab Overview**: The review tab in the GUI allows you to systematically review and label the extracted segments. It provides tools for visualizing spectrograms, listening to audio segments, and categorizing them as positive or negative detections.
+- | **Collect Segments**: Use the review tab to collect segments from the specified directory. You can shuffle the segments for a randomized review process.
+- | **Create Log Plot**: The review tab can generate a logistic regression plot to visualize the relationship between confidence values and the likelihood of correct detections.
+- **Review Process**:
+
+  - | **Select Directory**: Choose the directory containing the segments to be reviewed.
+  - | **Species Dropdown**: Select the species to review from the dropdown menu.
+  - | **File Count Matrix**: View the count of files to be reviewed, positive detections, and negative detections.
+  - | **Spectrogram and Audio**: Visualize the spectrogram and listen to the audio segment.
+  - | **Label Segments**: Use the buttons to label segments as positive or negative detections.
+  - | **Undo**: Undo the last action if needed.
+  - | **Download Plots**: Download the spectrogram and regression plots for further analysis.
+
+7. Alternative Approaches
+-------------------------
+
+- | **Raven Pro**: BirdNET result tables can be imported into Raven Pro and reviewed using the selection review function.
+- | **Converting Confidence Values to Probabilities**: Another approach is converting confidence values to probabilities using logistic regression. However, this requires manual evaluation of predictions in a sample.
+
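The logistic regression idea can be sketched without any external library. The example below is a minimal illustration under stated assumptions: the confidences and review labels are made up, the model uses the raw confidence as its only predictor, and the fit uses plain gradient descent. It is not the routine behind the review tab's plot.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(confidences, labels, lr=0.5, steps=5000):
    """Fit p(correct) = sigmoid(a + b * confidence) by gradient descent."""
    a, b = 0.0, 0.0
    n = len(confidences)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(confidences, labels):
            error = sigmoid(a + b * x) - y  # gradient of the log-loss
            grad_a += error
            grad_b += error * x
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def confidence_for_probability(a, b, p):
    """Confidence at which the fitted model predicts probability p."""
    return (math.log(p / (1.0 - p)) - a) / b


# Made-up review results: 1 = confirmed detection, 0 = false positive.
confidences = [0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95, 0.99]
labels = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]

a, b = fit_logistic(confidences, labels)
threshold = confidence_for_probability(a, b, 0.9)
print(f"intercept={a:.3f}, slope={b:.3f}, confidence for p=0.9: {threshold:.3f}")
```

With so few reviewed segments the fitted curve is very uncertain, and the computed threshold can even fall outside the range of 0 to 1. In practice the sample should be much larger and fitted separately for each species.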
+8. Important Notes
+------------------
+
+- | **Non-Transferability of Confidence Values**: BirdNET confidence values are not easily transferable between species.
+- | **Audio Quality**: The accuracy of results heavily depends on the quality of audio recordings, such as sample rate and microphone quality.
+- | **Environmental Factors**: Results can be influenced by the recording environment, such as wind or rain.
+- | **Standardized Test Data**: Using standardized test data for evaluation is important to make results comparable.
+
+This guide summarizes the best practices for using the "segments" function of BirdNET-Analyzer and emphasizes the need for careful interpretation of the results.
diff --git a/docs/best-practices/species-lists.rst b/docs/best-practices/species-lists.rst
@@ -1,10 +1,63 @@
-Creating your own species list
-------------------------------
+Creating Your Own Species List
+==============================
 
-When editing your own species_list.txt file, make sure to copy species names from the labels file of each model.
+When editing your own ``species_list.txt`` file, make sure to copy species names from the labels file of each model.
 
-You can find label files in the checkpoints folder, e.g., checkpoints/V2.4/BirdNET_GLOBAL_6K_V2.4_Labels.txt.
+You can find label files in the checkpoints folder, e.g., ``checkpoints/V2.4/BirdNET_GLOBAL_6K_V2.4_Labels.txt``.
 
-Species names need to consist of scientific name_common name to be valid.
+Species names need to consist of ``scientific name_common name`` to be valid.
 
-You can generate a species list for a given location using :ref:`species.py <cli-species>`.
+You can generate a species list for a given location using :ref:`species.py <cli-species>`.
+
+Practical Information and Considerations
+----------------------------------------
+
+**Understanding the GeoModel**
+
+The BirdNET Species Range Model V2.4 - V2 uses eBird checklist frequency data to estimate the range of bird species and the probability of their occurrence given latitude, longitude, and week of the year. eBird relies on citizen scientists to collect bird species observations around the world. Due to biases in these data, some regions such as North and South America, Europe, India, and Australia are well represented in the data, while large parts of Africa or Asia are underrepresented.
+
+In cases where eBird does not have enough observations (i.e., checklists), the data "only" contain binary filter data of likely species that could occur in a given location. Therefore, the training data for our biodiversity model is a mixture of actual observations and filter data curated by experts. We included all locations for which at least 10 checklists are available for each week of the year, and randomly added other locations with a 3% probability.
+
+**Limitations of the GeoModel**
+
+- **Data Coverage**: The model works well in regions with good eBird data coverage, such as North and South America, Europe, India, and Australia. In other regions, the lack of eBird observations means the resulting species lists may not reflect actual probabilities of occurrence.
+- **Binary Filter Data**: In areas with insufficient eBird data, the model relies on binary filter data, which may not be as accurate as actual observations.
+- **Seasonal Variations**: The model accounts for seasonal variations in bird presence, but the accuracy depends on the availability of data for each week of the year.
+
+**Creating Custom Species Lists**
+
+If you know which species to expect in your area, it is recommended to compile your own species list. This can help improve the accuracy of BirdNET-Analyzer for your specific use case.
+
+1. **Collect Species Names**: Use the labels file from the model checkpoints to get the correct species names. Ensure the names are in the format ``scientific name_common name``.
+2. **Generate Species List**: Use the ``species.py`` script to generate a species list for a given location and time. This script uses the GeoModel to predict species occurrence based on latitude, longitude, and week of the year.
+
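A hand-edited species list is easy to get subtly wrong, so it can help to check every entry against the model labels before running an analysis. The following sketch is not part of BirdNET-Analyzer: the function `validate_species_list` is hypothetical, and the two labels are only a small sample standing in for the full labels file.

```python
def validate_species_list(entries, model_labels):
    """Split entries into valid ones and ones the model would not recognize."""
    labels = set(model_labels)
    valid, invalid = [], []
    for entry in entries:
        name = entry.strip()
        if not name:
            continue  # skip blank lines
        # A valid entry looks like "scientific name_common name"
        # and matches one of the model labels exactly.
        scientific, separator, common = name.partition("_")
        has_format = bool(separator and scientific and common)
        if has_format and name in labels:
            valid.append(name)
        else:
            invalid.append(name)
    return valid, invalid


# Small sample standing in for the contents of a labels file.
model_labels = ["Parus major_Great Tit", "Turdus merula_Eurasian Blackbird"]
entries = ["Parus major_Great Tit", "Great Tit", "", "Turdus merula_Blackbird"]

valid, invalid = validate_species_list(entries, model_labels)
print("valid:", valid)
print("invalid:", invalid)
```

In practice `model_labels` would be read line by line from the labels file in the checkpoints folder and `entries` from your own list.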
+**Example of Training Data**
+
+Here is an example of what the training data for a given location (Chemnitz) looks like:
+
+.. code:: python
+
+    # Weekly values for one location, keyed by eBird species code.
+    # Each list holds 48 entries, one per week of the year.
+    training_data = {
+        'gretit1': [72, 90, 98, 93, 96, 88, 95, 94, 99, 99, 93, 92, 90, 96, 85, 97, 89, 78, 67, 68, 48, 39, 35, 40, 49, 49, 49, 51, 48, 55, 55, 73, 60, 64, 62, 63, 72, 72, 72, 67, 66, 80, 63, 74, 67, 76, 88, 70],
+        'carcro1': [62, 81, 83, 82, 85, 75, 90, 75, 83, 80, 76, 80, 84, 90, 72, 73, 83, 67, 70, 75, 54, 48, 42, 55, 51, 53, 55, 49, 55, 53, 55, 62, 57, 55, 66, 69, 63, 65, 69, 63, 59, 74, 61, 63, 76, 79, 69, 60],
+        'eurbla': [55, 80, 84, 92, 71, 70, 72, 84, 85, 86, 82, 95, 88, 92, 86, 91, 90, 75, 87, 81, 84, 72, 69, 62, 67, 70, 57, 66, 55, 56, 49, 32, 36, 37, 41, 49, 55, 62, 57, 58, 41, 37, 58, 67, 69, 64, 69, 49],
+        'blutit': [67, 83, 92, 93, 96, 83, 87, 93, 96, 90, 82, 80, 84, 88, 58, 79, 74, 52, 46, 36, 34, 29, 25, 26, 39, 43, 36, 43, 47, 42, 49, 48, 49, 51, 45, 52, 61, 64, 55, 55, 65, 72, 62, 71, 66, 67, 69, 64],
+        'grswoo': [61, 84, 80, 80, 90, 83, 85, 77, 76, 82, 72, 77, 77, 78, 64, 76, 81, 69, 73, 75, 66, 44, 46, 41, 47, 41, 38, 44, 42, 42, 52, 68, 37, 35, 38, 43, 44, 41, 43, 41, 49, 61, 41, 49, 48, 47, 67, 47],
+        'cowpig1': [9, 10, 3, 3, 16, 16, 30, 54, 65, 61, 69, 76, 83, 81, 80, 86, 80, 71, 68, 78, 68, 69, 79, 68, 76, 69, 69, 79, 70, 70, 68, 73, 64, 63, 58, 54, 53, 49, 53, 56, 44, 21, 33, 38, 45, 43, 5, 11],
+        'eurnut2': [43, 76, 88, 82, 79, 78, 91, 84, 92, 86, 76, 77, 75, 85, 69, 75, 60, 34, 47, 58, 34, 24, 33, 33, 31, 23, 28, 25, 23, 21, 23, 52, 26, 26, 31, 28, 25, 29, 32, 23, 47, 46, 24, 31, 30, 36, 61, 53],
+        'comcha': [26, 33, 30, 33, 34, 34, 39, 48, 70, 75, 80, 83, 80, 90, 76, 85, 80, 74, 77, 74, 59, 52, 51, 40, 34, 44, 33, 31, 22, 15, 17, 21, 17, 18, 26, 34, 44, 48, 53, 49, 31, 27, 33, 39, 44, 39, 30, 28],
+    }
+
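Weekly values like these can be turned into a species list for a given week by applying a cutoff. The sketch below copies two of the rows from the Chemnitz example and assumes the values are on a 0 to 100 scale; the function `species_for_week` and the cutoff of 50 are hypothetical and chosen only for illustration.

```python
def species_for_week(frequencies, week, threshold):
    """Return species codes whose value for `week` (1 to 48) meets `threshold`."""
    return sorted(
        code for code, values in frequencies.items()
        if values[week - 1] >= threshold
    )


# Two rows from the Chemnitz example, 48 weekly values each.
frequencies = {
    "gretit1": [72, 90, 98, 93, 96, 88, 95, 94, 99, 99, 93, 92,
                90, 96, 85, 97, 89, 78, 67, 68, 48, 39, 35, 40,
                49, 49, 49, 51, 48, 55, 55, 73, 60, 64, 62, 63,
                72, 72, 72, 67, 66, 80, 63, 74, 67, 76, 88, 70],
    "cowpig1": [9, 10, 3, 3, 16, 16, 30, 54, 65, 61, 69, 76,
                83, 81, 80, 86, 80, 71, 68, 78, 68, 69, 79, 68,
                76, 69, 69, 79, 70, 70, 68, 73, 64, 63, 58, 54,
                53, 49, 53, 56, 44, 21, 33, 38, 45, 43, 5, 11],
}

print(species_for_week(frequencies, week=1, threshold=50))
print(species_for_week(frequencies, week=10, threshold=50))
```

The second species drops out in week 1 and appears by week 10, which shows why the week of the year matters when generating a list.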
+**Example of Model Predictions**
+
+If we query the trained model for the same location as above, we get these values for great tits:
+
+.. code:: python
+
+    'gretit': [99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 98, 98, 98, 98, 98, 97, 97, 97, 97, 97, 97, 98, 98, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99]
+
+**Conclusion**
+
+Overall, the model works well in regions with good data coverage. In other regions, the lack of eBird observations means the resulting species lists may not reflect actual probabilities of occurrence. Nevertheless, these lists can be used to filter for species that may or may not occur in these locations.
+
+By understanding the limitations and capabilities of the GeoModel, you can make informed decisions when creating and using custom species lists for BirdNET-Analyzer.
+
+See this post in the discussion forum for more details: `Species range model details <https://github.com/kahst/BirdNET-Analyzer/discussions/234>`_
diff --git a/docs/best-practices/training.rst b/docs/best-practices/training.rst
@@ -1,4 +1,4 @@
-Best Practices for Training Custom Classifiers
+Training Custom Classifiers
 ==============================================
 
 Get started by listening to this AI-generated summary of training custom classifiers with BirdNET embeddings:

diff --git a/docs/conf.py b/docs/conf.py
@@ -13,7 +13,7 @@
 sys.path.insert(1, os.path.abspath(".."))
 
 project = "BirdNET-Analyzer"
-copyright = "%Y, Stefan Kahl"
+copyright = "%Y, BirdNET-Team"
 author = "Stefan Kahl"
 version = "1.5.1"
 html_favicon = "_static/birdnet-icon.ico"

diff --git a/docs/index.rst b/docs/index.rst
@@ -19,6 +19,18 @@ Introduction
 
 BirdNET-Analyzer is an open source tool for analyzing bird calls using machine learning models. It can process large amounts of audio recordings and identify (bird) species based on their calls.
 
+Get started by listening to this AI-generated introduction to BirdNET-Analyzer:
+
+.. raw:: html
+
+    <audio controls>
+      <source src="_static/BirdNET_Guide-Introduction-NotebookLM.mp3" type="audio/mpeg">
+      Your browser does not support the audio element.
+    </audio>
+
+| 
+| `Source: Google NotebookLM`
+
 Citing BirdNET-Analyzer
 -----------------------