docs: LEAP-1657: Add a note to Text tag about \r\n (#6645)
Co-authored-by: robot-ci-heartex
Co-authored-by: Max Tkachenko
3 people authored Nov 14, 2024
1 parent 92c85c5 commit 525dd19
Showing 2 changed files with 15 additions and 0 deletions.
7 changes: 7 additions & 0 deletions docs/source/tags/text.md
@@ -12,6 +12,13 @@ Every space in the text sample is counted when calculating result offsets, for e

Use with the following data types: text.

### How to read my text files in Python?
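The Label Studio editor counts `\r\n` as two separate characters and displays it as `\n\n`, which makes it look as if there is extra space between lines.
By default, Python opens text files in universal newlines mode and translates every `\r\n` into `\n` on read, so a string read that way is one character shorter per line break than the text the editor counted.
To avoid the mismatch, either preprocess your files to replace every `\r\n` with `\n` before importing them, or open them in Python with `newline=''` so that `\r\n` is kept as is:
`with open('my-file.txt', encoding='utf-8', newline='') as f: text = f.read()`
This is especially important for span labeling such as NER, where the result offsets must point at the same characters when you slice the text in Python:
`text[start_offset:end_offset]`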
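The following is a minimal sketch of both approaches, assuming a UTF-8 text file with Windows-style `\r\n` line breaks. The helper names `read_task_text` and `normalize_newlines`, the sample text, and the offsets `13` and `19` are hypothetical; the offsets assume, as described above, that the editor counts `\r\n` as two characters.

```python
import os
import tempfile


def read_task_text(path: str) -> str:
    # newline='' turns off newline translation, so "\r\n" stays two
    # characters, matching how the editor counts them.
    with open(path, encoding="utf-8", newline="") as f:
        return f.read()


def normalize_newlines(text: str) -> str:
    # Alternative: convert "\r\n" to "\n" BEFORE importing the task,
    # so the editor and Python see exactly the same characters.
    return text.replace("\r\n", "\n")


# Hypothetical sample file containing a Windows-style line break.
# Written as bytes so the file really contains "\r\n" on every platform.
raw = "Hello world\r\nSecond line"
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(raw.encode("utf-8"))
    path = tmp.name

# Hypothetical offsets of the span "Second", counting "\r\n" as two characters.
start_offset, end_offset = 13, 19

preserved = read_task_text(path)
with open(path, encoding="utf-8") as f:  # default mode translates "\r\n" to "\n"
    translated = f.read()
os.remove(path)

print(repr(preserved[start_offset:end_offset]))   # 'Second'
print(repr(translated[start_offset:end_offset]))  # 'econd '
print(repr(normalize_newlines(raw)))              # 'Hello world\nSecond line'
```

Reading with the default `newline=None` shifts the slice by one character for every `\r\n` that precedes the span, which is why the second slice is off by one. If you normalize the text before import instead, the editor computes offsets on the `\n` version, and a plain `open()` call returns matching text.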

### Parameters

| Param | Type | Default | Description |
8 changes: 8 additions & 0 deletions web/libs/editor/src/tags/object/Text.js
@@ -6,6 +6,14 @@
* Every space in the text sample is counted when calculating result offsets, for example for NER labeling tasks.
*
* Use with the following data types: text.
*
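* ### How to read my text files in Python?
* The Label Studio editor counts `\r\n` as two separate characters and displays it as `\n\n`, which makes it look as if there is extra space between lines.
* By default, Python opens text files in universal newlines mode and translates every `\r\n` into `\n` on read, so a string read that way is one character shorter per line break than the text the editor counted.
* To avoid the mismatch, either preprocess your files to replace every `\r\n` with `\n` before importing them, or open them in Python with `newline=''` so that `\r\n` is kept as is:
* `with open('my-file.txt', encoding='utf-8', newline='') as f: text = f.read()`
* This is especially important for span labeling such as NER, where the result offsets must point at the same characters when you slice the text in Python:
* `text[start_offset:end_offset]`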
*
* @example
* <!--Labeling configuration to label text for NER tasks with a word-level granularity -->
* <View>