initial pass at extracting Abbott-Smith headwords
jtauber committed Nov 23, 2015
1 parent eacc8db commit 1df68f7
Showing 4 changed files with 5,847 additions and 0 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -14,6 +14,10 @@ Currently (distinctly) working on...

- projects/nominal_distinguishers

## Merge Abbott Smith

- projects/merge_abbott_smith

## Citation Form Patterns

- projects/citation_forms
8 changes: 8 additions & 0 deletions projects/merge_abbott_smith/README.md
@@ -0,0 +1,8 @@
`abbott-smith.tei.xml` is from https://github.com/translatable-exegetical-tools/Abbott-Smith

`extract_headwords.py` extracts information from `abbott-smith.tei.xml` to produce `abbott_smith_headwords.txt`

`abbott_smith_headwords.txt` is a pipe-delimited file with three fields:
* the lemma from the `entry` element's `n` attribute
* the Strong's number from the `entry` element's `n` attribute
* the text of the `form` element with all other markup stripped
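
The extraction described above could be sketched roughly as follows. This is not the contents of `extract_headwords.py`, which this diff does not show. It is a minimal illustration under stated assumptions: that the `n` attribute packs the lemma and Strong's number as `lemma|G123`, that `form` is a direct child of `entry`, and that the inline `SAMPLE` document (invented here, including its namespace) resembles the real TEI file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real abbott-smith.tei.xml may use a
# different namespace, nesting, and attribute layout.
SAMPLE = """<TEI xmlns="http://example.org/hypothetical-tei-namespace">
  <text><body>
    <entry n="ἀγάπη|G26">
      <form><orth>ἀγάπη</orth>, -ης, <foreign>ἡ</foreign></form>
      <sense>love</sense>
    </entry>
    <entry n="ἀββά">
      <form><orth>ἀββά</orth></form>
    </entry>
  </body></text>
</TEI>"""


def local_name(tag):
    """Drop any '{namespace}' prefix so matching works whatever the namespace is."""
    return tag.rsplit("}", 1)[-1]


def extract_headwords(xml_text):
    """Return one 'lemma|strongs|form text' line per entry element."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.iter():
        if local_name(entry.tag) != "entry":
            continue
        # Assumption: n="lemma|G123"; the Strong's part is empty if absent.
        lemma, _, strongs = entry.get("n", "").partition("|")
        # Assumption: the form element is a direct child of entry.
        form = next((c for c in entry if local_name(c.tag) == "form"), None)
        if form is None:
            form_text = ""
        else:
            # itertext() yields only text nodes, which strips nested markup;
            # split/join then collapses runs of whitespace.
            form_text = " ".join("".join(form.itertext()).split())
        rows.append("|".join([lemma, strongs, form_text]))
    return rows


rows = extract_headwords(SAMPLE)
for row in rows:
    print(row)
```

Matching on the local tag name rather than a hard-coded namespace keeps the sketch independent of whichever TEI namespace the source file declares. If a `form` text ever contained a literal `|`, a pipe-delimited output would need escaping, which this sketch does not handle.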