Commit 7fa1f1c

Index records (woudc#1)
1 parent 97e97b0 commit 7fa1f1c

19 files changed, +972 -160 lines changed

.travis.yml

+12-8
Original file line number | Diff line number | Diff line change
@@ -5,14 +5,18 @@ python:
55

66
env:
77
global:
8-
- DEBUG=True
9-
- DB_TYPE=postgresql
10-
- DB_HOST=localhost
11-
- DB_PORT=5432
12-
- DB_NAME=woudc-data-registry
13-
- DB_USERNAME=postgres
14-
- DB_PASSWORD=postgres
8+
- WDR_DEBUG=True
9+
- WDR_DB_TYPE=postgresql
10+
- WDR_DB_HOST=localhost
11+
- WDR_DB_PORT=5432
12+
- WDR_DB_NAME=woudc-data-registry
13+
- WDR_DB_USERNAME=postgres
14+
- WDR_DB_PASSWORD=postgres
1515
- PGPASSWORD=postgres
16+
- WDR_SEARCH_TYPE=elasticsearch
17+
- WDR_SEARCH_URL=http://localhost:9200/
18+
- WDR_WAF_BASEURL=https://woudc.org/archive/
19+
- WDR_WAF_BASEDIR=/tmp
1620

1721
addons:
1822
apt:
@@ -35,7 +39,7 @@ script:
3539
- python setup.py --long-description | rst2html5.py
3640

3741
after_success:
38-
- coverage run --source=woudc_data_registry -m unittest woudc_data_registry.tests.run_tests
42+
- coverage run --source=woudc_data_registry -m unittest woudc_data_registry.tests.test_data_registry
3943
- coverage report -m
4044
- coveralls
4145
- python setup.py bdist_wheel

Makefile

+15-2
@@ -53,6 +53,8 @@ help:
5353
@echo
5454
@echo " createdb: create PostgreSQL/PostGIS database"
5555
@echo " dropdb: drop PostgreSQL/PostGIS database"
56+
@echo " setup: create models and search index"
57+
@echo " teardown: delete models and search index"
5658
@echo " test: run tests"
5759
@echo " coverage: run code coverage"
5860
@echo " package: create Python wheel"
@@ -72,7 +74,7 @@ clean:
7274
rm -fr debian/woudc-data-registry
7375

7476
coverage:
75-
coverage run --source=woudc_data_registry -m unittest woudc_data_registry.tests.run_tests
77+
coverage run --source=woudc_data_registry -m unittest woudc_data_registry.tests.test_data_registry
7678
coverage report -m
7779

7880
createdb:
@@ -82,10 +84,21 @@ createdb:
8284
dropdb:
8385
dropdb $(PG_FLAGS)
8486

87+
flake8:
88+
flake8 woudc_data_registry
89+
8590
package:
8691
python setup.py sdist bdist_wheel
8792

93+
setup:
94+
woudc-data-registry manage setup
95+
woudc-data-registry search create_index
96+
97+
teardown:
98+
woudc-data-registry manage teardown
99+
woudc-data-registry search delete_index
100+
88101
test:
89102
python setup.py test
90103

91-
.PHONY: clean coverage createdb dropdb help package test
104+
.PHONY: clean coverage createdb dropdb flake8 help package setup teardown test
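
The new `setup` and `teardown` targets each run two commands in order, and `make` by default stops a recipe at the first command that exits non-zero. A minimal Python sketch of the same ordering follows. The `run_steps` helper and its `runner` parameter are assumptions made for illustration and are not part of this commit; only the four command lines come from the Makefile above.

```python
import subprocess

# Command lines copied from the new Makefile targets.
SETUP_STEPS = [
    ['woudc-data-registry', 'manage', 'setup'],
    ['woudc-data-registry', 'search', 'create_index'],
]

TEARDOWN_STEPS = [
    ['woudc-data-registry', 'manage', 'teardown'],
    ['woudc-data-registry', 'search', 'delete_index'],
]


def run_steps(steps, runner=subprocess.call):
    """Run each command in order (hypothetical helper).

    Stops at the first non-zero exit code and returns it,
    otherwise returns 0 once every step has succeeded.
    """
    for step in steps:
        code = runner(step)
        if code != 0:
            return code
    return 0
```

Calling `run_steps(SETUP_STEPS)` would create the models before the search index, and skip the index if the models could not be created. The injectable `runner` keeps the helper testable without the CLI installed.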

README.md

+18-7
@@ -7,16 +7,17 @@
77

88
WOUDC Data Registry is a platform that manages Ozone and Ultraviolet
99
Radiation data in support of the [World Ozone and Ultraviolet Radiation Data
10-
Centre (WOUDC)](http://woudc.org), one of six World Data Centres as part of
10+
Centre (WOUDC)](https://woudc.org), one of six World Data Centres as part of
1111
the [Global Atmosphere Watch](http://www.wmo.int/gaw) programme of the
1212
[WMO](http://www.wmo.int).
1313

1414

1515
## Installation
1616

1717
### Requirements
18-
- Python 3 and above
18+
- [Python](https://www.python.org) 3 and above
1919
- [virtualenv](https://virtualenv.pypa.io/)
20+
- [Elasticsearch](https://www.elastic.co/products/elasticsearch) (5.5.0 and above)
2021

2122
### Dependencies
2223
Dependencies are listed in [requirements.txt](requirements.txt). Dependencies
@@ -48,13 +49,23 @@ make ENV=foo.env createdb
4849
make ENV=foo.env dropdb
4950

5051
# initialize model (database tables)
51-
woudc-data-registry model setup
52+
woudc-data-registry manage setup
53+
54+
# initialize search engine
55+
woudc-data-registry search create_index
56+
57+
# load core metadata
58+
woudc-data-registry manage init
5259

5360
# cleanups
5461

5562
# re-initialize model (database tables)
56-
woudc-data-registry model teardown
57-
woudc-data-registry model setup
63+
woudc-data-registry manage teardown
64+
woudc-data-registry manage setup
65+
66+
# re-initialize search engine
67+
woudc-data-registry search delete_index
68+
woudc-data-registry search create_index
5869

5970
# drop database
6071
make ENV=foo.env dropdb
@@ -86,13 +97,13 @@ pip install -r requirements-dev.txt
8697

8798
# run tests like this:
8899
cd woudc_data_registry/tests
89-
python run_tests.py
100+
python test_data_registry.py
90101

91102
# or this:
92103
python setup.py test
93104

94105
# measure code coverage
95-
coverage run --source=woudc_data_registry -m unittest woudc_data_registry.tests.run_tests
106+
coverage run --source=woudc_data_registry -m unittest woudc_data_registry.tests.test_data_registry
96107
coverage report -m
97108
```
98109

debian/control

+2-2
@@ -9,8 +9,8 @@ Vcs-Git: https://github.com/woudc/woudc-data-registry.git
99

1010
Package: woudc-data-registry
1111
Architecture: all
12-
Depends: python3-click, python-geoalchemy2, python3-psycopg2, python3-requests, python3-six, python3-sqlalchemy
13-
Homepage: http://woudc.org
12+
Depends: elasticsearch (>=5.5.0), postgis, postgresql, python3-click, python-elasticsearch, python-geoalchemy2, python3-psycopg2, python3-requests, python3-six, python3-sqlalchemy
13+
Homepage: https://woudc.org
1414
Description: WOUDC Data Registry is a platform that manages Ozone and
1515
Ultraviolet Radiation data in support of the World Ozone and Ultraviolet
1616
Radiation Data Centre (WOUDC), one of six World Data Centres as part of the

default.env

+13-8
@@ -1,8 +1,13 @@
1-
export DEBUG=False
2-
export DB_TYPE=postgresql
3-
export DB_HOST=localhost
4-
export DB_PORT=5432
5-
export DB_NAME=woudc-data-registry
6-
export DB_USERNAME=postgres
7-
export DB_PASSWORD=postgres
8-
export PGPASSWORD=$DB_PASSWORD
1+
export WDR_DEBUG=False
2+
export WDR_DB_TYPE=postgresql
3+
export WDR_DB_HOST=localhost
4+
export WDR_DB_PORT=5432
5+
export WDR_DB_NAME=woudc-data-registry
6+
export WDR_DB_USERNAME=postgres
7+
export WDR_DB_PASSWORD=postgres
8+
export WDR_SEARCH_TYPE=elasticsearch
9+
export WDR_SEARCH_URL=http://localhost:9200/
10+
export WDR_WAF_BASEURL=https://woudc.org/archive/
11+
export WDR_WAF_BASEDIR=/tmp
12+
13+
export PGPASSWORD=$WDR_DB_PASSWORD
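
Because this commit renames every variable to a `WDR_`-prefixed form, an environment file written for the previous revision no longer sets anything the code reads. The sketch below shows one way such an environment could be carried over. The `migrate_legacy_env` helper is hypothetical and not part of this commit; the legacy names are the ones removed from `default.env` above.

```python
# Un-prefixed names removed from default.env by this commit.
LEGACY_NAMES = [
    'DEBUG', 'DB_TYPE', 'DB_HOST', 'DB_PORT',
    'DB_NAME', 'DB_USERNAME', 'DB_PASSWORD',
]


def migrate_legacy_env(environ, prefix='WDR_'):
    """Return a copy of environ with prefixed names filled from legacy names.

    Hypothetical helper: a prefixed value that is already set always wins,
    and the input mapping is left unmodified.
    """
    migrated = dict(environ)
    for name in LEGACY_NAMES:
        new_name = prefix + name
        if name in environ and new_name not in environ:
            migrated[new_name] = environ[name]
    return migrated
```

For example, `migrate_legacy_env(os.environ)` (after `import os`) would return a mapping in which `WDR_DB_HOST` is populated from `DB_HOST` when only the old name is set. The new search and WAF variables have no legacy equivalent and still need to be set explicitly.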

requirements.txt

+1
@@ -1,4 +1,5 @@
11
click
2+
elasticsearch
23
geoalchemy2
34
psycopg2
45
requests

setup.py

+1-1
@@ -62,7 +62,7 @@ def finalize_options(self):
6262
def run(self):
6363
import subprocess
6464
errno = subprocess.call([sys.executable,
65-
'woudc_data_registry/tests/run_tests.py'])
65+
'woudc_data_registry/tests/test_data_registry.py']) # noqa
6666
raise SystemExit(errno)
6767

6868

woudc_data_registry/__init__.py

+4-2
@@ -45,8 +45,9 @@
4545

4646
import click
4747

48-
from woudc_data_registry.models import model
4948
from woudc_data_registry.controller import data
49+
from woudc_data_registry.models import manage
50+
from woudc_data_registry.search import search
5051

5152
__version__ = '0.1.dev0'
5253

@@ -57,5 +58,6 @@ def cli():
5758
pass
5859

5960

60-
cli.add_command(model)
61+
cli.add_command(manage)
6162
cli.add_command(data)
63+
cli.add_command(search)

woudc_data_registry/config.py

+22-11
@@ -43,22 +43,33 @@
4343
#
4444
# =================================================================
4545

46+
import logging
4647
import os
4748

4849
from woudc_data_registry.util import str2bool
4950

51+
LOGGER = logging.getLogger(__name__)
5052

51-
DEBUG = str2bool(os.getenv('DEBUG', False))
5253

53-
DB_TYPE = os.getenv('DB_TYPE', 'postgresql')
54-
DB_HOST = os.getenv('DB_HOST', 'localhost')
55-
DB_PORT = int(os.getenv('DB_PORT', 5432))
56-
DB_USERNAME = os.getenv('DB_USERNAME', None)
57-
DB_PASSWORD = os.getenv('DB_PASSWORD', None)
58-
DB_NAME = os.getenv('DB_NAME', 'woudc-data-registry')
54+
WDR_DEBUG = str2bool(os.getenv('WDR_DEBUG', False))
5955

60-
if None in [DB_USERNAME, DB_PASSWORD]:
61-
raise EnvironmentError('System environment variables are not set!')
56+
WDR_DB_TYPE = os.getenv('WDR_DB_TYPE', 'postgresql')
57+
WDR_DB_HOST = os.getenv('WDR_DB_HOST', 'localhost')
58+
WDR_DB_PORT = int(os.getenv('WDR_DB_PORT', 5432))
59+
WDR_DB_USERNAME = os.getenv('WDR_DB_USERNAME', None)
60+
WDR_DB_PASSWORD = os.getenv('WDR_DB_PASSWORD', None)
61+
WDR_DB_NAME = os.getenv('WDR_DB_NAME', 'woudc-data-registry')
62+
WDR_SEARCH_TYPE = os.getenv('WDR_SEARCH_TYPE', 'elasticsearch')
63+
WDR_SEARCH_URL = os.getenv('WDR_SEARCH_URL', 'elasticsearch')
64+
WDR_WAF_BASEDIR = os.getenv('WDR_WAF_BASEDIR', None)
65+
WDR_WAF_BASEURL = os.getenv('WDR_WAF_BASEURL', 'https://woudc.org/archive')
6266

63-
DATABASE_URL = '{}://{}:{}@{}:{}/{}'.format(DB_TYPE, DB_USERNAME, DB_PASSWORD,
64-
DB_HOST, DB_PORT, DB_NAME)
67+
if None in [WDR_DB_USERNAME, WDR_DB_PASSWORD, WDR_SEARCH_TYPE,
68+
WDR_SEARCH_URL, WDR_WAF_BASEDIR, WDR_WAF_BASEURL]:
69+
msg = 'System environment variables are not set!'
70+
LOGGER.error(msg)
71+
raise EnvironmentError(msg)
72+
73+
WDR_DATABASE_URL = '{}://{}:{}@{}:{}/{}'.format(WDR_DB_TYPE, WDR_DB_USERNAME,
74+
WDR_DB_PASSWORD, WDR_DB_HOST,
75+
WDR_DB_PORT, WDR_DB_NAME)
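
In the check above, only `WDR_DB_USERNAME`, `WDR_DB_PASSWORD` and `WDR_WAF_BASEDIR` are read with a default of `None`, so those are the three that can trigger the error, and the message does not say which one is missing. The following sketch reports the missing names explicitly. `missing_variables` and `check_environment` are hypothetical helpers, not functions from this commit.

```python
import os

# Variables that config.py reads with a default of None in this commit.
REQUIRED = ('WDR_DB_USERNAME', 'WDR_DB_PASSWORD', 'WDR_WAF_BASEDIR')


def missing_variables(environ, required=REQUIRED):
    """Return the required names absent from environ, in declaration order."""
    return [name for name in required if environ.get(name) is None]


def check_environment(environ=None):
    """Raise EnvironmentError naming every missing variable (hypothetical)."""
    if environ is None:
        environ = os.environ
    missing = missing_variables(environ)
    if missing:
        raise EnvironmentError(
            'Environment variables not set: {}'.format(', '.join(missing)))
```

Passing the mapping in as an argument, rather than reading `os.environ` at import time as `config.py` does, is a design choice of this sketch that makes the check easy to exercise in a unit test.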

woudc_data_registry/controller.py

+25-16
@@ -50,8 +50,15 @@
5050
from woudc_data_registry.processing import Process
5151

5252

53-
def orchestrate(file_, directory, verify=False):
54-
"""core workflow"""
53+
def orchestrate(file_, directory, verify_only=False):
54+
"""
55+
core orchestration workflow
56+
57+
:param file_: file to process
58+
:param directory: directory to process (recursive)
59+
:param verify_only: whether to verify the file for correctness without
60+
processing
61+
"""
5562

5663
files_to_process = []
5764

@@ -62,18 +69,19 @@ def orchestrate(file_, directory, verify=False):
6269
for f in files:
6370
files_to_process.append(os.path.join(root, f))
6471

65-
for file_to_process in files_to_process:
66-
click.echo('Processing filename: {}'.format(file_to_process))
67-
p = Process()
68-
result = p.process_data(file_to_process, verify=verify)
69-
70-
if result: # processed
71-
if verify:
72-
click.echo('Verified but not ingested')
72+
with click.progressbar(files_to_process, label='Processing files') as run_:
73+
for file_to_process in run_:
74+
click.echo('Processing filename: {}'.format(file_to_process))
75+
p = Process()
76+
result = p.process_data(file_to_process, verify_only=verify_only)
77+
78+
if result: # processed
79+
if verify_only:
80+
click.echo('Verified but not ingested')
81+
else:
82+
click.echo('Ingested successfully')
7383
else:
74-
click.echo('Ingested successfully')
75-
else:
76-
click.echo('Not ingested')
84+
click.echo('Not ingested')
7785

7886

7987
@click.group()
@@ -90,8 +98,9 @@ def data():
9098
type=click.Path(exists=True, resolve_path=True,
9199
dir_okay=True, file_okay=False),
92100
help='Path to directory of data records')
93-
@click.option('--verify', is_flag=True)
94-
def ingest(ctx, file_, directory, verify):
101+
@click.option('--verify-only', '-vo', 'verify_only', is_flag=True,
102+
help='Verify file only')
103+
def ingest(ctx, file_, directory, verify_only):
95104
"""ingest a single data submission or directory of files"""
96105

97106
if file_ is not None and directory is not None:
@@ -102,7 +111,7 @@ def ingest(ctx, file_, directory, verify):
102111
msg = 'One of --file or --directory is required'
103112
raise click.ClickException(msg)
104113

105-
orchestrate(file_, directory, verify)
114+
orchestrate(file_, directory, verify_only)
106115

107116

108117
data.add_command(ingest)
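
The control flow of `ingest` and `orchestrate` can be sketched without click or the `Process` class: require exactly one of a file or a directory, walk the directory recursively, then report one of the three status messages depending on the result and on `verify_only`. The `collect_files` and `summarize` helpers below are assumptions made for illustration. How the commit handles the single-file case is not visible in the hunks above, so treating it as a one-element list is a guess, and sorting the paths is a choice of this sketch.

```python
import os


def collect_files(file_=None, directory=None):
    """Return the paths to process (hypothetical helper).

    Exactly one of file_ or directory must be given, mirroring the
    checks performed in ingest().
    """
    if file_ is not None and directory is not None:
        raise ValueError('Only one of file_ or directory is allowed')
    if file_ is None and directory is None:
        raise ValueError('One of file_ or directory is required')

    if file_ is not None:
        # Assumption: a single file is processed as a one-element list.
        return [file_]

    paths = []
    # Recursive walk, as in orchestrate().
    for root, _dirs, files in os.walk(directory):
        for name in files:
            paths.append(os.path.join(root, name))
    # Sorted for a deterministic order; the commit does not sort.
    return sorted(paths)


def summarize(result, verify_only):
    """Return the status message orchestrate() echoes for one file."""
    if result:
        if verify_only:
            return 'Verified but not ingested'
        return 'Ingested successfully'
    return 'Not ingested'
```

With these helpers, the loop body in `orchestrate` reduces to computing `result` for each path and echoing `summarize(result, verify_only)`.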
