update airflow documentation
OriHoch committed Feb 11, 2024
1 parent 6f220fd commit 0a93b9d
Showing 3 changed files with 178 additions and 76 deletions.
34 changes: 22 additions & 12 deletions airflow/README.md
@@ -1,33 +1,29 @@
# Knesset Data Pipelines Airflow

This is the Airflow implementation of the Knesset Data Pipelines project. It is a work in progress, during which we
will be migrating the existing pipelines to Airflow.
This is the Airflow implementation of the Knesset Data Pipelines project.

The Airflow project is defined under `airflow` subdirectory, all the following commands are assumed to run from there.
The Airflow project is defined under the `airflow` subdirectory of knesset-data-pipelines; all of the following commands
are assumed to run from this subdirectory.

The Airflow pipelines themselves can all run locally using the knesset-data-pipelines CLI, so there is no need to
install Airflow unless you want to check some Airflow-specific detail.

## Local Development

Prerequisites:

* System dependencies: https://airflow.apache.org/docs/apache-airflow/stable/installation.html#system-dependencies
* Python 3.8
* Docker Compose

Create a virtualenv and install dependencies:

```
python3.8 -m venv venv &&\
. venv/bin/activate &&\
pip install --upgrade pip setuptools wheel &&\
bin/pip_install_airflow.sh &&\
pip install -e .
```

Authenticate with gcloud:

```
gcloud auth application-default login
```

Start a Database:

```
@@ -40,7 +36,21 @@ Run commands from the CLI:
knesset-data-pipelines --help
```

Optionally, to use Airflow locally:
Depending on the specific command, you will probably need to run dependent pipelines first, or download some packages
or data into the database.

## Local Airflow Development

Use the following steps only if you need to check Airflow-specific functionality; most of the time this won't be
necessary.

Install the Airflow system dependencies: https://airflow.apache.org/docs/apache-airflow/stable/installation.html#system-dependencies

Install the project Airflow dependencies:

```
bin/pip_install_airflow.sh
```

Create a `.env` file with the following contents:

10 changes: 5 additions & 5 deletions committees/knesset.source-spec.yaml
@@ -238,11 +238,11 @@ kns_documentcommitteesession:
resource: kns_documentcommitteesession_dataservice
- run: knesset.rename_resource
parameters: {src: kns_documentcommitteesession_dataservice, dst: kns_documentcommitteesession}
# - run: filter
# cached: true
# parameters:
# in:
# - CommitteeSessionID: 2072573
- run: filter
cached: true
parameters:
in:
- CommitteeSessionID: 2072573
- run: download_document_committee_session
parameters:
out-path: ../data/committees/download_document_committee_session
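This hunk appears to replace the commented-out `filter` step with an active one, so `kns_documentcommitteesession` is restricted to the single session `CommitteeSessionID: 2072573` before `download_document_committee_session` runs. As a rough illustration, assuming the step follows the datapackage-pipelines standard `filter` processor semantics (`in`: keep only rows matching at least one condition; `out`: drop rows matching any condition), here is a minimal Python sketch. `filter_rows` and `matches` are hypothetical helpers, not code from this repository, and the sample rows are made up.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, Any]


def matches(row: Row, condition: Row) -> bool:
    """A row matches a condition when every field in the condition has an equal value in the row."""
    return all(row.get(field) == value for field, value in condition.items())


def filter_rows(
    rows: Iterable[Row],
    in_conditions: Optional[List[Row]] = None,
    out_conditions: Optional[List[Row]] = None,
) -> Iterator[Row]:
    """Hypothetical approximation of a pipeline `filter` step.

    in_conditions:  keep only rows matching at least one condition.
    out_conditions: drop rows matching any condition.
    """
    for row in rows:
        if in_conditions is not None and not any(matches(row, c) for c in in_conditions):
            continue
        if out_conditions is not None and any(matches(row, c) for c in out_conditions):
            continue
        yield row


# Made-up rows standing in for kns_documentcommitteesession records.
sessions = [
    {"CommitteeSessionID": 2072573, "CommitteeID": 1},
    {"CommitteeSessionID": 2072574, "CommitteeID": 1},
]

# Equivalent of `parameters: {in: [{CommitteeSessionID: 2072573}]}`.
kept = list(filter_rows(sessions, in_conditions=[{"CommitteeSessionID": 2072573}]))
print(kept)  # [{'CommitteeSessionID': 2072573, 'CommitteeID': 1}]
```

Because the sketch yields rows lazily, it mirrors how a streaming pipeline would pass only the matching session on to the download step rather than materializing the whole resource.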