gamechanger-data focuses on the data engineering work of gamechanger. To see all repositories, see gamechanger.
- Configuration of the repo relies on being able to reach advana-data-zone's S3 bucket. If you do not have access to that bucket, you will need to fill in your own values in the config scripts, such as topic_models (for ML features) and configure_app (Elasticsearch, Postgres, and Neo4j).
- Once the venv is set up, set the DEPLOYMENT_ENV variable and run ./paasJobs/configure_repo.sh or paasJobs/configure_repo.bat
Example (Linux/macOS): DEPLOYMENT_ENV=local ./paasJobs/configure_repo.sh
Example (Windows): set "DEPLOYMENT_ENV=local" && .\paasJobs\configure_repo.bat
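The configuration step above can be wrapped in a small guard so that a missing DEPLOYMENT_ENV or a wrong working directory is caught before the script runs. This is only a sketch: the function name run_configure is hypothetical and not part of the repo, and the only thing it assumes is the script path ./paasJobs/configure_repo.sh named above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): refuse to run the configure
# script unless DEPLOYMENT_ENV is set and the script file is present.
run_configure() {
    local script="${1:-./paasJobs/configure_repo.sh}"
    if [ -z "${DEPLOYMENT_ENV:-}" ]; then
        echo "DEPLOYMENT_ENV is not set (e.g. DEPLOYMENT_ENV=local)" >&2
        return 2
    fi
    if [ ! -f "$script" ]; then
        echo "configure script not found: $script (run from the repo root)" >&2
        return 3
    fi
    # Pass the variable explicitly in case it was set but never exported.
    DEPLOYMENT_ENV="$DEPLOYMENT_ENV" bash "$script"
}
```

Typical use would be `DEPLOYMENT_ENV=local; run_configure` from the repo root; the return codes 2 and 3 are arbitrary choices made for this sketch.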
- Clone a fresh gamechanger-data repo
- Set up a python3.8 venv with the packages in requirements.txt.
  - Create a python3.8 venv, e.g. python3 -m venv /opt/gc-venv-20210613
  - Before installing packages, update pip/wheel/setuptools, e.g. <venv>/bin/pip install --upgrade pip setuptools wheel
  - Install packages from requirements.txt, with no additional dependencies, e.g. <venv>/bin/pip install --no-deps -r requirements.txt
- Set up the symlink /opt/gc-venv-current to point to the freshly created venv, e.g. ln -s /opt/gc-venv-20210613 /opt/gc-venv-current
- Pull in other dependencies and configure the repo with env SCRIPT_ENV=<prod|dev> <repo>/paasJobs/configure_repo.sh
- The config script will let you know if everything was configured correctly and if all backends can be reached.
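When the deployment steps above are repeated for a newer venv, a plain `ln -s` onto an existing /opt/gc-venv-current link would create the new link inside the old venv directory instead of replacing it. The sketch below shows one way to repoint the link safely. The function name link_current_venv is hypothetical and not part of the repo; the paths in the usage comment are the examples from the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): point a "current" symlink at a
# freshly built venv directory, replacing any previous link.
link_current_venv() {
    local venv_dir="$1"
    local link_path="$2"
    if [ ! -d "$venv_dir" ]; then
        echo "venv directory not found: $venv_dir" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing link, -n: treat an existing
    # link to a directory as a file rather than descending into it.
    ln -sfn "$venv_dir" "$link_path"
}

# Example usage, mirroring the steps above:
#   python3 -m venv /opt/gc-venv-20210613
#   /opt/gc-venv-20210613/bin/pip install --upgrade pip setuptools wheel
#   /opt/gc-venv-20210613/bin/pip install --no-deps -r requirements.txt
#   link_current_venv /opt/gc-venv-20210613 /opt/gc-venv-current
```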
- (Linux only) Follow the installation instructions that apply to your system to install ocrmypdf and its dependencies: https://ocrmypdf.readthedocs.io/en/latest/installation.html#installing-on-linux
- (macOS only) Install "brew", then use it to install tesseract:
brew install tesseract-lang
- Install Miniconda or Anaconda (Miniconda is much smaller)
https://docs.conda.io/en/latest/miniconda.html
- Create gamechanger python3.8 environment, like so:
conda create -n gc python=3.8
- Clone the repo and change into that dir
git clone ...; cd gamechanger-data
- Activate conda environment and install requirements:
‼️ Really important: make sure you change into the repo directory first
conda activate gc
pip install --upgrade pip setuptools wheel
pip install -e '.[dev]'
(quoting around .[dev] is important)
- That's it.
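Two details in the steps above are easy to trip over. First, `pip install -e '.[dev]'` only works from a directory that contains a setup.py or pyproject.toml, which is why changing into the repo directory matters. Second, the quotes matter because shells such as zsh treat an unquoted `.[dev]` as a glob pattern. The pre-flight check below is a minimal sketch of the first point; the function name check_editable_install_dir is hypothetical and not part of the repo.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the repo): succeed only if the
# given directory looks installable with `pip install -e`.
check_editable_install_dir() {
    local dir="${1:-.}"
    if [ -f "$dir/setup.py" ] || [ -f "$dir/pyproject.toml" ]; then
        return 0
    fi
    echo "no setup.py or pyproject.toml in $dir; cd into the repo first" >&2
    return 1
}

# Example usage:
#   conda activate gc
#   check_editable_install_dir . && pip install -e '.[dev]'
```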
- Set up a Windows Subsystem for Linux (WSL) environment
https://docs.microsoft.com/en-us/windows/wsl/install-win10
- (In WSL)
- Install the ocrmypdf dependencies following the Ubuntu instructions here: https://ocrmypdf.readthedocs.io/en/latest/installation.html#installing-on-linux
- Install Miniconda or Anaconda (Miniconda is much smaller)
https://docs.conda.io/en/latest/miniconda.html
- Create gamechanger python3.8 environment, like so:
conda create -n gc python=3.8
- Clone the repo and change into that dir
git clone ...; cd gamechanger-data
- Activate conda environment and install requirements:
‼️ Really important: make sure you change into the repo directory first
conda activate gc
pip install --upgrade pip setuptools wheel
pip install -e '.[dev]'
(quoting around .[dev] is important)
- That's it, just activate that conda env if you want to use it inside the terminal.
Create venv
python -m venv [venv-name]
Activate
[venv-name]\Scripts\activate
Update venv
python -m pip install --upgrade pip setuptools wheel
Install requirements.txt
pip install --no-deps -r dev_tools\requirements\gc-venv-current.txt
Run Configure Repo (see the steps at the top of this README)
To-Do:
- convert .sh scripts to .bat to support Windows users
docker build -t gc-data --no-cache .
docker rm -f gc-data || true
docker run -it --name gc-data gc-data
Configure Repo
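Re-running the docker commands above fails if a container with the name passed to --name already exists, so it helps to remove a container of that same name before each run. The wrapper below is a sketch of that sequence with a dry-run switch so the commands can be inspected first. The function names gc_docker and rebuild_and_run and the DRY_RUN variable are hypothetical, not part of the repo; the image and container names default to the gc-data name used above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the repo). With DRY_RUN=1 it only prints
# the docker commands instead of running them.
gc_docker() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "docker $*"
    else
        docker "$@"
    fi
}

rebuild_and_run() {
    local image="${1:-gc-data}"
    local container="${2:-gc-data}"
    gc_docker build -t "$image" --no-cache . || return 1
    # Remove any previous container with the same name; ignore failures.
    gc_docker rm -f "$container" || true
    gc_docker run -it --name "$container" "$image"
}
```

Setting `DRY_RUN=1` before calling `rebuild_and_run` prints the three commands without touching docker.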
Note: If you're using a containerized env, you'll need the Pro version of PyCharm and a separate set of instructions - here
- Create new project by opening directory where you cloned the repository. PyCharm will tell you that it sees existing repo there, just accept that and proceed.
- With your gc conda environment all good to go, change your "Preferences -> Project -> Python Interpreter" to the EXISTING gc conda env you created. https://www.jetbrains.com/help/pycharm/conda-support-creating-conda-virtual-environment.html
- Now, change your "Preferences -> Build, Execution, Deployment -> Console -> Python Console interpreter" to your gc conda interpreter env that you added earlier.
- That's it, you will now have the correct env in Terminal, Python Console, and elsewhere in the IDE.
Note: if you're using a containerized env, you'll need a setup like this
- Open the cloned dir in a new workspace and make sure to set your conda gc venv as the python venv https://code.visualstudio.com/docs/python/environments
- That's it, when you start new integrated terminals, they'll activate the right environment and the syntax highlighting/autocompletion is going to work as it's supposed to.
My venv is broken somehow!
- Delete the old conda environment and create a new one, then follow the steps above to reinstall it.
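The reset described above might look like the following sketch. It assumes the environment name gc and python 3.8 from the setup steps; the function name recreate_conda_env and the RUN variable are hypothetical and not part of the repo. By default it only prints the conda commands so they can be reviewed before anything is deleted.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): print, or with RUN=1 execute,
# the conda commands that rebuild the environment from scratch.
recreate_conda_env() {
    local name="${1:-gc}"
    local py="${2:-3.8}"
    local runner="echo"
    if [ "${RUN:-0}" = "1" ]; then runner=""; fi
    # Left unquoted on purpose: an empty runner expands to nothing, so the
    # conda command itself is executed.
    $runner conda env remove -n "$name" -y
    $runner conda create -n "$name" "python=$py" -y
}
```

After the environment is recreated, activate it and repeat the pip install steps from the setup section inside the repo directory.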
See LICENSE.md (including licensing intent - INTENT.md) and CONTRIBUTING.md