Move ML experiments to the model repository

These now exist in https://github.com/readthedocs/ethicalads-model
readthedocs · Dec 15, 2022 · 7ceedd0
1 parent 305f0df

Showing 12 changed files with 1 addition and 1,935 deletions.
diff --git a/machine_learning_experiments/.gitignore b/machine_learning_experiments/.gitignore
diff --git a/machine_learning_experiments/README.md b/machine_learning_experiments/README.md
@@ -1,66 +1 @@
-# Machine Learning for Ads!
-
-This project uses [spaCy](https://spacy.io) to do text classification around text for ad targeting.
-
-## Quickstart
-
-This will generate our training data and then build and train the model.
-
-	# Generate training and test set from the categorized data (Yaml file)
-	python scripts/generate-training-test-sets.py -o assets/train.json -f assets/test.json assets/categorized-data.yml
-	python -m spacy project run all . --vars.train=train --vars.dev=test --vars.name=ethicalads_topics --vars.version=`date "+%Y%m%d_%H_%M_%S"`
-
-
-### Running the analyzer
-
-After installing the analyzer (it's installed in staging already),
-you can run it against an arbitrary URL to see how that page was classified.
-
-    ADSERVER_ANALYZER_BACKEND=adserver.analyzer.backends.EthicalAdsTopicsBackend ./manage.py runmodel https://example.com
-
-
-## 📋 project.yml
-
-The [`project.yml`](project.yml) defines the data assets required by the
-project, as well as the available commands and workflows. For details, see the
-[spaCy projects documentation](https://spacy.io/usage/projects).
-
-For training with a GPU, some modifications to the `project.yml` are needed.
-Specifically, set the `gpu_id` (to 0 usually) and the `config` to `gpu-efficiency.cfg`.
-
-### ⏯ Commands
-
-The following commands are defined by the project. They
-can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run).
-Commands are only re-run if their inputs have changed.
-
-| Command | Description |
-| --- | --- |
-| `preprocess` | Convert the data to spaCy's binary format |
-| `train` | Train a text classification model |
-| `evaluate` | Evaluate the model and export metrics |
-| `package` | Build the actual Python package for the model to install |
-
-### ⏭ Workflows
-
-The following workflows are defined by the project. They
-can be executed using [`spacy project run [name]`](https://spacy.io/api/cli#project-run)
-and will run the specified commands in order. Commands are only re-run if their
-inputs have changed.
-
-| Workflow | Steps |
-| --- | --- |
-| `all` | `preprocess` &rarr; `train` &rarr; `evaluate` |
-
-## 📚 Data
-
-Our data is hand-labeled URL's that are located in ``assets/categorized-data.yml``.
-This maps a specific URL to a topic,
-and then we download the data from those URL's and split them into a training & validation set with ``scripts/generate-training-test-sets.py``.
-
-## Deployment
-
-We are currently just uploading a zipfile of the Python model,
-and then installing it in our deployment scripts into a baked build image.
-
-This can be found in our closed source ``ethicalads-ops`` repo that has custom deployment code.
+Our ML model for ads has moved to a separate repo [ethicalads-model](https://github.com/readthedocs/ethicalads-model).
diff --git a/machine_learning_experiments/assets/.gitattributes b/machine_learning_experiments/assets/.gitattributes