Merge pull request #2910 from alphagov/ltr-removal

Remove Learning to Rank code
alphagov · May 14, 2024 · ee4e460
2 parents ae3f073 + 8e142a9
commit ee4e460

Showing 60 changed files with 40 additions and 7,468 deletions.
diff --git a/.github/workflows/deploy-ltr.yml b/.github/workflows/deploy-ltr.yml
diff --git a/Gemfile b/Gemfile
@@ -2,8 +2,6 @@ source "https://rubygems.org"
 
 gem "activesupport"
 gem "aws-sdk-s3"
-gem "aws-sdk-sagemaker"
-gem "aws-sdk-sagemakerruntime"
 gem "bootsnap", require: false
 gem "elasticsearch", "~> 6" # We need a 6.x release to interface with Elasticsearch 6
 gem "gds-api-adapters"

diff --git a/Gemfile.lock b/Gemfile.lock
@@ -45,12 +45,6 @@ GEM
       aws-sdk-core (~> 3, >= 3.194.0)
       aws-sdk-kms (~> 1)
       aws-sigv4 (~> 1.8)
-    aws-sdk-sagemaker (1.240.0)
-      aws-sdk-core (~> 3, >= 3.193.0)
-      aws-sigv4 (~> 1.1)
-    aws-sdk-sagemakerruntime (1.62.0)
-      aws-sdk-core (~> 3, >= 3.193.0)
-      aws-sigv4 (~> 1.1)
     aws-sigv4 (1.8.0)
       aws-eventstream (~> 1, >= 1.0.2)
     base64 (0.2.0)
@@ -668,8 +662,6 @@ PLATFORMS
 DEPENDENCIES
   activesupport
   aws-sdk-s3
-  aws-sdk-sagemaker
-  aws-sdk-sagemakerruntime
   bootsnap
   bunny-mock
   climate_control

diff --git a/docs/arch/adr-012-learn-to-rank.md b/docs/arch/adr-012-learn-to-rank.md
@@ -0,0 +1,33 @@
+# Decision record: Decommissioning Learning to Rank
+
+**Date:** 2024-05-14
+
+The search team have decided to retire [Learning to Rank][] (LTR).
+
+## Rationale
+
+[Site search][] now uses Google's Vertex AI search instead of our ElasticSearch + Learning to Rank service. The other finders still use ElasticSearch and LTR. Site Search receives more requests than all the other finders combined.
+
+Running our own relevance tuning service on top of ElasticSearch is not something we are equipped to do at this time, particularly when it's in support of a vastly reduced demand.
+
+It's expensive to do well, both in terms of money spent on infrastructure and the time that the appropriate people would need to devote to it. Unfortunately, we just don't have that available.
+
+### Limited upside to retaining it
+
+Learning to Rank was configured primarily for Site Search and the general features of documents on GOV.UK. Other finders are often set up for small sets of specific document types. These documents have many features for which Learning to Rank has not been trained.
+
+The model is poorly suited to differentiating between different Employment Tribunal decisions, for example.
+
+### Limited impact to removing it
+
+Our implementation of Learning to Rank always had a limited "blast radius" in that it would only be able to affect the rankings of a single page of results at a time. The biggest impact it could have on a result would be to promote the 20th result to be 1st (and vice versa).
+
+This also means that there is limited downside to removing the reranking feature. All the results for each query still appear on the same page as before, but potentially in a different order.
+
+### Unaffected use cases
+
+Learning to Rank only affected queries which included keywords and were ordered by relevance. Other queries, such as those that power organisation, taxon and topical event pages, are unaffected.
+
+
+[Site search]: https://www.gov.uk/search/all
+[Learning to Rank]: https://github.com/alphagov/search-api/blob/1524da75f055f144392facb460bd95ef62b67bbb/docs/arch/adr-010-learn-to-rank.md
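
Reviewer note on the "blast radius" argument in the ADR above: the following is a minimal Ruby sketch of why reranking one page at a time can only reorder results within that page. It is not the removed search-api code. The `rerank_page` helper, the `scores` hash, and the page size of 20 (inferred from the ADR's "20th result" example) are all assumptions for illustration.

```ruby
# Hypothetical sketch, not the removed implementation.
PAGE_SIZE = 20 # assumed from the ADR's "promote the 20th result to be 1st" example

# results: document ids already ordered by the base search engine
# scores:  document id => hypothetical model score (higher is better)
def rerank_page(results, scores, page:, page_size: PAGE_SIZE)
  start = page * page_size
  window = results[start, page_size] || []

  # Sort only the current page; ties and unscored documents keep base order.
  reranked = window.each_with_index
                   .sort_by { |id, index| [-scores.fetch(id, 0.0), index] }
                   .map(&:first)

  results[0...start] + reranked + (results[(start + page_size)..] || [])
end

base = (1..40).map { |n| "doc-#{n}" }
# doc-21 has the highest score overall, but it sits on the second page.
scores = { "doc-20" => 9.0, "doc-21" => 10.0 }

first_page = rerank_page(base, scores, page: 0)
```

In this sketch `doc-20` moves to the top of the first page, while `doc-21` stays on the second page despite its higher score. Removing the reranking step therefore changes only the order within a page, never which page a result appears on.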
diff --git a/lib/healthcheck/reranker_healthcheck.rb b/lib/healthcheck/reranker_healthcheck.rb
diff --git a/lib/learn_to_rank/data_pipeline.rb b/lib/learn_to_rank/data_pipeline.rb
diff --git a/lib/learn_to_rank/data_pipeline/bigquery.rb b/lib/learn_to_rank/data_pipeline/bigquery.rb
diff --git a/lib/learn_to_rank/data_pipeline/embed_features.rb b/lib/learn_to_rank/data_pipeline/embed_features.rb
diff --git a/lib/learn_to_rank/data_pipeline/judgements_to_svm.rb b/lib/learn_to_rank/data_pipeline/judgements_to_svm.rb
diff --git a/lib/learn_to_rank/data_pipeline/load_search_queries.rb b/lib/learn_to_rank/data_pipeline/load_search_queries.rb