Update docs following removal of LTR

We still use nDCG as an offline relevancy metric; we just no longer measure it after reranking.
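Since nDCG remains the offline relevancy metric, a minimal Ruby sketch of what nDCG@k computes may help. The `dcg` and `ndcg` helper names, the integer relevance grades and the exponential gain `2^rel - 1` are assumptions made for illustration. The docs describe measuring results with Elasticsearch's Ranking Evaluation API, not with code like this, and its exact formula may differ.

```ruby
# Hypothetical helpers illustrating nDCG@k; not Search API's own code.
# `relevances` holds integer relevance grades (e.g. 0-3) in the order
# the results were returned for a query.

# Discounted cumulative gain over the first k results, using the common
# exponential gain (2^rel - 1) and a log2(position + 1) discount.
def dcg(relevances, k)
  relevances.first(k).each_with_index.map do |rel, index|
    position = index + 1
    (2**rel - 1) / Math.log2(position + 1)
  end.sum
end

# Normalise by the DCG of the ideal (best possible) ordering, so a
# perfect ranking scores 1.0. Simplification: the ideal ordering is
# built from the same list of judgements rather than every judged document.
def ndcg(relevances, k)
  ideal = dcg(relevances.sort.reverse, k)
  return 0.0 if ideal.zero?

  dcg(relevances, k) / ideal
end

puts ndcg([3, 2, 3, 0, 1, 2], 6).round(3)
```

Putting a highly relevant result lower in the list shrinks its contribution through the log discount, so the score drops below 1.0.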
alphagov · May 15, 2024 · e751218
1 parent ee4e460

Showing 6 changed files with 3 additions and 197 deletions.
diff --git a/README.md b/README.md
@@ -43,7 +43,6 @@ bundle exec rake
 - [Schemas](docs/schemas.md): how to work with schemas and the document types
 - [Popularity information](docs/popularity.md): Search API uses Google Analytics data to improve search results.
 - [Publishing document finders](docs/publishing-finders.md): Information about publishing finders using rake tasks
-- [Learning to rank](docs/learning-to-rank.md): Guidance on how to run the ranking model locally
 
 ## Licence
 

diff --git a/docs/how-search-works.md b/docs/how-search-works.md
@@ -19,17 +19,6 @@ stack don't need to know how to construct Elasticsearch queries.
 See the [relevancy documentation](relevancy.md) to learn more about how
 Search API determines how relevant a document is to a query.
 
-### Reranking
-
-Once Search API has retrieved a selection of relevant documents from
-Elasticsearch, the results are re-ranked by a machine learning model.
-
-This process ensures that we show the most relevant documents at the top
-of the search results.
-
-See the [learning to rank documentation](learning-to-rank.md) to learn
-more about the reranking model.
-
 ## Evaluating search quality
 
 To ensure Search API returns good quality results, we use a combination of

diff --git a/docs/learning-to-rank.md b/docs/learning-to-rank.md
diff --git a/docs/new-indexing-process.md b/docs/new-indexing-process.md
@@ -23,11 +23,6 @@ Example PRs:
 - [Prepare for moving to rummager](https://github.com/alphagov/calendars/pull/160/files)
 - [Ensure we pass the description text to publishing API](https://github.com/alphagov/calendars/pull/162/files)
 
-## Add the format to the list in `lib/learn_to_rank/format_enums.rb`
-
-We take format into account in our machine learning, which means we
-need a mapping from formats to unique numbers.
-
 ## Update the presenter to handle the new format
 You'll need to update the elasticsearch presenter in Search API so that it handles any fields which are not yet used by other formats in the govuk index.
 

diff --git a/docs/relevancy.md b/docs/relevancy.md
@@ -36,15 +36,6 @@ a `combined_score` on every document.
 The `combined_score` is used for ranking results and represents how
 relevant we think a result is to your query.
 
-## What impacts relevancy?
-
-Once Search API has [retrieved](#what-impacts-document-retrieval) the
-top scoring documents from the search indexes, it ranks the results
-in order of relevance using a pre-trained model.
-
-See the [learning to rank](learning-to-rank.md) documentation for
-more details.
-
 ## What impacts document retrieval?
 
 Out of the box, Elasticsearch comes with a decent scoring algorithm.
@@ -102,13 +93,13 @@ field and its number of page views in the `vc_14` field.
 
 This is an implementation of [this curve](https://solr.apache.org/guide/7_7/function-queries.html#recip-function),
 and is applied to documents of the "announcement" type in the [booster.rb][]
-file.  It serves to increase the score of new documents and decrease 
+file.  It serves to increase the score of new documents and decrease
 the score of old documents.
 
 Only documents of `search_format_types` 'announcement' are affected by
 recency boosting.
 
-The curve was chosen so that it only applies the boost temporarily (2 
+The curve was chosen so that it only applies the boost temporarily (2
 months moderate decay then a rapid decay after that).
 
 #### Properties

diff --git a/docs/search-quality-metrics.md b/docs/search-quality-metrics.md
@@ -15,13 +15,9 @@ click on something that isn't what they were looking for. But this
 serves our needs in the absence of a more sophisticated way of
 measuring user success following a search.
 
-We also measure nDCG before and after re-ranking over time, to
-tell us how search is performing against relevance judgements.
-
 ## Offline metrics
 
-Our main offline metric is nDCG. We measure this before and after
-re-ranking by our [learning to rank model](learning-to-rank.md).
+Our main offline metric is nDCG.
 
 We use Elasticsearch's [Ranking Evaluation API](ranking_evaluation_api)
 to assess the quality of results retrieved from Elasticsearch prior