Update docs following removal of LTR
We still use nDCG as an offline relevancy metric; we just no longer measure it after reranking.
sihugh committed May 15, 2024
1 parent ee4e460 commit e751218
Showing 6 changed files with 3 additions and 197 deletions.
1 change: 0 additions & 1 deletion README.md
@@ -43,7 +43,6 @@ bundle exec rake
 - [Schemas](docs/schemas.md): how to work with schemas and the document types
 - [Popularity information](docs/popularity.md): Search API uses Google Analytics data to improve search results.
 - [Publishing document finders](docs/publishing-finders.md): Information about publishing finders using rake tasks
-- [Learning to rank](docs/learning-to-rank.md): Guidance on how to run the ranking model locally

 ## Licence

11 changes: 0 additions & 11 deletions docs/how-search-works.md
@@ -19,17 +19,6 @@ stack don't need to know how to construct Elasticsearch queries.
 See the [relevancy documentation](relevancy.md) to learn more about how
 Search API determines how relevant a document is to a query.

-### Reranking
-
-Once Search API has retrieved a selection of relevant documents from
-Elasticsearch, the results are re-ranked by a machine learning model.
-
-This process ensures that we show the most relevant documents at the top
-of the search results.
-
-See the [learning to rank documentation](learning-to-rank.md) to learn
-more about the reranking model.
-
 ## Evaluating search quality

 To ensure Search API returns good quality results, we use a combination of

164 changes: 0 additions & 164 deletions docs/learning-to-rank.md

This file was deleted.

5 changes: 0 additions & 5 deletions docs/new-indexing-process.md
@@ -23,11 +23,6 @@ Example PRs:
 - [Prepare for moving to rummager](https://github.com/alphagov/calendars/pull/160/files)
 - [Ensure we pass the description text to publishing API](https://github.com/alphagov/calendars/pull/162/files)

-## Add the format to the list in `lib/learn_to_rank/format_enums.rb`
-
-We take format into account in our machine learning, which means we
-need a mapping from formats to unique numbers.
-
 ## Update the presenter to handle the new format
 You'll need to update the elasticsearch presenter in Search API so that it handles any fields which are not yet used by other formats in the govuk index.

13 changes: 2 additions & 11 deletions docs/relevancy.md
@@ -36,15 +36,6 @@ a `combined_score` on every document.
 The `combined_score` is used for ranking results and represents how
 relevant we think a result is to your query.

-## What impacts relevancy?
-
-Once Search API has [retrieved](#what-impacts-document-retrieval) the
-top scoring documents from the search indexes, it ranks the results
-in order of relevance using a pre-trained model.
-
-See the [learning to rank](learning-to-rank.md) documentation for
-more details.
-
 ## What impacts document retrieval?

 Out of the box, Elasticsearch comes with a decent scoring algorithm.
@@ -102,13 +93,13 @@ field and its number of page views in the `vc_14` field.

 This is an implementation of [this curve](https://solr.apache.org/guide/7_7/function-queries.html#recip-function),
 and is applied to documents of the "announcement" type in the [booster.rb][]
-file. It serves to increase the score of new documents and decrease
+file. It serves to increase the score of new documents and decrease
 the score of old documents.

 Only documents of `search_format_types` 'announcement' are affected by
 recency boosting.

-The curve was chosen so that it only applies the boost temporarily (2
+The curve was chosen so that it only applies the boost temporarily (2
 months moderate decay then a rapid decay after that).

 #### Properties

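Reviewer's aside on the recency text in the hunk above: the linked curve is Solr's `recip` function, which the Solr function-query documentation defines as `recip(x, m, a, b) = a / (m*x + b)`. A minimal Ruby sketch of that formula follows. The method name and any parameter values are hypothetical; they are not the values Search API uses in `booster.rb`.

```ruby
# Reciprocal decay as defined for Solr's recip function query:
#   recip(x, m, a, b) = a / (m * x + b)
# x is the input (for example, a document's age); m, a and b are constants.
# Hypothetical helper for illustration; not taken from Search API.
def recip(x, m, a, b)
  a.to_f / (m * x + b)
end
```

For example, `recip(2, 0.5, 1, 1)` evaluates to `0.5`, and with positive constants the value shrinks as `x` grows, which is what lets the boost favour newer documents.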
6 changes: 1 addition & 5 deletions docs/search-quality-metrics.md
@@ -15,13 +15,9 @@ click on something that isn't what they were looking for. But this
 serves our needs in the absence of a more sophisticated way of
 measuring user success following a search.

-We also measure nDCG before and after re-ranking over time, to
-tell us how search is performing against relevance judgements.
-
 ## Offline metrics

-Our main offline metric is nDCG. We measure this before and after
-re-ranking by our [learning to rank model](learning-to-rank.md).
+Our main offline metric is nDCG.

 We use Elasticsearch's [Ranking Evaluation API](ranking_evaluation_api)
 to assess the quality of results retrieved from Elasticsearch prior

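Reviewer's aside on the metric named in `docs/search-quality-metrics.md`: a minimal Ruby sketch of nDCG (normalised discounted cumulative gain) follows. It is illustrative only and is not taken from Search API. The method names `dcg` and `ndcg` are hypothetical, and the sketch assumes the linear-gain formulation; some tools use an exponential gain of `2**rel - 1` instead.

```ruby
# Discounted cumulative gain: each graded relevance judgement is divided by
# log2(rank + 1), where rank starts at 1 for the top result.
# Hypothetical helper for illustration; linear gain is an assumption.
def dcg(relevances)
  relevances.each_with_index.sum do |rel, index|
    rel / Math.log2(index + 2)
  end
end

# Normalised DCG: the DCG of the ranking as returned, divided by the DCG of
# the same judgements sorted into the ideal (descending) order.
def ndcg(relevances)
  ideal = dcg(relevances.sort.reverse)
  return 0.0 if ideal.zero?

  dcg(relevances) / ideal
end
```

A ranking that is already in ideal order, such as `ndcg([3, 2, 1])`, scores `1.0`; any ranking that places less relevant documents above more relevant ones scores below that.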