
Jina rerankers (turbo/tiny) being classified as embedding models #325

Open
2 of 4 tasks
John42506176Linux opened this issue Jul 25, 2024 · 11 comments

Comments

@John42506176Linux

John42506176Linux commented Jul 25, 2024

System Info

  • AWS EC2 G4dn instance
  • Amazon Linux
  • Model: jinaai/jina-reranker-v1-tiny-en or jinaai/jina-reranker-v1-turbo-en
  • Hardware: NVIDIA GPU (nvidia-smi)
  • Latest Docker image

Command:

port=7997
rerank_model=jinaai/jina-reranker-v1-tiny-en
volume=$PWD/data

sudo docker run -it --gpus all \
  -v $volume:/app/.cache \
  -p $port:$port \
  michaelf34/infinity:latest \
  v2 \
  --batch-size 256 \
  --model-id $rerank_model \
  --port $port

Information

  • Docker
  • The CLI directly via pip

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Run the following command:

port=7997
rerank_model=jinaai/jina-reranker-v1-tiny-en
volume=$PWD/data

sudo docker run -it --gpus all \
  -v $volume:/app/.cache \
  -p $port:$port \
  michaelf34/infinity:latest \
  v2 \
  --batch-size 256 \
  --model-id $rerank_model \
  --port $port

Then attempt to use the /rerank endpoint with a simple request body:

{
  "query": "test",
  "documents": ["test"],
  "return_documents": false,
  "model": "jinaai/jina-reranker-v1-tiny-en"
}

and you will get the following error:

{
  "error": {
    "message": "ModelNotDeployedError: model=jinaai/jina-reranker-v1-tiny-en does not support rerank. Reason: the loaded moded cannot fullyfill rerank.options are {'embed'}.",
    "type": null,
    "param": null,
    "code": 400
  }
}

I've tested this with other inference servers such as Text Embeddings Inference and the same error occurs; however, it does not occur with the plain transformers library.
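For intuition, the gate producing this error is a capabilities check: the server derives a capability set from the model's config.json and refuses tasks outside that set. A minimal sketch of the idea (the mapping and function names below are illustrative assumptions, not Infinity's actual code):

```python
# Illustrative sketch: how a server might derive task capabilities from a
# HF-style config.json. Names and the mapping rule are hypothetical.

RERANK_ARCHITECTURE_SUFFIX = "ForSequenceClassification"

def capabilities_from_config(config: dict) -> set:
    """Return the task capabilities implied by a config dict."""
    archs = config.get("architectures", [])
    if any(a.endswith(RERANK_ARCHITECTURE_SUFFIX) for a in archs):
        return {"rerank"}
    # No classifier head declared -> treated as a plain encoder (embeddings)
    return {"embed"}

# jina-reranker-v1-tiny-en originally declared a plain encoder architecture:
print(capabilities_from_config({"architectures": ["JinaBertModel"]}))  # {'embed'}
print(capabilities_from_config({"architectures": ["JinaBertForSequenceClassification"]}))  # {'rerank'}
```

Under this reading, the model is not misloaded by the server so much as mis-declared by its own config.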

Expected behavior

Should be able to rerank with these models.

Contributor

greptile-apps bot commented Jul 25, 2024

Resolution Steps

  1. Update create_server function in /libs/infinity_emb/infinity_emb/infinity_server.py:

    • Modify _resolve_engine to include reranker models in its capabilities check.
    def _resolve_engine(model: str) -> "AsyncEmbeddingEngine":
        try:
            engine: "AsyncEmbeddingEngine" = app.engine_array[model]  # type: ignore
        except IndexError as ex:
            raise errors.OpenAIException(
                f"Invalid model: {ex}",
                code=status.HTTP_400_BAD_REQUEST,
            )
        if engine.is_overloaded():
            raise errors.OpenAIException(
                f"model {model} is currently overloaded",
                code=status.HTTP_429_TOO_MANY_REQUESTS,
            )
        if "rerank" not in engine.capabilities:
            raise errors.OpenAIException(
                f"ModelNotDeployedError: model=`{model}` does not support `rerank`.",
                code=status.HTTP_400_BAD_REQUEST,
            )
        return engine
  2. Update tests in /libs/infinity_emb/tests/end_to_end/test_torch_reranker.py:

    • Ensure reranker models are correctly identified and tested.
    @pytest.mark.anyio
    async def test_reranker(client, model_base, helpers):
        query = "Where is the Eiffel Tower located?"
        documents = [
            "The Eiffel Tower is located in Paris, France",
            "The Eiffel Tower is located in the United States.",
            "The Eiffel Tower is located in the United Kingdom.",
        ]
        response = await client.post(
            f"{PREFIX}/rerank",
            json={"model": MODEL, "query": query, "documents": documents},
        )
        assert response.status_code == 200
        rdata = response.json()
        assert "model" in rdata
        assert "usage" in rdata
        rdata_results = rdata["results"]

        predictions = [
            model_base.predict({"text": query, "text_pair": doc}) for doc in documents
        ]

        assert len(rdata_results) == len(predictions)
        for i, pred in enumerate(predictions):
            assert abs(rdata_results[i]["relevance_score"] - pred["score"]) < 0.01

References

/docs/docs/index.md
/README.md
/libs/infinity_emb/tests/end_to_end/test_optimum_embedding.py
/libs/infinity_emb/infinity_emb/fastapi_schemas/convert.py
/.github/ISSUE_TEMPLATE/new-model-addition.yml
/docs/docs/deploy.md
/docs/assets/openapi.json
/docs/docs
/libs/infinity_emb/tests/data/benchmark

/docs/benchmarks/simple_app.py
/.github/ISSUE_TEMPLATE/bug-report.yml
/libs/infinity_emb/tests/end_to_end/test_torch_reranker.py
/libs/infinity_emb/tests/conftest.py
/docs/docs/python_engine.md
/docs/benchmarks
/docs/docs/benchmarking.md
/docs
/libs/infinity_emb/tests/end_to_end/test_authentication.py
/libs/infinity_emb/tests/data
/libs/infinity_emb
/.github/workflows/release.yaml
/docs/assets
/.github/ISSUE_TEMPLATE
/libs/infinity_emb/infinity_emb/infinity_server.py

About Greptile

This response provides a starting point for your research, not a precise solution.


@michaelfeil
Owner

Damn, Greptile is pretty useless.

@michaelfeil
Owner

Does something like this work: https://huggingface.co/jinaai/jina-reranker-v1-turbo-en/discussions/10, i.e. --revision refs/pr/10? Seems like Jina messed up their config. mixedbread does it better here: https://huggingface.co/mixedbread-ai/mxbai-rerank-xsmall-v1/blob/main/config.json

@michaelfeil
Owner

@John42506176Linux The detail is that you need to set "architectures": ["JinaBertForSequenceClassification"] in the config.json for the model to be recognized as a reranker. Would you be so kind as to open PRs on the Jina models for this? I appreciate your time.
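The change Michael describes is an edit to the model's config.json on the Hugging Face Hub; as a sketch, the relevant fragment would read (all other keys left unchanged):

```json
{
  "architectures": ["JinaBertForSequenceClassification"]
}
```

With this architecture name, loaders that dispatch on the architectures field build a sequence-classification (cross-encoder) head instead of a bare encoder.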

@John42506176Linux
Author

Assumed it was something simple; thanks for the quick response. I'm testing your first comment right now.

@John42506176Linux
Author

OK, looks good. Thanks for the quick fix and the quick response. I'll open a PR for the tiny model soon (need to finish testing the turbo model first). Thanks, this saved me some time :).

@John42506176Linux
Author

@michaelfeil Tiny gives the following error when making the config.json change:

RuntimeError: Error(s) in loading state_dict for JinaBertForSequenceClassification:
    size mismatch for classifier.weight: copying a param with shape torch.Size([1, 384]) from checkpoint, the shape in current model is torch.Size([2, 384]).
    size mismatch for classifier.bias: copying a param with shape torch.Size([1]) from checkpoint, the shape in current model is torch.Size([2]).
You may consider adding ignore_mismatched_sizes=True in the model from_pretrained method.
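The mismatch above can be reasoned about without loading anything: in transformers, a sequence-classification head's weight has shape (num_labels, hidden_size), and num_labels defaults to 2 when config.json does not specify it, while this checkpoint was trained with a single relevance-score output. A plain-Python sketch of that arithmetic (HIDDEN_SIZE and the helper are illustrative, taken from the shapes in the error):

```python
# Illustrative sketch of the classifier-head shape mismatch. No torch needed;
# the shapes come straight from the RuntimeError above.

HIDDEN_SIZE = 384  # hidden size of jina-reranker-v1-tiny-en, per the error message

def classifier_weight_shape(num_labels: int) -> tuple:
    """Weight shape a freshly built sequence-classification head expects."""
    return (num_labels, HIDDEN_SIZE)

checkpoint_shape = (1, HIDDEN_SIZE)                # what the saved weights contain
default_built_shape = classifier_weight_shape(2)   # transformers' default num_labels is 2

assert checkpoint_shape != default_built_shape       # -> the size-mismatch RuntimeError
assert classifier_weight_shape(1) == checkpoint_shape  # declaring num_labels=1 resolves it
```

This is exactly why the num_labels fix proposed later in the thread works.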

@michaelfeil
Owner

@John42506176Linux Seems like the reranker model has 2 outputs; that is not how rerankers are usually trained. Rerankers typically have one and only one output class.

With all due respect, I don't have time to fix Jina's questionable training choice here. The config file is ambiguous and leaves a lot of room for interpretation in how to load the model.

@John42506176Linux
Author

No worries, you already saved me time, by helping with turbo. Thanks for the assistance.

@wirthual
Collaborator

Hi @michaelfeil and @John42506176Linux ,

adding num_labels: 1 to the config as well seems to do the trick here.

I tested it with:

infinity_emb v2 --model-id jinaai/jina-reranker-v1-turbo-en --revision refs/pr/11

And it correctly shows up as a rerank model:

"capabilities":["rerank"]
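Putting the two config edits from this thread together, a sketch of the fixed config.json fragment for these models would be (all other keys omitted here for brevity):

```json
{
  "architectures": ["JinaBertForSequenceClassification"],
  "num_labels": 1
}
```

The architectures entry makes the model load as a cross-encoder, and num_labels: 1 sizes the classifier head to the single relevance-score output the checkpoint was trained with.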

@michaelfeil
Owner

@wirthual Thanks so much! This is exactly the solution for this model!
