Display Token usage #418

Closed
lukehinds opened this issue Dec 19, 2024 · 4 comments · Fixed by #788

Comments
@lukehinds (Contributor) commented Dec 19, 2024

Can we display the number of tokens used by any given provider? This would be useful for the new Copilot free tier.

A nice extra would be to record the token usage per conversation. This would give users insight into which prompts are more costly and allow for optimization.

kudos @craigmcl for the idea.

@lukehinds (Contributor, Author)
This will require #454 to land first, so let's keep it in the backlog for now.

@aponcedeleonch (Contributor) commented Jan 24, 2025

On initial investigation, the used tokens are listed neither in the request nor in the response from the LLM.

Request

{
  "messages": [...],
  "model": "gpt-4o",
  "temperature": 0.1,
  "top_p": 1,
  "max_tokens": 4096,
  "n": 1,
  "stream": true
}

max_tokens: The maximum number of tokens that can be generated in the chat completion. Reference

Response

[
"{\"id\":\"\",\"created\":0,\"model\":\"\",\"object\":\"chat.completion.chunk\",\"choices\":[]}", 
"{\"id\":\"chatcmpl-Ao5A9Sf7Q6WB751oF5OpU7Wmwcfv4\",\"created\":1736499609,\"model\":\"gpt-4o-2024-05-13\",\"object\":\"chat.completion.chunk\",\"choices\":[{\"index\":0,\"delta\":{\"content\":\"\",\"role\":\"assistant\"}}]}", 
....
"{\"id\":\"chatcmpl-Ao5A9Sf7Q6WB751oF5OpU7Wmwcfv4\",\"created\":1736499609,\"model\":\"gpt-4o-2024-05-13\",\"object\":\"chat.completion.chunk\",\"choices\":[{\"finish_reason\":\"stop\",\"index\":0,\"delta\":{\"role\":\"assistant\"}}]}"
]

There are 2 alternatives:

  1. See if the LLM providers list in their response the tokens they have used. At first glance this looks possible, at least for OpenAI.
  2. Use our own tokenizer. We could tokenize the request and response ourselves and calculate the number of used tokens that way. The big drawback is that the tokens we calculate may not match the tokens actually used by the LLM, but it would at least be an approximation (a rough sketch follows below).
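
For alternative 2, here is a minimal sketch of what the tokenizer approach could look like (assuming tiktoken; the helper name is hypothetical):

```python
# Rough illustration of alternative 2: approximate token usage with a local
# tokenizer (tiktoken here). The counts will not exactly match the provider's
# accounting (chat framing adds a few tokens per message), but they give a
# usable ballpark figure.
import tiktoken

def approximate_usage(messages: list[dict], completion: str, model: str = "gpt-4o") -> dict:
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        # Fall back to a generic encoding for model names tiktoken doesn't know.
        enc = tiktoken.get_encoding("cl100k_base")
    prompt_tokens = sum(len(enc.encode(m.get("content") or "")) for m in messages)
    completion_tokens = len(enc.encode(completion))
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }
```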

@aponcedeleonch (Contributor) commented Jan 24, 2025

I have been playing around with the APIs. It's possible for all providers: all of them include the token usage automatically when the request is non-streaming. For streaming we need to explicitly request it, except for Anthropic, which already includes it starting with the first chunk.

Anthropic

The token usage comes split across two chunks: one at the beginning and another at the end.

// First chunk
{
  "type": "message_start",
  "message": {
    "id": "msg_011itXmqtd7KHB6adpbDdwWX",
    "type": "message",
    "role": "assistant",
    "model": "claude-3-5-sonnet-20241022",
    "content": [],
    "stop_reason": null,
    "stop_sequence": null,
    "usage": {
      "input_tokens": 10,
      "cache_creation_input_tokens": 0,
      "cache_read_input_tokens": 0,
      "output_tokens": 1
    }
  }
}

// Last chunk
{
  "type": "message_delta",
  "delta": {
    "stop_reason": "end_turn",
    "stop_sequence": null
  },
  "usage": {
    "output_tokens": 13
  }
}
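
A minimal sketch of stitching those two chunks together (field names taken from the payloads above; the helper itself is hypothetical):

```python
# Accumulate Anthropic token usage across a stream: input_tokens arrive in
# the message_start event, the final output_tokens in the closing message_delta.
import json

def anthropic_stream_usage(raw_chunks: list[str]) -> dict:
    input_tokens = 0
    output_tokens = 0
    for raw in raw_chunks:
        event = json.loads(raw)
        if event.get("type") == "message_start":
            input_tokens = event["message"].get("usage", {}).get("input_tokens", 0)
        elif event.get("type") == "message_delta":
            # The last delta carries the cumulative output token count.
            output_tokens = event.get("usage", {}).get("output_tokens", output_tokens)
    return {"input_tokens": input_tokens, "output_tokens": output_tokens}
```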

OpenAI, Ollama, VLLM

We need to explicitly request the token usage when the request is set to streaming, which it is most of the time for clients. Note the stream_options field in the following example request.

curl -s -X POST "<api>/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer <token>" \
    -d '{
        "model": "unsloth/Qwen2.5-Coder-32B-Instruct",
        "stream": true,
        "stream_options": {"include_usage": true},
        "messages": [{"role": "user", "content": "Hello, world"}]
    }'

Response with the token usage in the last chunk; it comes after the chunk with finish_reason: "stop".

{
  "id": "chatcmpl-4933d74a8f8b4a82a855439eeab1ae3d",
  "object": "chat.completion.chunk",
  "created": 1737723773,
  "model": "unsloth/Qwen2.5-Coder-32B-Instruct",
  "choices": [],
  "usage": {
    "prompt_tokens": 32,
    "total_tokens": 42,
    "completion_tokens": 10
  }
}
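
For reference, the same request through the openai Python client could look like this (endpoint and token are placeholders, as in the curl above):

```python
# With include_usage set, `chunk.usage` is None on every chunk except the
# final one, which carries the prompt/completion/total token counts.
from openai import OpenAI

client = OpenAI(base_url="<api>/v1", api_key="<token>")

stream = client.chat.completions.create(
    model="unsloth/Qwen2.5-Coder-32B-Instruct",
    stream=True,
    stream_options={"include_usage": True},
    messages=[{"role": "user", "content": "Hello, world"}],
)

usage = None
for chunk in stream:
    if chunk.usage is not None:  # only the last chunk carries usage
        usage = chunk.usage

print(usage.prompt_tokens, usage.completion_tokens, usage.total_tokens)
```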

@aponcedeleonch (Contributor) commented Jan 27, 2025

Costs

  • LiteLLM uses a static file to map a model name to its costs. Link to code, Link to file
  • It's not clear where OpenRouter gets its info, but we can make an API call to get the prices they use: curl "https://openrouter.ai/api/v1/models". Reference link (a quick sketch follows below)
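
A quick sketch of pulling those OpenRouter prices (response shape as documented at the time of writing; prices come back as strings in USD per token, keyed by model id):

```python
# Fetch the model list from OpenRouter and print the per-token prices it uses.
import requests

resp = requests.get("https://openrouter.ai/api/v1/models", timeout=10)
resp.raise_for_status()

for model in resp.json()["data"][:5]:  # first few models, for illustration
    pricing = model.get("pricing", {})
    print(model["id"], "prompt:", pricing.get("prompt"), "completion:", pricing.get("completion"))
```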

aponcedeleonch added commits that referenced this issue between Jan 27 and Jan 29, 2025
lukehinds pushed a commit that referenced this issue Jan 31, 2025
* Include the token usage for every conversation and workspace

Related: #418

This PR introduces the changes necessary to track the used tokens per request and then process them to return them in the API.

Specific changes:
- Make sure we process the whole stream and record the usage at the very end
- Include the flag `"stream_options": {"include_usage": True},` so the providers respond with the tokens
- Added the necessary processing for the API
- Modified the initial API models to display the tokens and their price correctly

* Moved token recording to the DB

* Changed the token usage code to get its info from a file and added a GHA to fetch the file periodically

* Formatting changes

* Move model cost to a dedicated folder

* Fix problems with copilot streaming
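
As a rough illustration of the "record at the very end" approach described in the commit message (all names here are hypothetical, not the actual codegate implementation):

```python
# Hypothetical wrapper: forward provider chunks to the client unchanged,
# then persist whatever usage payload appeared once the stream is exhausted.
from typing import AsyncIterator, Callable

async def record_usage(
    stream: AsyncIterator[dict], save_usage: Callable[[dict], None]
) -> AsyncIterator[dict]:
    usage = None
    async for chunk in stream:
        if chunk.get("usage"):
            usage = chunk["usage"]
        yield chunk  # pass the chunk through untouched
    if usage is not None:
        save_usage(usage)  # e.g. write prompt/completion tokens to the DB
```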