-
What would be cool, and would partly solve this issue as well, is if one could scroll through the whole conversation and selectively delete or edit a conversation turn without deleting the newer conversation flow. Then, instead of always adding the code at the bottom of the conversation, one could inject it at the right position.
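As a rough sketch of what I mean, assuming an OpenAI-style message list (the contents are made up):

```python
# The conversation history as it would be resent to the API.
messages = [
    {"role": "user", "content": "Here is my code (v1): ..."},
    {"role": "assistant", "content": "Analysis of v1 ..."},
    {"role": "user", "content": "Please refactor the main loop."},
]

# Edit the stale paste in place instead of appending the new version
# at the bottom; the newer turns below it are untouched.
messages[0]["content"] = "Here is my code (v2): ..."

# Alternatively, drop a turn that is no longer relevant (a user/assistant
# pair, to keep the role alternation intact).
del messages[0:2]
```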
-
Hmm. A few thoughts:

We don't send the conversation all the way back to the beginning; we only send as much of it as fits in this model's context window. I think, at least. I'd have to 100% confirm that in the code, but I believe that's how it works. So we should never be sending things that the LLM is going to ignore.

However, to your point on editing: you can scroll back and edit any point earlier in the conversation. I do that a lot for code where I pasted one version but now have an updated version that I want it to analyze instead. I just hover over an earlier message, click edit, and change it. But when you do that, it branches the conversation, so everything below that point stays on the previous branch. We could have some power-user feature where you could edit a previous message without creating a branch (maybe you'd hold down Alt while clicking the edit icon, or something like that).

Another interesting consideration is that both of the major model providers have added prompt caching, so subsequent API calls where the previous context has not changed do not get reprocessed. It looks like OpenAI enables this automatically, but there is a flag we need to enable on Anthropic, and I don't think we've done that. I see a 5-10 minute cache window on this. I wonder if our cost-calculation code is already wrong because of this: we're probably overestimating cost for OpenAI since the cache is being used. But maybe we aren't, since surely the API response that reports usage takes this into account.
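A hedged sketch of the Anthropic side, assuming the Python SDK and the prompt-caching beta as documented around the time of writing (the model string and the `LONG_STABLE_CONTEXT` placeholder are illustrative; check the current docs before relying on this):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_STABLE_CONTEXT = "...the large prompt prefix that rarely changes..."

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    # Prompt caching started as a beta feature gated behind this header.
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    system=[
        {
            "type": "text",
            "text": LONG_STABLE_CONTEXT,
            # Mark the stable prefix as cacheable (roughly 5 min TTL).
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Latest question goes here"}],
)

# With caching active, `usage` includes cache_creation_input_tokens and
# cache_read_input_tokens, which any cost calculation should account for.
print(response.usage)
```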
-
I am spending something like a hundred dollars per month on Claude 3.5 Sonnet, because I cannot limit the token count in a conversation.
Mostly, I am iterating inside the same conversation when coding: I start with some code at the beginning, which gets refined further and further over the conversation until I am happy with it.
The problem is that the whole conversation is always sent to the LLM again on every turn, with no way to cap the maximum number of tokens or conversation turns.
The result is something like this:
One conversation: > $14, mostly because of input token cost.
Don't get me wrong: I think these conversations are worth $14 😄 ... but they could be way cheaper if one could set, e.g., a maximum conversation-history limit.
Often it doesn't make sense to send the whole conversation from the beginning, because the models don't remember all the details anyway. Often I have a conversation turn just to repost the current state of my code and ask the model to forget the old state. So I think it would be nice if there were a way to limit the conversation length, either at the token or the conversation-turn level; a rough sketch of what I mean follows.
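A minimal sketch of such a limit, assuming an OpenAI-style message list and `tiktoken` for counting (the cap values and the function name are made up):

```python
import tiktoken

MAX_TURNS = 12            # keep at most this many recent turns
MAX_INPUT_TOKENS = 8_000  # rough budget for the prompt we resend

def trim_history(messages, model="gpt-4o"):
    """Keep the newest turns that fit both caps, preserving order."""
    enc = tiktoken.encoding_for_model(model)
    kept, total = [], 0
    for msg in reversed(messages[-MAX_TURNS:]):  # walk newest-first
        n = len(enc.encode(msg["content"]))
        if total + n > MAX_INPUT_TOKENS:
            break
        kept.append(msg)
        total += n
    return list(reversed(kept))  # restore chronological order
```

Either cap alone would already help; the token cap maps most directly to cost.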