
Handle streaming data - Proof of concept #5819

Open
jeffhandley opened this issue Jan 27, 2025 · 1 comment
Labels
ai-ready-to-implement · area-ai (Microsoft.Extensions.AI libraries)

Comments

@jeffhandley
Member

jeffhandley commented Jan 27, 2025

We need a proof of concept demonstrating that we could implement streaming data using the existing API shapes.

Broken out as a separate issue from #5719. Per #5719 (comment):

  • The best current proposal seems to be adding another type, StreamingDataContent, which does not inherit from DataContent but rather inherits from AIContent directly. The person instantiating this type would pass a suitable factory/callback to the constructor that returns the stream. The person consuming this type would call content.TryTakeStream(out var stream), which may or may not give them a stream (depending on whether the content is capable of supplying more than one reader); if it does, responsibility for disposing that stream instance is handed to the consumer. A rough sketch of this shape follows below the list.
  • I'm unclear on how this interacts with JSON serialization. Is there at least one practical case where this actually avoids buffering a large file in memory? I know it probably doesn't help with receiving any data from JSON-over-HTTP. But maybe it does help when sending JSON-over-HTTP. If there are no cases today where this actually avoids buffering in practice, then we might choose not to prioritize doing this now.
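For concreteness, here is a minimal sketch of what that shape could look like. This is not a committed design: the single-use factory semantics, the MediaType property, and everything other than the TryTakeStream name from the bullet above are assumptions.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;
using System.IO;
using System.Threading;
using Microsoft.Extensions.AI;

/// <summary>
/// Hypothetical streaming content type. It derives from AIContent directly
/// (not from DataContent) and wraps a caller-supplied stream factory.
/// </summary>
public sealed class StreamingDataContent : AIContent
{
    private Func<Stream>? _streamFactory;

    public StreamingDataContent(Func<Stream> streamFactory, string mediaType)
    {
        _streamFactory = streamFactory ?? throw new ArgumentNullException(nameof(streamFactory));
        MediaType = mediaType;
    }

    public string MediaType { get; }

    /// <summary>
    /// Attempts to take the stream. Returns false if it has already been taken
    /// (i.e., this instance cannot supply more than one reader). When it returns
    /// true, the caller owns the returned stream and is responsible for disposing it.
    /// </summary>
    public bool TryTakeStream([NotNullWhen(true)] out Stream? stream)
    {
        // Take-once semantics: the first caller atomically claims the factory.
        Func<Stream>? factory = Interlocked.Exchange(ref _streamFactory, null);
        if (factory is null)
        {
            stream = null;
            return false;
        }

        stream = factory();
        return true;
    }
}
```

In this sketch, Interlocked.Exchange gives take-once semantics: the first caller gets the stream and owns its disposal, and later callers simply get false.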
@jozkee
Member

jozkee commented Mar 26, 2025

For OpenAI, as of today, we just map our exchange types to the OpenAI-dotnet SDK's types, so from our perspective as a consumer there is no way to avoid buffering the contents into memory.
The SDK uses System.ClientModel, which works with BinaryContent and IJsonModel; that path creates a Utf8JsonWriter backed by the array pool, which it uses to serialize ChatCompletionOptions with the ChatMessages attached.

The only way I found to intercept serialization was by extending ChatCompletionOptions and overriding JsonModelWriteCore, and even then we would have to use reflection, since Messages is internal.
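A rough illustration of that interception point follows. Treat it as a sketch only: the exact JsonModelWriteCore signature and the name/visibility of the internal Messages property are assumptions based on the SDK's System.ClientModel patterns.

```csharp
using System.ClientModel.Primitives;
using System.Reflection;
using System.Text.Json;
using OpenAI.Chat;

// Sketch only: assumes ChatCompletionOptions exposes a protected virtual
// JsonModelWriteCore(Utf8JsonWriter, ModelReaderWriterOptions) that derived
// types can override, per the System.ClientModel IJsonModel<T> pattern.
internal sealed class InterceptingChatCompletionOptions : ChatCompletionOptions
{
    protected override void JsonModelWriteCore(Utf8JsonWriter writer, ModelReaderWriterOptions options)
    {
        // Messages is internal on ChatCompletionOptions, so reflection is the
        // only way for an external derived type to reach it.
        PropertyInfo? messagesProperty = typeof(ChatCompletionOptions)
            .GetProperty("Messages", BindingFlags.Instance | BindingFlags.NonPublic);
        object? messages = messagesProperty?.GetValue(this);

        // Custom serialization of the messages (e.g., streaming large data
        // contents straight to the writer) would go here instead of, or in
        // addition to, the base implementation.
        base.JsonModelWriteCore(writer, options);
    }
}
```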

But even if we could intercept writing the JSON, we would still have the problem that the writer is backed by the array pool, because it is using ModelBinaryContent.

As an alternative, I was testing with the OpenAI Files API, which the SDK supports, but I didn't get great results. My idea was that when someone opted into using StreamingDataContent, we could upload the files first and then reference them by ID when chatting with the model. A minimal sketch of that flow is below.
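The sketch below only covers the upload half, assuming the OpenAI-dotnet file surface (OpenAIFileClient, UploadFileAsync, FileUploadPurpose); the way the resulting file ID would be referenced from the chat request is omitted, since that is the part that wasn't working well, and the exact names and purpose value should be treated as approximate.

```csharp
using System;
using System.ClientModel;
using System.IO;
using OpenAI;
using OpenAI.Files;

// Sketch: upload a large payload via the Files API first, then pass only the
// resulting file ID along with the chat request instead of inlining the bytes.
OpenAIClient client = new(new ApiKeyCredential(Environment.GetEnvironmentVariable("OPENAI_API_KEY")!));
OpenAIFileClient fileClient = client.GetOpenAIFileClient();

await using FileStream payload = File.OpenRead("large-input.pdf");
ClientResult<OpenAIFile> result = await fileClient.UploadFileAsync(
    payload,
    "large-input.pdf",
    FileUploadPurpose.Assistants); // purpose is an assumption; use whatever the endpoint accepts

string fileId = result.Value.Id; // reference this ID from the chat request, endpoint permitting
```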

I'm not sure whether my issues with OpenAI are specific to Azure and GitHub, which are the endpoints I have access to. I filed openai/openai-dotnet#396.

I can continue investigating how to do this in the other adapters (Ollama and Azure AI Inference).
