diff --git a/AI-GATEWAY.md b/AI-GATEWAY.md
new file mode 100644
index 0000000..a6c90df
--- /dev/null
+++ b/AI-GATEWAY.md
@@ -0,0 +1,204 @@
+---
+# You need to install the [VS Code Reveal extension](https://marketplace.visualstudio.com/items?itemName=evilz.vscode-reveal) and then click on 'slides' at the bottom to view in presentation mode
+title: AI Gateway
+theme: black
+enableMenu: true
+parallaxBackgroundImage: ../images/back.png
+parallaxBackgroundSize: 1500px 1024px
+
+---
+
+AI Gateway {style="font-size:60px"}
+
+drawing
+
+---
+
+AI Gateway objectives
+
+* Aims to accelerate experimentation with advanced AI use cases {style="font-size:20px"}
+* Ensures control and governance over the consumption of AI services {style="font-size:20px"}
+* Paves the way for a confident deployment of Intelligent Apps into production {style="font-size:20px"}
+
+---
+
+AI Gateway toolchain
+
+drawing
+
+--------------
+
+* Powered by VS Code running locally or in the cloud with GitHub Codespaces {style="font-size:20px"}
+* Jupyter Notebooks structure the step-by-step instructions {style="font-size:20px"}
+* Python scripts define the variables and execute OpenAI API calls directly or with SDKs {style="font-size:20px"}
+* Bicep defines the infrastructure as code needed for the lab in a declarative way {style="font-size:20px"}
+* Azure CLI handles authentication with Azure and issues commands to the control plane {style="font-size:20px"}
+
+---
+
+Request forwarding
+
+Playground to try forwarding requests to either an Azure OpenAI endpoint or a mock server {style="font-size:20px"}
+
+drawing
+
+--------------
+
+* APIM uses its managed identity (user or system assigned). {style="font-size:20px"}
+* APIM is authorized to consume the Azure OpenAI API through Role-Based Access Control. {style="font-size:20px"}
+* Zero impact on consumers using the API directly, with SDKs or with orchestrators like LangChain: they only need to update the endpoint to the APIM endpoint instead of the Azure OpenAI endpoint. {style="font-size:20px"}
+* Keyless approach: API consumers use APIM subscription keys, and the Azure OpenAI keys are never used. {style="font-size:20px"}
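+
+A minimal consumer-side sketch of this keyless pattern (the gateway URL, subscription key and deployment name below are placeholders, and the gateway is assumed to expose the Azure OpenAI API at its root path): {style="font-size:20px"}
+
+```python
+# Point the standard Azure OpenAI SDK at the APIM gateway instead of the
+# Azure OpenAI endpoint; the APIM subscription key replaces the OpenAI key,
+# and APIM authenticates to Azure OpenAI with its managed identity.
+from openai import AzureOpenAI
+
+client = AzureOpenAI(
+    azure_endpoint="https://<your-apim-name>.azure-api.net",  # placeholder APIM gateway URL
+    api_key="<apim-subscription-key>",                        # APIM key, not an Azure OpenAI key
+    api_version="2024-02-01",
+)
+
+response = client.chat.completions.create(
+    model="<deployment-name>",  # Azure OpenAI deployment exposed through the gateway
+    messages=[{"role": "user", "content": "Hello from the AI Gateway!"}],
+)
+print(response.choices[0].message.content)
+```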
+
+---
+
+Backend circuit breaking
+
+Playground to try the built-in backend circuit breaker functionality of APIM with either an Azure OpenAI endpoint or a mock server {style="font-size:20px"}
+
+drawing
+
+--------------
+
+* The Azure OpenAI endpoint is configured as an APIM backend, promoting reusability across APIs and improved governance. {style="font-size:20px"}
+* Circuit breaking rules define controlled availability for the OpenAI endpoint. {style="font-size:20px"}
+* When the circuit breaks, APIM stops sending requests to OpenAI. {style="font-size:20px"}
+* Handles status code 429 (Too Many Requests) and any other status code sent by the OpenAI service. {style="font-size:20px"}
+* Doesn't need any policy configuration: the rules are just properties of the backend. {style="font-size:20px"}
+
+---
+
+Backend pool load balancing
+
+Playground to try the built-in load balancing backend pool functionality of APIM {style="font-size:20px"}
+
+drawing
+
+--------------
+
+* Spreads the load across multiple backends, each of which may have its own circuit breaker. {style="font-size:20px"}
+* Shifts the load from one set of backends to another for upgrades (blue-green deployments). {style="font-size:20px"}
+* Currently, the backend pool supports round-robin load balancing. {style="font-size:20px"}
+* Doesn't need any policy configuration: the rules are just properties of the backend. {style="font-size:20px"}
+
+---
+
+Advanced load balancing
+
+Playground to try advanced load balancing (based on a custom APIM policy) {style="font-size:20px"}
+
+drawing
+
+--------------
+
+* Loads the load balancer configuration from a named value. {style="font-size:20px"}
+* Uses backends to enable combination with the built-in circuit breaking feature or chaining with the backend pool. {style="font-size:20px"}
+* The policy doesn't have to be changed to add or modify endpoints or to configure the load balancer. {style="font-size:20px"}
+* Dynamically supports any number of OpenAI endpoints. {style="font-size:20px"}
+* Supports advanced properties like priority or weight, for example to give priority to Provisioned Throughput Unit (PTU) deployments. {style="font-size:20px"}
+
+---
+
+Response streaming
+
+Playground to try response streaming with APIM and Azure OpenAI endpoints to explore the advantages and shortcomings associated with streaming {style="font-size:20px"}
+
+drawing
+
+--------------
+
+* The client application receives the completion in chunks as it is being generated. {style="font-size:20px"}
+* Might improve the user experience for intelligent apps with a ChatGPT-style interface. {style="font-size:20px"}
+* Streaming responses don't include the usage field, so there is no report of how many tokens were consumed. {style="font-size:20px"}
+* For now, you sacrifice APIM's built-in logging. {style="font-size:20px"}
+* Streaming in a production application makes it harder to moderate the content of the completions, as partial completions are more difficult to evaluate. {style="font-size:20px"}
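+
+A minimal sketch of consuming a streamed completion through the gateway, reusing the client from the request forwarding sketch (the deployment name is a placeholder): {style="font-size:20px"}
+
+```python
+# Chunks arrive while the model is still generating; note that streamed
+# responses carry no usage field, so token counts must be estimated client-side.
+stream = client.chat.completions.create(
+    model="<deployment-name>",  # placeholder deployment exposed through the gateway
+    messages=[{"role": "user", "content": "Tell me a short story."}],
+    stream=True,
+)
+
+for chunk in stream:
+    if chunk.choices and chunk.choices[0].delta.content:
+        print(chunk.choices[0].delta.content, end="", flush=True)
+```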
+
+---
+
+Vector searching
+
+Playground to try the Retrieval Augmented Generation (RAG) pattern with Azure AI Search, Azure OpenAI embeddings and Azure OpenAI completions {style="font-size:20px"}
+
+drawing
+
+--------------
+
+* Implements the popular RAG pattern. {style="font-size:20px"}
+* Uses Azure AI Search as a vector store. {style="font-size:20px"}
+* Uses OpenAI to generate the embeddings. {style="font-size:20px"}
+* Supports keyword search, hybrid search and semantic ranking. {style="font-size:20px"}
+* The OpenAI completion is generated based on the user prompt and the AI Search results. {style="font-size:20px"}
+* All the OpenAI and AI Search APIs are served through APIM without using keys. {style="font-size:20px"}
+
+---
+
+Built-in logging
+
+Playground to try the built-in logging capabilities of API Management {style="font-size:20px"}
+
+drawing
+
+--------------
+
+* Requests are logged into Application Insights, and metrics are available in Azure Monitor. {style="font-size:20px"}
+* Doesn't need any policy configuration. {style="font-size:20px"}
+* Enables tracking request/response details and token usage with the provided notebook. {style="font-size:20px"}
+* Metrics from the Azure OpenAI service can be correlated to provide a holistic view of service usage. {style="font-size:20px"}
+* The notebook can be easily customized to accommodate specific use cases. {style="font-size:20px"}
+* Enables the creation of Azure dashboards for a single-pane-of-glass monitoring approach. {style="font-size:20px"}
+
+---
+
+SLM self-hosting
+
+Playground to try the self-hosted Phi-2 Small Language Model (SLM) through the APIM self-hosted gateway with OpenAI API compatibility {style="font-size:20px"}
+
+drawing
+
+--------------
+
+* The APIM self-hosted gateway is a containerized version of the default managed gateway. {style="font-size:20px"}
+* Useful for scenarios where we need to self-host an open-source model from platforms such as Hugging Face. {style="font-size:20px"}
+* In this playground we use Phi-2, an SLM small enough to try on a laptop. {style="font-size:20px"}
+* Both the APIM self-hosted gateway and Phi-2 can run in Docker containers or on a Kubernetes cluster. {style="font-size:20px"}
+
+---
+
+Summary
+
+* The AI Gateway concept provides a range of labs that enable experimentation with AI services supported by an API management strategy. {style="font-size:20px"}
+* The experimentation will feed the architecture design and the landing zone that will go into production. {style="font-size:20px"}
+* The labs are based on Jupyter Notebooks, to provide clear and documented instructions, combined with Python scripts, Bicep IaC and APIM policies. {style="font-size:20px"}
+* There is a backlog of experiments that we plan to implement to take this work further and enable more advanced use cases. Stay tuned 🙂 {style="font-size:20px"}
+
+---
+
+### Want to know more?
+
+[aka.ms/ai-gateway](https://aka.ms/ai-gateway)
+


+
+
+
+
+
+
+ +--- + + + +
+
+
+
+
+
+
+
+
+## Thank You {style="margin-top: 20px;"}
\ No newline at end of file
diff --git a/AI-GATEWAY.pptx b/AI-GATEWAY.pptx
new file mode 100644
index 0000000..699da72
Binary files /dev/null and b/AI-GATEWAY.pptx differ
diff --git a/README.md b/README.md
index 4758253..08a0c6a 100644
--- a/README.md
+++ b/README.md
@@ -17,14 +17,14 @@ Acknowledging the rising dominance of Python, particularly in the realm of AI, a
 | | | |
 | ---- | ----- | ----------- |
-| [Request forwarding](labs/request-forwarding/request-forwarding.ipynb) | [![flow](images/request-forwarding.gif)](labs/request-forwarding/request-forwarding.ipynb) | Playground to try forwarding requests to either an Azure OpenAI endpoint or a mock server. APIM uses the system [managed identity](https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-use-managed-service-identity) to authenticate into the Azure OpenAI service. |
-| [Backend circuit breaking](labs/backend-circuit-breaking/backend-circuit-breaking.ipynb) | [![flow](images/backend-circuit-breaking.gif)](labs/backend-circuit-breaking/backend-circuit-breaking.ipynb) | Playground to try the built-in [backend circuit breaker functionality of APIM](https://learn.microsoft.com/en-us/azure/api-management/backends?tabs=bicep) to either an Azure OpenAI endpoints or a mock server. |
-| [Backend pool load balancing](labs/backend-pool-load-balancing/backend-pool-load-balancing.ipynb) | [![flow](images/backend-pool-load-balancing.gif)](labs/backend-pool-load-balancing/backend-pool-load-balancing.ipynb) | Playground to try the built-in load balancing [backend pool functionality of APIM](https://learn.microsoft.com/en-us/azure/api-management/backends?tabs=bicep) to either a list of Azure OpenAI endpoints or mock servers. |
-| [Advanced load balancing](labs/advanced-load-balancing/advanced-load-balancing.ipynb) | [![flow](images/advanced-load-balancing.gif)](labs/advanced-load-balancing/advanced-load-balancing.ipynb) | Playground to try the advanced load balancing (based on a custom [APIM policy](https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-policies)) to either a list of Azure OpenAI endpoints or mock servers. |
-| [Response streaming](labs/response-streaming/response-streaming.ipynb) | [![flow](images/response-streaming.gif)](labs/response-streaming/response-streaming.ipynb) | Playground to try response streaming with APIM and Azure OpenAI endpoints to explore the advantages and shortcomings associated with [streaming](https://learn.microsoft.com/en-us/azure/api-management/how-to-server-sent-events#guidelines-for-sse). |
-| [Vector searching](labs/vector-searching/vector-searching.ipynb) | [![flow](images/vector-searching.gif)](labs/vector-searching/vector-searching.ipynb) | Playground to try the [Retrieval Augmented Generation (RAG) pattern](https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview) with Azure AI Search, Azure OpenAI embeddings and Azure OpenAI completions. All the endpoints are managed via APIM. |
-| [Built-in logging](labs/built-in-logging/built-in-logging.ipynb) | [![flow](images/built-in-logging.gif)](labs/built-in-logging/built-in-logging.ipynb) | Playground to try the [buil-in logging capabilities of API Management](https://learn.microsoft.com/en-us/azure/api-management/observability). The requests are logged into Application Insights and it's easy to track request/response details and token usage with provided notebook. |
-| [SLM self-hosting](labs/slm-self-hosting/slm-self-hosting.ipynb) | [![flow](images/slm-self-hosting.gif)](labs/slm-self-hosting/slm-self-hosting.ipynb) | Playground to try the self-hosted [phy-2 Small Language Model (SLM)](https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/) trough the [APIM self-hosted gateway](https://learn.microsoft.com/en-us/azure/api-management/self-hosted-gateway-overview) with OpenAI API compatibility. |
+| [Request forwarding](labs/request-forwarding/request-forwarding.ipynb) | [![flow](images/request-forwarding-small.gif)](labs/request-forwarding/request-forwarding.ipynb) | Playground to try forwarding requests to either an Azure OpenAI endpoint or a mock server. APIM uses the system [managed identity](https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-use-managed-service-identity) to authenticate into the Azure OpenAI service. |
+| [Backend circuit breaking](labs/backend-circuit-breaking/backend-circuit-breaking.ipynb) | [![flow](images/backend-circuit-breaking-small.gif)](labs/backend-circuit-breaking/backend-circuit-breaking.ipynb) | Playground to try the built-in [backend circuit breaker functionality of APIM](https://learn.microsoft.com/en-us/azure/api-management/backends?tabs=bicep) to either an Azure OpenAI endpoint or a mock server. |
+| [Backend pool load balancing](labs/backend-pool-load-balancing/backend-pool-load-balancing.ipynb) | [![flow](images/backend-pool-load-balancing-small.gif)](labs/backend-pool-load-balancing/backend-pool-load-balancing.ipynb) | Playground to try the built-in load balancing [backend pool functionality of APIM](https://learn.microsoft.com/en-us/azure/api-management/backends?tabs=bicep) to either a list of Azure OpenAI endpoints or mock servers. |
+| [Advanced load balancing](labs/advanced-load-balancing/advanced-load-balancing.ipynb) | [![flow](images/advanced-load-balancing-small.gif)](labs/advanced-load-balancing/advanced-load-balancing.ipynb) | Playground to try the advanced load balancing (based on a custom [APIM policy](https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-policies)) to either a list of Azure OpenAI endpoints or mock servers. |
+| [Response streaming](labs/response-streaming/response-streaming.ipynb) | [![flow](images/response-streaming-small.gif)](labs/response-streaming/response-streaming.ipynb) | Playground to try response streaming with APIM and Azure OpenAI endpoints to explore the advantages and shortcomings associated with [streaming](https://learn.microsoft.com/en-us/azure/api-management/how-to-server-sent-events#guidelines-for-sse). |
+| [Vector searching](labs/vector-searching/vector-searching.ipynb) | [![flow](images/vector-searching-small.gif)](labs/vector-searching/vector-searching.ipynb) | Playground to try the [Retrieval Augmented Generation (RAG) pattern](https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview) with Azure AI Search, Azure OpenAI embeddings and Azure OpenAI completions. All the endpoints are managed via APIM. |
+| [Built-in logging](labs/built-in-logging/built-in-logging.ipynb) | [![flow](images/built-in-logging-small.gif)](labs/built-in-logging/built-in-logging.ipynb) | Playground to try the [built-in logging capabilities of API Management](https://learn.microsoft.com/en-us/azure/api-management/observability). The requests are logged into Application Insights and it's easy to track request/response details and token usage with the provided notebook. |
+| [SLM self-hosting](labs/slm-self-hosting/slm-self-hosting.ipynb) | [![flow](images/slm-self-hosting-small.gif)](labs/slm-self-hosting/slm-self-hosting.ipynb) | Playground to try the self-hosted [Phi-2 Small Language Model (SLM)](https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/) through the [APIM self-hosted gateway](https://learn.microsoft.com/en-us/azure/api-management/self-hosted-gateway-overview) with OpenAI API compatibility. |
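+
+Since the self-hosted gateway exposes an OpenAI-compatible API, a minimal sketch of what calling it from Python could look like is shown below (the URL, key header and model name are illustrative placeholders, not values defined by the labs):
+
+```python
+# Hypothetical call against the OpenAI-compatible API exposed by the
+# APIM self-hosted gateway in front of Phi-2; all values are placeholders.
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:5000/openai/v1",             # placeholder self-hosted gateway URL
+    api_key="unused",                                        # no Azure OpenAI key is involved
+    default_headers={"api-key": "<apim-subscription-key>"},  # assumed APIM subscription key header
+)
+
+response = client.chat.completions.create(
+    model="phi-2",  # placeholder model name
+    messages=[{"role": "user", "content": "Summarize what an AI gateway does."}],
+)
+print(response.choices[0].message.content)
+```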
 
 ### Backlog of experiments
 * Developer tooling
@@ -34,6 +34,7 @@ Acknowledging the rising dominance of Python, particularly in the realm of AI, a
 * Token rate limiting
 * Cost tracking
 * Content filtering
+* PII handling
 * Prompt storing
 * Function calling
 * Prompt guarding
@@ -80,7 +81,14 @@ The [app.py](app.py) can be customized to tailor the Mock server to specific use
 * [Run locally or deploy to Azure](mock-server/mock-server.ipynb)
 
-## 🥇 Resources
+## 🎒 Presenting the AI Gateway concept
+> [!TIP]
+> Install the [VS Code Reveal extension](https://marketplace.visualstudio.com/items?itemName=evilz.vscode-reveal), open AI-GATEWAY.md and click on 'slides' at the bottom to present the AI Gateway without leaving VS Code.
+
+> [!TIP]
+> Or just open the [AI-GATEWAY.pptx](AI-GATEWAY.pptx) for a plain old PowerPoint experience.
+
+## 🥇 Other resources
 
 Numerous reference architectures, best practices and starter kits are available on this topic. Please refer to the resources provided if you need comprehensive solutions or a landing zone to initiate your project. We suggest leveraging the AI-Gateway labs to discover additional capabilities that can be integrated into the reference architectures.
diff --git a/images/advanced-load-balancing-small.gif b/images/advanced-load-balancing-small.gif
new file mode 100644
index 0000000..e1fb3e2
Binary files /dev/null and b/images/advanced-load-balancing-small.gif differ
diff --git a/images/back.png b/images/back.png
new file mode 100644
index 0000000..e46b3d3
Binary files /dev/null and b/images/back.png differ
diff --git a/images/backend-circuit-breaking-small.gif b/images/backend-circuit-breaking-small.gif
new file mode 100644
index 0000000..1f6309e
Binary files /dev/null and b/images/backend-circuit-breaking-small.gif differ
diff --git a/images/backend-pool-load-balancing-small.gif b/images/backend-pool-load-balancing-small.gif
new file mode 100644
index 0000000..4b84b42
Binary files /dev/null and b/images/backend-pool-load-balancing-small.gif differ
diff --git a/images/built-in-logging-small.gif b/images/built-in-logging-small.gif
new file mode 100644
index 0000000..1662e36
Binary files /dev/null and b/images/built-in-logging-small.gif differ
diff --git a/images/developer-tooling-small.gif b/images/developer-tooling-small.gif
new file mode 100644
index 0000000..4d437c0
Binary files /dev/null and b/images/developer-tooling-small.gif differ
diff --git a/images/request-forwarding-small.gif b/images/request-forwarding-small.gif
new file mode 100644
index 0000000..e7843f5
Binary files /dev/null and b/images/request-forwarding-small.gif differ
diff --git a/images/response-streaming-small.gif b/images/response-streaming-small.gif
new file mode 100644
index 0000000..ec13315
Binary files /dev/null and b/images/response-streaming-small.gif differ
diff --git a/images/slm-self-hosting-small.gif b/images/slm-self-hosting-small.gif
new file mode 100644
index 0000000..9b27930
Binary files /dev/null and b/images/slm-self-hosting-small.gif differ
diff --git a/images/toolchain.png b/images/toolchain.png
new file mode 100644
index 0000000..68ef7ee
Binary files /dev/null and b/images/toolchain.png differ
diff --git a/images/vector-searching-small.gif b/images/vector-searching-small.gif
new file mode 100644
index 0000000..028dd54
Binary files /dev/null and b/images/vector-searching-small.gif differ