doc improvement

Azure-Samples · Apr 18, 2024 · f743ee7 · f743ee7
1 parent afa3385
commit f743ee7
Show file tree

Hide file tree

Showing 14 changed files with 221 additions and 9 deletions.
diff --git a/AI-GATEWAY.md b/AI-GATEWAY.md
@@ -0,0 +1,204 @@
+---
+# You need to install [VS Code Reveal extension](https://marketplace.visualstudio.com/items?itemName=evilz.vscode-reveal) and then click on 'slides' at the botton to view in presentation mode
+title: AI Gateway
+theme: black
+enableMenu: true
+parallaxBackgroundImage: ../images/back.png
+parallaxBackgroundSize: 1500px 1024px
+
+---
+
+AI Gateway {style="font-size:60px"}
+
+<img src="../images/ai-gateway.gif" alt="drawing" style="width:900px;"/>
+
+
+---
+
+AI Gateway objectives
+
+*   Aims to accelerate the experimentation of advanced AI use cases {style="font-size:20px"}
+*   Ensures control and governance over the consumption of AI services {style="font-size:20px"}
+*   Paves the road for a confident deployment of Intelligent Apps into production {style="font-size:20px"}
+
+---
+
+AI Gateway toolchain
+
+<img src="../images/toolchain.png" alt="drawing" style="width:900px;"/>
+
+--------------
+
+*   Powered by VS Code running locally or in the cloud with GitHub Codespaces {style="font-size:20px"}
+*   Jupyter Notebooks structures the step-by-step instructions {style="font-size:20px"}
+*   Python scripts define the variables and  execute OpenAI API calls directly or with SDKs {style="font-size:20px"}
+*   Bicep defines the infrastructure as code needed for the lab in a declarative way {style="font-size:20px"}
+*   Azure CLI handles authentication with Azure and  issues commands to the control plane {style="font-size:20px"}
+
+---
+
+Request forwarding
+
+Playground to try forwarding requests to either an Azure OpenAI endpoint or a mock server {style="font-size:20px"}
+
+<img src="../images/request-forwarding.gif" alt="drawing" style="width:700px;"/>
+
+--------------
+
+*   APIM uses the managed identity (user or system assigned).  {style="font-size:20px"}
+*   APIM is authorized to consume the Azure OpenAI API through Role Based Access Controls.  {style="font-size:20px"}
+*   Zero impact on consumers using the API directly, with SDKs or orchestrators like LangChain. Just need to update the endpoint to use the APIM endpoint instead of Azure OpenAI endpoint.  {style="font-size:20px"}
+*   Keyless approach: API consumers use the APIM subscription keys, and the Azure OpenAI keys are never used  {style="font-size:20px"}
+
+---
+
+Backend circuit breaking
+
+Playground to try the built-in backend circuit breaker functionality of APIM to either an Azure OpenAI endpoint or a mock server {style="font-size:20px"}
+
+<img src="../images/backend-circuit-breaking.gif" alt="drawing" style="width:700px;"/>
+
+--------------
+
+*   Azure OpenAI endpoint is configured as an APIM backend, promoting reusability across APIs and improved governance.   {style="font-size:20px"}
+*   Circuit breaking rules define controlled availability for the OpenAI endpoint.   {style="font-size:20px"}
+*   When the circuit breaks, APIM stops sending requests to OpenAI.   {style="font-size:20px"}
+*   Handles the status code 429  (Too Many Requests) and any other status code sent by the OpenAI service.   {style="font-size:20px"}
+*   Doesn’t need any policy configuration. The rules are just properties of the backend.   {style="font-size:20px"}
+
+---
+
+Backend pool load balancing
+
+Playground to try the built-in load balancing backend pool functionality of APIM {style="font-size:20px"}
+
+<img src="../images/backend-pool-load-balancing.gif" alt="drawing" style="width:700px;"/>
+
+--------------
+
+*   Spread the load to multiple backends, which may have individual backend circuit breakers.  {style="font-size:20px"}
+*   Shift the load from one set of backends to another for upgrade (blue-green deployment).  {style="font-size:20px"}
+*   Currently, the backend pool supports round-robin load balancing.  {style="font-size:20px"}
+*   Doesn’t need any policy configuration. The rules are just properties of the backend.  {style="font-size:20px"}
+
+---
+
+Advanced load balancing
+
+Playground to try the advanced load balancing (based on a custom APIM policy) {style="font-size:20px"}
+
+<img src="../images/advanced-load-balancing.gif" alt="drawing" style="width:600px;"/>
+
+--------------
+
+*   Loads the load balancer configuration from a named value property.  {style="font-size:20px"}
+*   Uses backends to enable the combination with the built-in circuit breaking feature or chaining with the backend pool.  {style="font-size:20px"}
+*   The policy doesn't have to be changed to add/modify endpoints or configure the load balancer.  {style="font-size:20px"}
+*   Dynamically supports any number of OpenAI endpoints.  {style="font-size:20px"}
+*   Support advanced properties like priority or weights to give priority to Provisioned Throughput Unit (PTU).  {style="font-size:20px"}
+
+---
+
+Response streaming
+
+Playground to try response streaming with APIM and Azure OpenAI endpoints to explore the advantages and shortcomings associated with streaming {style="font-size:20px"}
+
+<img src="../images/response-streaming.gif" alt="drawing" style="width:700px;"/>
+
+--------------
+
+*   The client application receives the completions in chunks as it's being generated.  {style="font-size:20px"}
+*   Might improve the user experience for intelligent apps with a ChatGPT interface.  {style="font-size:20px"}
+*   Streaming responses doesn't include the usage field to tell how many tokens were consumed.  {style="font-size:20px"}
+*   You sacrifice for now APIM built-in logging.  {style="font-size:20px"}
+*   Streaming in a production application makes it more difficult to moderate the content of the completions, as partial completions may be more difficult to evaluate.  {style="font-size:20px"}
+
+---
+
+Vector searching
+
+Playground to try the Retrieval Augmented Generation (RAG) pattern with Azure AI Search, Azure OpenAI embeddings and Azure OpenAI completions {style="font-size:20px"}
+
+<img src="../images/vector-searching.gif" alt="drawing" style="width:700px;"/>
+
+--------------
+
+*   Implements the popular RAG pattern.  {style="font-size:20px"}
+*   Uses Azure AI Search as a vector store.  {style="font-size:20px"}
+*   Uses OpenAI to generate the embeddings.  {style="font-size:20px"}
+*   Supports key word search, hybrid search and semantic ranking.  {style="font-size:20px"}
+*   OpenAI completion is generated based on the user prompt and the AI search results.  {style="font-size:20px"}
+*   All the APIs from OpenAI and AI Search are served trough APIM without using keys.  {style="font-size:20px"}
+
+---
+
+Built-in logging
+
+Playground to try the built-in logging capabilities of API Management  {style="font-size:20px"}
+
+<img src="../images/built-in-logging.gif" alt="drawing" style="width:700px;"/>
+
+--------------
+
+*   The requests are logged into Application Insights and metrics available in Azure Monitor.  {style="font-size:20px"}
+*   Doesn’t need any policy configuration.  {style="font-size:20px"}
+*   Enables tracking request/response details and token usage with the provided notebook.  {style="font-size:20px"}
+*   Metrics from the Azure OpenAI service might be correlated to provide a holistic view on service usage.  {style="font-size:20px"}
+*   The notebook can be easily customized to accommodate specific use cases.  {style="font-size:20px"}
+*   Enables the creation of Azure dashboards for a single pane of glass monitoring approach.  {style="font-size:20px"}
+
+---
+
+SLM self-hosting
+
+Playground to try the self-hosted phy-2 Small Language Model (SLM) trough the APIM self-hosted gateway with OpenAI API compatibility  {style="font-size:20px"}
+
+<img src="../images/slm-self-hosting.gif" alt="drawing" style="width:700px;"/>
+
+--------------
+
+*   The APIM self-hosted gateway is a containerized version of the default managed gateway.  {style="font-size:20px"}
+*   Useful for scenarios where we need to self-host an open-source model from platforms such as Hugging Face.  {style="font-size:20px"}
+*   In this playground we have used Phi-2 that is a SLM suited to try on a laptop.  {style="font-size:20px"}
+*   Both APIM self-hosted gateway and the phy-2 could run on docker containers or in a Kubernetes cluster.  {style="font-size:20px"}
+
+---
+
+Summary
+
+
+*   The AI Gateway concept provides a range of labs that enables the experimentation of AI Services supported by an API management strategy.  {style="font-size:20px"}
+*   The experimentation will feed the design architecture and the landing zone that will go into production.  {style="font-size:20px"}
+*   The labs are based on Jupyter Notebooks to enable clear and documented instructions, Python scripts, Bicep IaC and APIM policies.  {style="font-size:20px"}
+*   There is a backlog of experiments that we plan to implement to take this work further and enable more advanced use cases. Stay tuned 🙂  {style="font-size:20px"}
+
+---
+
+<!-- .slide: data-auto-animate data-auto-animate-easing="cubic-bezier(0.770, 0.000, 0.175, 1.000)" -->
+
+### Whant to know more?
+
+[aka.ms/ai-gateway](https://aka.ms/ai-gateway)
+
+<p><br/></p>
+<div class="r-hstack justify-center">
+<div data-id="box1" style="background: #F35325; width: 50px; height: 50px; margin: 10px; border-radius: 5px;"></div>
+<div data-id="box2" style="background: #81BC06; width: 50px; height: 50px; margin: 10px; border-radius: 5px;"></div>
+<div data-id="box3" style="background: #05A6F0; width: 50px; height: 50px; margin: 10px; border-radius: 5px;"></div>
+<div data-id="box4" style="background: #FFBA08; width: 50px; height: 50px; margin: 10px; border-radius: 5px;"></div>
+</div>
+
+---
+
+<!-- .slide: data-auto-animate data-auto-animate-easing="cubic-bezier(0.770, 0.000, 0.175, 1.000)" -->
+
+<div class="r-hstack justify-center">
+<div data-id="box1" data-auto-animate-delay="0" style="background: #F35325; width: 100px; height: 100px; margin: 10px;"></div>
+<div data-id="box2" data-auto-animate-delay="0.1" style="background: #81BC06; width: 100px; height: 100px; margin: 10px;"></div>
+</div>
+<div class="r-hstack justify-center">
+<div data-id="box3" data-auto-animate-delay="0" style="background: #05A6F0; width: 100px; height: 100px; margin: 10px;"></div>
+<div data-id="box4" data-auto-animate-delay="0.1" style="background: #FFBA08; width: 100px; height: 100px; margin: 10px;"></div>
+</div>
+
+## Thank You {style="margin-top: 20px;"}
diff --git a/AI-GATEWAY.pptx b/AI-GATEWAY.pptx
diff --git a/README.md b/README.md
@@ -17,14 +17,14 @@ Acknowledging the rising dominance of Python, particularly in the realm of AI, a
 
 |  |  | |
 | ---- | ----- | ----------- |
-| [Request forwarding](labs/request-forwarding/request-forwarding.ipynb) | [![flow](images/request-forwarding.gif)](labs/request-forwarding/request-forwarding.ipynb) | Playground to try forwarding requests to either an Azure OpenAI endpoint or a mock server. APIM uses the system [managed identity](https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-use-managed-service-identity) to authenticate into the Azure OpenAI service. |
-| [Backend circuit breaking](labs/backend-circuit-breaking/backend-circuit-breaking.ipynb)     | [![flow](images/backend-circuit-breaking.gif)](labs/backend-circuit-breaking/backend-circuit-breaking.ipynb) | Playground to try the built-in [backend circuit breaker functionality of APIM](https://learn.microsoft.com/en-us/azure/api-management/backends?tabs=bicep) to either an Azure OpenAI endpoints or a mock server. |
-| [Backend pool load balancing](labs/backend-pool-load-balancing/backend-pool-load-balancing.ipynb) | [![flow](images/backend-pool-load-balancing.gif)](labs/backend-pool-load-balancing/backend-pool-load-balancing.ipynb) | Playground to try the built-in load balancing [backend pool functionality of APIM](https://learn.microsoft.com/en-us/azure/api-management/backends?tabs=bicep) to either a list of Azure OpenAI endpoints or mock servers. |
-| [Advanced load balancing](labs/advanced-load-balancing/advanced-load-balancing.ipynb) | [![flow](images/advanced-load-balancing.gif)](labs/advanced-load-balancing/advanced-load-balancing.ipynb) | Playground to try the advanced load balancing (based on a custom [APIM policy](https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-policies)) to either a list of Azure OpenAI endpoints or mock servers. |
-| [Response streaming](labs/response-streaming/response-streaming.ipynb) | [![flow](images/response-streaming.gif)](labs/response-streaming/response-streaming.ipynb) | Playground to try response streaming with APIM and Azure OpenAI endpoints to explore the advantages and shortcomings associated with [streaming](https://learn.microsoft.com/en-us/azure/api-management/how-to-server-sent-events#guidelines-for-sse). |
-| [Vector searching](labs/vector-searching/vector-searching.ipynb) | [![flow](images/vector-searching.gif)](labs/vector-searching/vector-searching.ipynb) | Playground to try the [Retrieval Augmented Generation (RAG) pattern](https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview) with Azure AI Search, Azure OpenAI embeddings and Azure OpenAI completions. All the endpoints are managed via APIM. |
-| [Built-in logging](labs/built-in-logging/built-in-logging.ipynb) | [![flow](images/built-in-logging.gif)](labs/built-in-logging/built-in-logging.ipynb) | Playground to try the [buil-in logging capabilities of API Management](https://learn.microsoft.com/en-us/azure/api-management/observability). The requests are logged into Application Insights and it's easy to track request/response details and token usage with provided notebook.  |
-| [SLM self-hosting](labs/slm-self-hosting/slm-self-hosting.ipynb) | [![flow](images/slm-self-hosting.gif)](labs/slm-self-hosting/slm-self-hosting.ipynb) | Playground to try the self-hosted [phy-2 Small Language Model (SLM)](https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/) trough the [APIM self-hosted gateway](https://learn.microsoft.com/en-us/azure/api-management/self-hosted-gateway-overview) with OpenAI API compatibility.  |
+| [Request forwarding](labs/request-forwarding/request-forwarding.ipynb) | [![flow](images/request-forwarding-small.gif)](labs/request-forwarding/request-forwarding.ipynb) | Playground to try forwarding requests to either an Azure OpenAI endpoint or a mock server. APIM uses the system [managed identity](https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-use-managed-service-identity) to authenticate into the Azure OpenAI service. |
+| [Backend circuit breaking](labs/backend-circuit-breaking/backend-circuit-breaking.ipynb)     | [![flow](images/backend-circuit-breaking-small.gif)](labs/backend-circuit-breaking/backend-circuit-breaking.ipynb) | Playground to try the built-in [backend circuit breaker functionality of APIM](https://learn.microsoft.com/en-us/azure/api-management/backends?tabs=bicep) to either an Azure OpenAI endpoints or a mock server. |
+| [Backend pool load balancing](labs/backend-pool-load-balancing/backend-pool-load-balancing.ipynb) | [![flow](images/backend-pool-load-balancing-small.gif)](labs/backend-pool-load-balancing/backend-pool-load-balancing.ipynb) | Playground to try the built-in load balancing [backend pool functionality of APIM](https://learn.microsoft.com/en-us/azure/api-management/backends?tabs=bicep) to either a list of Azure OpenAI endpoints or mock servers. |
+| [Advanced load balancing](labs/advanced-load-balancing/advanced-load-balancing.ipynb) | [![flow](images/advanced-load-balancing-small.gif)](labs/advanced-load-balancing/advanced-load-balancing.ipynb) | Playground to try the advanced load balancing (based on a custom [APIM policy](https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-policies)) to either a list of Azure OpenAI endpoints or mock servers. |
+| [Response streaming](labs/response-streaming/response-streaming.ipynb) | [![flow](images/response-streaming-small.gif)](labs/response-streaming/response-streaming.ipynb) | Playground to try response streaming with APIM and Azure OpenAI endpoints to explore the advantages and shortcomings associated with [streaming](https://learn.microsoft.com/en-us/azure/api-management/how-to-server-sent-events#guidelines-for-sse). |
+| [Vector searching](labs/vector-searching/vector-searching.ipynb) | [![flow](images/vector-searching-small.gif)](labs/vector-searching/vector-searching.ipynb) | Playground to try the [Retrieval Augmented Generation (RAG) pattern](https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview) with Azure AI Search, Azure OpenAI embeddings and Azure OpenAI completions. All the endpoints are managed via APIM. |
+| [Built-in logging](labs/built-in-logging/built-in-logging.ipynb) | [![flow](images/built-in-logging-small.gif)](labs/built-in-logging/built-in-logging.ipynb) | Playground to try the [buil-in logging capabilities of API Management](https://learn.microsoft.com/en-us/azure/api-management/observability). The requests are logged into Application Insights and it's easy to track request/response details and token usage with provided notebook.  |
+| [SLM self-hosting](labs/slm-self-hosting/slm-self-hosting.ipynb) | [![flow](images/slm-self-hosting-small.gif)](labs/slm-self-hosting/slm-self-hosting.ipynb) | Playground to try the self-hosted [phy-2 Small Language Model (SLM)](https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/) trough the [APIM self-hosted gateway](https://learn.microsoft.com/en-us/azure/api-management/self-hosted-gateway-overview) with OpenAI API compatibility.  |
 
 ### Backlog of experiments
 * Developer tooling
@@ -34,6 +34,7 @@ Acknowledging the rising dominance of Python, particularly in the realm of AI, a
 * Token rate limiting
 * Cost tracking
 * Content filtering
+* PII handling
 * Prompt storing
 * Function calling
 * Prompt guarding
@@ -80,7 +81,14 @@ The [app.py](app.py) can be customized to tailor the Mock server to specific use
 
 * [Run locally or deploy to Azure](mock-server/mock-server.ipynb)
 
-## 🥇 Resources
+## 🎒 Presenting the AI Gateway concept
+> [!TIP]
+> Install the [VS Code Reveal extension](https://marketplace.visualstudio.com/items?itemName=evilz.vscode-reveal), open AI-GATEWAY.md and click on 'slides' at the botton to present the AI Gateway without leaving VS Code.
+
+> [!TIP]
+> Or just open the [AI-GATEWAY.pptx](AI-GATEWAY.pptx) for a plain old PowerPoint experience.
+
+## 🥇 Other resources
 
 Numerous reference architectures, best practices and starter kits are available on this topic. Please refer to the resources provided if you need comprehensive solutions or a landing zone to initiate your project. We suggest leveraging the AI-Gateway labs to discover additional capabilities that can be integrated into the reference architectures.
 

diff --git a/images/advanced-load-balancing-small.gif b/images/advanced-load-balancing-small.gif
diff --git a/images/back.png b/images/back.png
diff --git a/images/backend-circuit-breaking-small.gif b/images/backend-circuit-breaking-small.gif
diff --git a/images/backend-pool-load-balancing-small.gif b/images/backend-pool-load-balancing-small.gif
diff --git a/images/built-in-logging-small.gif b/images/built-in-logging-small.gif
diff --git a/images/developer-tooling-small.gif b/images/developer-tooling-small.gif
diff --git a/images/request-forwarding-small.gif b/images/request-forwarding-small.gif
diff --git a/images/response-streaming-small.gif b/images/response-streaming-small.gif
diff --git a/images/slm-self-hosting-small.gif b/images/slm-self-hosting-small.gif
diff --git a/images/toolchain.png b/images/toolchain.png
diff --git a/images/vector-searching-small.gif b/images/vector-searching-small.gif