Skip to content

Commit

Permalink
doc improvement
Browse files Browse the repository at this point in the history
  • Loading branch information
vieiraae committed Apr 18, 2024
1 parent afa3385 commit f743ee7
Show file tree
Hide file tree
Showing 14 changed files with 221 additions and 9 deletions.
204 changes: 204 additions & 0 deletions AI-GATEWAY.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,204 @@
---
# You need to install [VS Code Reveal extension](https://marketplace.visualstudio.com/items?itemName=evilz.vscode-reveal) and then click on 'slides' at the botton to view in presentation mode
title: AI Gateway
theme: black
enableMenu: true
parallaxBackgroundImage: ../images/back.png
parallaxBackgroundSize: 1500px 1024px

---

AI Gateway {style="font-size:60px"}

<img src="../images/ai-gateway.gif" alt="drawing" style="width:900px;"/>


---

AI Gateway objectives

* Aims to accelerate the experimentation of advanced AI use cases {style="font-size:20px"}
* Ensures control and governance over the consumption of AI services {style="font-size:20px"}
* Paves the road for a confident deployment of Intelligent Apps into production {style="font-size:20px"}

---

AI Gateway toolchain

<img src="../images/toolchain.png" alt="drawing" style="width:900px;"/>

--------------

* Powered by VS Code running locally or in the cloud with GitHub Codespaces {style="font-size:20px"}
* Jupyter Notebooks structures the step-by-step instructions {style="font-size:20px"}
* Python scripts define the variables and execute OpenAI API calls directly or with SDKs {style="font-size:20px"}
* Bicep defines the infrastructure as code needed for the lab in a declarative way {style="font-size:20px"}
* Azure CLI handles authentication with Azure and issues commands to the control plane {style="font-size:20px"}

---

Request forwarding

Playground to try forwarding requests to either an Azure OpenAI endpoint or a mock server {style="font-size:20px"}

<img src="../images/request-forwarding.gif" alt="drawing" style="width:700px;"/>

--------------

* APIM uses the managed identity (user or system assigned). {style="font-size:20px"}
* APIM is authorized to consume the Azure OpenAI API through Role Based Access Controls. {style="font-size:20px"}
* Zero impact on consumers using the API directly, with SDKs or orchestrators like LangChain. Just need to update the endpoint to use the APIM endpoint instead of Azure OpenAI endpoint. {style="font-size:20px"}
* Keyless approach: API consumers use the APIM subscription keys, and the Azure OpenAI keys are never used {style="font-size:20px"}

---

Backend circuit breaking

Playground to try the built-in backend circuit breaker functionality of APIM to either an Azure OpenAI endpoint or a mock server {style="font-size:20px"}

<img src="../images/backend-circuit-breaking.gif" alt="drawing" style="width:700px;"/>

--------------

* Azure OpenAI endpoint is configured as an APIM backend, promoting reusability across APIs and improved governance. {style="font-size:20px"}
* Circuit breaking rules define controlled availability for the OpenAI endpoint. {style="font-size:20px"}
* When the circuit breaks, APIM stops sending requests to OpenAI. {style="font-size:20px"}
* Handles the status code 429 (Too Many Requests) and any other status code sent by the OpenAI service. {style="font-size:20px"}
* Doesn’t need any policy configuration. The rules are just properties of the backend. {style="font-size:20px"}

---

Backend pool load balancing

Playground to try the built-in load balancing backend pool functionality of APIM {style="font-size:20px"}

<img src="../images/backend-pool-load-balancing.gif" alt="drawing" style="width:700px;"/>

--------------

* Spread the load to multiple backends, which may have individual backend circuit breakers. {style="font-size:20px"}
* Shift the load from one set of backends to another for upgrade (blue-green deployment). {style="font-size:20px"}
* Currently, the backend pool supports round-robin load balancing. {style="font-size:20px"}
* Doesn’t need any policy configuration. The rules are just properties of the backend. {style="font-size:20px"}

---

Advanced load balancing

Playground to try the advanced load balancing (based on a custom APIM policy) {style="font-size:20px"}

<img src="../images/advanced-load-balancing.gif" alt="drawing" style="width:600px;"/>

--------------

* Loads the load balancer configuration from a named value property. {style="font-size:20px"}
* Uses backends to enable the combination with the built-in circuit breaking feature or chaining with the backend pool. {style="font-size:20px"}
* The policy doesn't have to be changed to add/modify endpoints or configure the load balancer. {style="font-size:20px"}
* Dynamically supports any number of OpenAI endpoints. {style="font-size:20px"}
* Support advanced properties like priority or weights to give priority to Provisioned Throughput Unit (PTU). {style="font-size:20px"}

---

Response streaming

Playground to try response streaming with APIM and Azure OpenAI endpoints to explore the advantages and shortcomings associated with streaming {style="font-size:20px"}

<img src="../images/response-streaming.gif" alt="drawing" style="width:700px;"/>

--------------

* The client application receives the completions in chunks as it's being generated. {style="font-size:20px"}
* Might improve the user experience for intelligent apps with a ChatGPT interface. {style="font-size:20px"}
* Streaming responses doesn't include the usage field to tell how many tokens were consumed. {style="font-size:20px"}
* You sacrifice for now APIM built-in logging. {style="font-size:20px"}
* Streaming in a production application makes it more difficult to moderate the content of the completions, as partial completions may be more difficult to evaluate. {style="font-size:20px"}

---

Vector searching

Playground to try the Retrieval Augmented Generation (RAG) pattern with Azure AI Search, Azure OpenAI embeddings and Azure OpenAI completions {style="font-size:20px"}

<img src="../images/vector-searching.gif" alt="drawing" style="width:700px;"/>

--------------

* Implements the popular RAG pattern. {style="font-size:20px"}
* Uses Azure AI Search as a vector store. {style="font-size:20px"}
* Uses OpenAI to generate the embeddings. {style="font-size:20px"}
* Supports key word search, hybrid search and semantic ranking. {style="font-size:20px"}
* OpenAI completion is generated based on the user prompt and the AI search results. {style="font-size:20px"}
* All the APIs from OpenAI and AI Search are served trough APIM without using keys. {style="font-size:20px"}

---

Built-in logging

Playground to try the built-in logging capabilities of API Management {style="font-size:20px"}

<img src="../images/built-in-logging.gif" alt="drawing" style="width:700px;"/>

--------------

* The requests are logged into Application Insights and metrics available in Azure Monitor. {style="font-size:20px"}
* Doesn’t need any policy configuration. {style="font-size:20px"}
* Enables tracking request/response details and token usage with the provided notebook. {style="font-size:20px"}
* Metrics from the Azure OpenAI service might be correlated to provide a holistic view on service usage. {style="font-size:20px"}
* The notebook can be easily customized to accommodate specific use cases. {style="font-size:20px"}
* Enables the creation of Azure dashboards for a single pane of glass monitoring approach. {style="font-size:20px"}

---

SLM self-hosting

Playground to try the self-hosted phy-2 Small Language Model (SLM) trough the APIM self-hosted gateway with OpenAI API compatibility {style="font-size:20px"}

<img src="../images/slm-self-hosting.gif" alt="drawing" style="width:700px;"/>

--------------

* The APIM self-hosted gateway is a containerized version of the default managed gateway. {style="font-size:20px"}
* Useful for scenarios where we need to self-host an open-source model from platforms such as Hugging Face. {style="font-size:20px"}
* In this playground we have used Phi-2 that is a SLM suited to try on a laptop. {style="font-size:20px"}
* Both APIM self-hosted gateway and the phy-2 could run on docker containers or in a Kubernetes cluster. {style="font-size:20px"}

---

Summary


* The AI Gateway concept provides a range of labs that enables the experimentation of AI Services supported by an API management strategy. {style="font-size:20px"}
* The experimentation will feed the design architecture and the landing zone that will go into production. {style="font-size:20px"}
* The labs are based on Jupyter Notebooks to enable clear and documented instructions, Python scripts, Bicep IaC and APIM policies. {style="font-size:20px"}
* There is a backlog of experiments that we plan to implement to take this work further and enable more advanced use cases. Stay tuned 🙂 {style="font-size:20px"}

---

<!-- .slide: data-auto-animate data-auto-animate-easing="cubic-bezier(0.770, 0.000, 0.175, 1.000)" -->

### Whant to know more?

[aka.ms/ai-gateway](https://aka.ms/ai-gateway)

<p><br/></p>
<div class="r-hstack justify-center">
<div data-id="box1" style="background: #F35325; width: 50px; height: 50px; margin: 10px; border-radius: 5px;"></div>
<div data-id="box2" style="background: #81BC06; width: 50px; height: 50px; margin: 10px; border-radius: 5px;"></div>
<div data-id="box3" style="background: #05A6F0; width: 50px; height: 50px; margin: 10px; border-radius: 5px;"></div>
<div data-id="box4" style="background: #FFBA08; width: 50px; height: 50px; margin: 10px; border-radius: 5px;"></div>
</div>

---

<!-- .slide: data-auto-animate data-auto-animate-easing="cubic-bezier(0.770, 0.000, 0.175, 1.000)" -->

<div class="r-hstack justify-center">
<div data-id="box1" data-auto-animate-delay="0" style="background: #F35325; width: 100px; height: 100px; margin: 10px;"></div>
<div data-id="box2" data-auto-animate-delay="0.1" style="background: #81BC06; width: 100px; height: 100px; margin: 10px;"></div>
</div>
<div class="r-hstack justify-center">
<div data-id="box3" data-auto-animate-delay="0" style="background: #05A6F0; width: 100px; height: 100px; margin: 10px;"></div>
<div data-id="box4" data-auto-animate-delay="0.1" style="background: #FFBA08; width: 100px; height: 100px; margin: 10px;"></div>
</div>

## Thank You {style="margin-top: 20px;"}
Binary file added AI-GATEWAY.pptx
Binary file not shown.
26 changes: 17 additions & 9 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,14 +17,14 @@ Acknowledging the rising dominance of Python, particularly in the realm of AI, a

| | | |
| ---- | ----- | ----------- |
| [Request forwarding](labs/request-forwarding/request-forwarding.ipynb) | [![flow](images/request-forwarding.gif)](labs/request-forwarding/request-forwarding.ipynb) | Playground to try forwarding requests to either an Azure OpenAI endpoint or a mock server. APIM uses the system [managed identity](https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-use-managed-service-identity) to authenticate into the Azure OpenAI service. |
| [Backend circuit breaking](labs/backend-circuit-breaking/backend-circuit-breaking.ipynb) | [![flow](images/backend-circuit-breaking.gif)](labs/backend-circuit-breaking/backend-circuit-breaking.ipynb) | Playground to try the built-in [backend circuit breaker functionality of APIM](https://learn.microsoft.com/en-us/azure/api-management/backends?tabs=bicep) to either an Azure OpenAI endpoints or a mock server. |
| [Backend pool load balancing](labs/backend-pool-load-balancing/backend-pool-load-balancing.ipynb) | [![flow](images/backend-pool-load-balancing.gif)](labs/backend-pool-load-balancing/backend-pool-load-balancing.ipynb) | Playground to try the built-in load balancing [backend pool functionality of APIM](https://learn.microsoft.com/en-us/azure/api-management/backends?tabs=bicep) to either a list of Azure OpenAI endpoints or mock servers. |
| [Advanced load balancing](labs/advanced-load-balancing/advanced-load-balancing.ipynb) | [![flow](images/advanced-load-balancing.gif)](labs/advanced-load-balancing/advanced-load-balancing.ipynb) | Playground to try the advanced load balancing (based on a custom [APIM policy](https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-policies)) to either a list of Azure OpenAI endpoints or mock servers. |
| [Response streaming](labs/response-streaming/response-streaming.ipynb) | [![flow](images/response-streaming.gif)](labs/response-streaming/response-streaming.ipynb) | Playground to try response streaming with APIM and Azure OpenAI endpoints to explore the advantages and shortcomings associated with [streaming](https://learn.microsoft.com/en-us/azure/api-management/how-to-server-sent-events#guidelines-for-sse). |
| [Vector searching](labs/vector-searching/vector-searching.ipynb) | [![flow](images/vector-searching.gif)](labs/vector-searching/vector-searching.ipynb) | Playground to try the [Retrieval Augmented Generation (RAG) pattern](https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview) with Azure AI Search, Azure OpenAI embeddings and Azure OpenAI completions. All the endpoints are managed via APIM. |
| [Built-in logging](labs/built-in-logging/built-in-logging.ipynb) | [![flow](images/built-in-logging.gif)](labs/built-in-logging/built-in-logging.ipynb) | Playground to try the [buil-in logging capabilities of API Management](https://learn.microsoft.com/en-us/azure/api-management/observability). The requests are logged into Application Insights and it's easy to track request/response details and token usage with provided notebook. |
| [SLM self-hosting](labs/slm-self-hosting/slm-self-hosting.ipynb) | [![flow](images/slm-self-hosting.gif)](labs/slm-self-hosting/slm-self-hosting.ipynb) | Playground to try the self-hosted [phy-2 Small Language Model (SLM)](https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/) trough the [APIM self-hosted gateway](https://learn.microsoft.com/en-us/azure/api-management/self-hosted-gateway-overview) with OpenAI API compatibility. |
| [Request forwarding](labs/request-forwarding/request-forwarding.ipynb) | [![flow](images/request-forwarding-small.gif)](labs/request-forwarding/request-forwarding.ipynb) | Playground to try forwarding requests to either an Azure OpenAI endpoint or a mock server. APIM uses the system [managed identity](https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-use-managed-service-identity) to authenticate into the Azure OpenAI service. |
| [Backend circuit breaking](labs/backend-circuit-breaking/backend-circuit-breaking.ipynb) | [![flow](images/backend-circuit-breaking-small.gif)](labs/backend-circuit-breaking/backend-circuit-breaking.ipynb) | Playground to try the built-in [backend circuit breaker functionality of APIM](https://learn.microsoft.com/en-us/azure/api-management/backends?tabs=bicep) to either an Azure OpenAI endpoints or a mock server. |
| [Backend pool load balancing](labs/backend-pool-load-balancing/backend-pool-load-balancing.ipynb) | [![flow](images/backend-pool-load-balancing-small.gif)](labs/backend-pool-load-balancing/backend-pool-load-balancing.ipynb) | Playground to try the built-in load balancing [backend pool functionality of APIM](https://learn.microsoft.com/en-us/azure/api-management/backends?tabs=bicep) to either a list of Azure OpenAI endpoints or mock servers. |
| [Advanced load balancing](labs/advanced-load-balancing/advanced-load-balancing.ipynb) | [![flow](images/advanced-load-balancing-small.gif)](labs/advanced-load-balancing/advanced-load-balancing.ipynb) | Playground to try the advanced load balancing (based on a custom [APIM policy](https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-policies)) to either a list of Azure OpenAI endpoints or mock servers. |
| [Response streaming](labs/response-streaming/response-streaming.ipynb) | [![flow](images/response-streaming-small.gif)](labs/response-streaming/response-streaming.ipynb) | Playground to try response streaming with APIM and Azure OpenAI endpoints to explore the advantages and shortcomings associated with [streaming](https://learn.microsoft.com/en-us/azure/api-management/how-to-server-sent-events#guidelines-for-sse). |
| [Vector searching](labs/vector-searching/vector-searching.ipynb) | [![flow](images/vector-searching-small.gif)](labs/vector-searching/vector-searching.ipynb) | Playground to try the [Retrieval Augmented Generation (RAG) pattern](https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview) with Azure AI Search, Azure OpenAI embeddings and Azure OpenAI completions. All the endpoints are managed via APIM. |
| [Built-in logging](labs/built-in-logging/built-in-logging.ipynb) | [![flow](images/built-in-logging-small.gif)](labs/built-in-logging/built-in-logging.ipynb) | Playground to try the [buil-in logging capabilities of API Management](https://learn.microsoft.com/en-us/azure/api-management/observability). The requests are logged into Application Insights and it's easy to track request/response details and token usage with provided notebook. |
| [SLM self-hosting](labs/slm-self-hosting/slm-self-hosting.ipynb) | [![flow](images/slm-self-hosting-small.gif)](labs/slm-self-hosting/slm-self-hosting.ipynb) | Playground to try the self-hosted [phy-2 Small Language Model (SLM)](https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/) trough the [APIM self-hosted gateway](https://learn.microsoft.com/en-us/azure/api-management/self-hosted-gateway-overview) with OpenAI API compatibility. |

### Backlog of experiments
* Developer tooling
Expand All @@ -34,6 +34,7 @@ Acknowledging the rising dominance of Python, particularly in the realm of AI, a
* Token rate limiting
* Cost tracking
* Content filtering
* PII handling
* Prompt storing
* Function calling
* Prompt guarding
Expand Down Expand Up @@ -80,7 +81,14 @@ The [app.py](app.py) can be customized to tailor the Mock server to specific use

* [Run locally or deploy to Azure](mock-server/mock-server.ipynb)

## 🥇 Resources
## 🎒 Presenting the AI Gateway concept
> [!TIP]
> Install the [VS Code Reveal extension](https://marketplace.visualstudio.com/items?itemName=evilz.vscode-reveal), open AI-GATEWAY.md and click on 'slides' at the botton to present the AI Gateway without leaving VS Code.
> [!TIP]
> Or just open the [AI-GATEWAY.pptx](AI-GATEWAY.pptx) for a plain old PowerPoint experience.
## 🥇 Other resources

Numerous reference architectures, best practices and starter kits are available on this topic. Please refer to the resources provided if you need comprehensive solutions or a landing zone to initiate your project. We suggest leveraging the AI-Gateway labs to discover additional capabilities that can be integrated into the reference architectures.

Expand Down
Binary file added images/advanced-load-balancing-small.gif
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/back.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/backend-circuit-breaking-small.gif
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/backend-pool-load-balancing-small.gif
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/built-in-logging-small.gif
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/developer-tooling-small.gif
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/request-forwarding-small.gif
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/response-streaming-small.gif
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/slm-self-hosting-small.gif
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/toolchain.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/vector-searching-small.gif
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit f743ee7

Please sign in to comment.