Skip to content

Commit

Permalink
fix readme
Browse files Browse the repository at this point in the history
  • Loading branch information
vieiraae committed Feb 7, 2025
1 parent af88ebc commit 60dec8e
Show file tree
Hide file tree
Showing 11 changed files with 1,676 additions and 27 deletions.
48 changes: 21 additions & 27 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -44,12 +44,10 @@ This repo explores the **AI Gateway** pattern through a series of experimental l

Acknowledging the rising dominance of Python, particularly in the realm of AI, along with the powerful experimental capabilities of Jupyter notebooks, the following labs are structured around Jupyter notebooks, with step-by-step instructions with Python scripts, [Bicep](https://learn.microsoft.com/azure/azure-resource-manager/bicep/overview?tabs=bicep) files and [Azure API Management policies](https://learn.microsoft.com/azure/api-management/api-management-howto-policies):

### Current Labs

These labs are currently recommended after which to model your workloads.

<!-- Backend pool load balancing -->
#### [**🧪 Backend pool load balancing**](labs/backend-pool-load-balancing/backend-pool-load-balancing.ipynb) - Available with [Bicep](labs/backend-pool-load-balancing/backend-pool-load-balancing.ipynb) and [Terraform](labs/backend-pool-load-balancing-tf/backend-pool-load-balancing-tf.ipynb)
### [**🧪 Backend pool load balancing**](labs/backend-pool-load-balancing/backend-pool-load-balancing.ipynb) - Available with [Bicep](labs/backend-pool-load-balancing/backend-pool-load-balancing.ipynb) and [Terraform](labs/backend-pool-load-balancing-tf/backend-pool-load-balancing-tf.ipynb)

Playground to try the built-in load balancing [backend pool functionality of Azure API Management](https://learn.microsoft.com/azure/api-management/backends?tabs=bicep) to either a list of Azure OpenAI endpoints or mock servers.

Expand All @@ -58,7 +56,7 @@ Playground to try the built-in load balancing [backend pool functionality of Azu
[🦾 Bicep](labs/backend-pool-load-balancing/main.bicep)[⚙️ Policy](labs/backend-pool-load-balancing/policy.xml)[🧾 Notebook](labs/backend-pool-load-balancing/backend-pool-load-balancing.ipynb)

<!-- Token rate limiting -->
#### [**🧪 Token rate limiting**](labs/token-rate-limiting/token-rate-limiting.ipynb)
### [**🧪 Token rate limiting**](labs/token-rate-limiting/token-rate-limiting.ipynb)

Playground to try the [token rate limiting policy](https://learn.microsoft.com/azure/api-management/azure-openai-token-limit-policy) to one or more Azure OpenAI endpoints. When the token usage is exceeded, the caller receives a 429.

Expand All @@ -67,7 +65,7 @@ Playground to try the [token rate limiting policy](https://learn.microsoft.com/a
[🦾 Bicep](labs/token-rate-limiting/main.bicep)[⚙️ Policy](labs/token-rate-limiting/policy.xml)[🧾 Notebook](labs/token-rate-limiting/token-rate-limiting.ipynb)

<!-- Token metrics emitting -->
#### [**🧪 Token metrics emitting**](labs/token-metrics-emitting/token-metrics-emitting.ipynb)
### [**🧪 Token metrics emitting**](labs/token-metrics-emitting/token-metrics-emitting.ipynb)

Playground to try the [emit token metric policy](https://learn.microsoft.com/azure/api-management/azure-openai-emit-token-metric-policy). The policy sends metrics to Application Insights about consumption of large language model tokens through Azure OpenAI Service APIs.

Expand All @@ -76,7 +74,7 @@ Playground to try the [emit token metric policy](https://learn.microsoft.com/azu
[🦾 Bicep](labs/token-metrics-emitting/main.bicep)[⚙️ Policy](labs/token-metrics-emitting/policy.xml)[🧾 Notebook](labs/token-metrics-emitting/token-metrics-emitting.ipynb)

<!-- Semantic caching -->
#### [**🧪 Semantic caching**](labs/semantic-caching/semantic-caching.ipynb)
### [**🧪 Semantic caching**](labs/semantic-caching/semantic-caching.ipynb)

Playground to try the [semantic caching policy](https://learn.microsoft.com/azure/api-management/azure-openai-semantic-cache-lookup-policy). Uses vector proximity of the prompt to previous requests and a specified similarity score threshold.

Expand All @@ -85,7 +83,7 @@ Playground to try the [semantic caching policy](https://learn.microsoft.com/azur
[🦾 Bicep](labs/semantic-caching/main.bicep)[⚙️ Policy](labs/semantic-caching/policy.xml)[🧾 Notebook](labs/semantic-caching/semantic-caching.ipynb)

<!-- Access controlling -->
#### [**🧪 Access controlling**](labs/access-controlling/access-controlling.ipynb)
### [**🧪 Access controlling**](labs/access-controlling/access-controlling.ipynb)

Playground to try the [OAuth 2.0 authorization feature](https://learn.microsoft.com/azure/api-management/api-management-authenticate-authorize-azure-openai#oauth-20-authorization-using-identity-provider) using identity provider to enable more fine-grained access to OpenAPI APIs by particular users or client.

Expand All @@ -94,7 +92,7 @@ Playground to try the [OAuth 2.0 authorization feature](https://learn.microsoft.
[🦾 Bicep](labs/access-controlling/main.bicep)[⚙️ Policy](labs/access-controlling/policy.xml)[🧾 Notebook](labs/access-controlling/access-controlling.ipynb)

<!-- zero-to-production -->
#### [**🧪 Zero-to-Production**](labs/zero-to-production/zero-to-production.ipynb)
### [**🧪 Zero-to-Production**](labs/zero-to-production/zero-to-production.ipynb)

Playground to create a combination of several policies in an iterative approach. We start with load balancing, then progressively add token emitting, rate limiting, and, eventually, semantic caching. Each of these sets of policies is derived from other labs in this repo.

Expand All @@ -103,7 +101,7 @@ Playground to create a combination of several policies in an iterative approach.
[🦾 Bicep](labs/zero-to-production/main.bicep)[⚙️ Policy](labs/zero-to-production/policy-3.xml)[🧾 Notebook](labs/zero-to-production/zero-to-production.ipynb)

<!-- GPT-4o inferencing -->
#### [**🧪 GPT-4o inferencing**](labs/GPT-4o-inferencing/GPT-4o-inferencing.ipynb)
### [**🧪 GPT-4o inferencing**](labs/GPT-4o-inferencing/GPT-4o-inferencing.ipynb)

Playground to try the new GPT-4o model. GPT-4o ("o" for "omni") is designed to handle a combination of text, audio, and video inputs, and can generate outputs in text, audio, and image formats.

Expand All @@ -112,7 +110,7 @@ Playground to try the new GPT-4o model. GPT-4o ("o" for "omni") is designed to h
[🦾 Bicep](labs/GPT-4o-inferencing/main.bicep)[⚙️ Policy](labs/GPT-4o-inferencing/policy.xml)[🧾 Notebook](labs/GPT-4o-inferencing/GPT-4o-inferencing.ipynb)

<!-- Function calling -->
#### [**🧪 Function calling**](labs/function-calling/function-calling.ipynb)
### [**🧪 Function calling**](labs/function-calling/function-calling.ipynb)

Playground to try the OpenAI [function calling](https://learn.microsoft.com/azure/ai-services/openai/how-to/function-calling?tabs=non-streaming%2Cpython) feature with an Azure Functions API that is also managed by Azure API Management.

Expand All @@ -121,15 +119,15 @@ Playground to try the OpenAI [function calling](https://learn.microsoft.com/azur
[🦾 Bicep](labs/function-calling/main.bicep)[⚙️ Policy](labs/function-calling/policy.xml)[🧾 Notebook](labs/function-calling/function-calling.ipynb)

<!-- Model Routing -->
#### [**🧪 Model Routing**](labs/model-routing/model-routing.ipynb)
### [**🧪 Model Routing**](labs/model-routing/model-routing.ipynb)

Playground to try routing to a backend based on Azure OpenAI model and version.

[<img src="images/model-routing-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">](labs/model-routing/model-routing.ipynb)

[🦾 Bicep](labs/model-routing/main.bicep)[⚙️ Policy](labs/model-routing/policy.xml)[🧾 Notebook](labs/model-routing/model-routing.ipynb)
<!-- Response streaming -->
#### [**🧪 Response streaming**](labs/response-streaming/response-streaming.ipynb)
### [**🧪 Response streaming**](labs/response-streaming/response-streaming.ipynb)

Playground to try response streaming with Azure API Management and Azure OpenAI endpoints to explore the advantages and shortcomings associated with [streaming](https://learn.microsoft.com/azure/api-management/how-to-server-sent-events#guidelines-for-sse).

Expand All @@ -138,7 +136,7 @@ Playground to try response streaming with Azure API Management and Azure OpenAI
[🦾 Bicep](labs/response-streaming/main.bicep)[⚙️ Policy](labs/response-streaming/policy.xml)[🧾 Notebook](labs/response-streaming/response-streaming.ipynb)

<!-- Vector searching -->
#### [**🧪 Vector searching**](labs/vector-searching/vector-searching.ipynb)
### [**🧪 Vector searching**](labs/vector-searching/vector-searching.ipynb)

Playground to try the [Retrieval Augmented Generation (RAG) pattern](https://learn.microsoft.com/azure/search/retrieval-augmented-generation-overview) with Azure AI Search, Azure OpenAI embeddings and Azure OpenAI completions.

Expand All @@ -147,7 +145,7 @@ Playground to try the [Retrieval Augmented Generation (RAG) pattern](https://lea
[🦾 Bicep](labs/vector-searching/main.bicep)[⚙️ Policy](labs/vector-searching/policy.xml)[🧾 Notebook](labs/vector-searching/vector-searching.ipynb)

<!-- Built-in logging -->
#### [**🧪 Built-in logging**](labs/built-in-logging/built-in-logging.ipynb)
### [**🧪 Built-in logging**](labs/built-in-logging/built-in-logging.ipynb)

Playground to try the [buil-in logging capabilities of Azure API Management](https://learn.microsoft.com/azure/api-management/observability). Logs requests into App Insights to track details and token usage.

Expand All @@ -156,7 +154,7 @@ Playground to try the [buil-in logging capabilities of Azure API Management](htt
[🦾 Bicep](labs/built-in-logging/main.bicep)[⚙️ Policy](labs/built-in-logging/policy.xml)[🧾 Notebook](labs/built-in-logging/built-in-logging.ipynb)

<!-- SLM self-hosting -->
#### [**🧪 SLM self-hosting**](labs/slm-self-hosting/slm-self-hosting.ipynb) (phy-3)
### [**🧪 SLM self-hosting**](labs/slm-self-hosting/slm-self-hosting.ipynb) (phy-3)

Playground to try the self-hosted [phy-3 Small Language Model (SLM)](https://azure.microsoft.com/blog/introducing-phi-3-redefining-whats-possible-with-slms/) through the [Azure API Management self-hosted gateway](https://learn.microsoft.com/azure/api-management/self-hosted-gateway-overview) with OpenAI API compatibility.

Expand All @@ -165,7 +163,7 @@ Playground to try the self-hosted [phy-3 Small Language Model (SLM)](https://azu
[🦾 Bicep](labs/slm-self-hosting/main.bicep)[⚙️ Policy](labs/slm-self-hosting/policy.xml)[🧾 Notebook](labs/slm-self-hosting/slm-self-hosting.ipynb)

<!-- Message storing -->
#### [**🧪 Message storing**](labs/message-storing/message-storing.ipynb)
### [**🧪 Message storing**](labs/message-storing/message-storing.ipynb)

Playground to test storing message details into Cosmos DB through the [Log to event hub](https://learn.microsoft.com/azure/api-management/log-to-eventhub-policy) policy. With the policy we can control which data will be stored in the DB (prompt, completion, model, region, tokens etc.).

Expand All @@ -174,7 +172,7 @@ Playground to test storing message details into Cosmos DB through the [Log to ev
[🦾 Bicep](labs/message-storing/main.bicep)[⚙️ Policy](labs/message-storing/policy.xml)[🧾 Notebook](labs/message-storing/message-storing.ipynb)

<!-- Developer tooling -->
<!-- #### [**🧪 Developer tooling** (WIP)](labs/developer-tooling/developer-tooling.ipynb)
<!-- ### [**🧪 Developer tooling** (WIP)](labs/developer-tooling/developer-tooling.ipynb)
Playground to try the developer tooling available with Azure API Management to develop, debug, test and publish AI Service APIs.
Expand All @@ -183,7 +181,7 @@ Playground to try the developer tooling available with Azure API Management to d
[🦾 Bicep](labs/developer-tooling/main.bicep) ➕ [⚙️ Policy](labs/developer-tooling/policy.xml) ➕ [🧾 Notebook](labs/developer-tooling/developer-tooling.ipynb) -->

<!-- Prompt flow -->
#### [**🧪 Prompt flow**](labs/prompt-flow/prompt-flow.ipynb)
### [**🧪 Prompt flow**](labs/prompt-flow/prompt-flow.ipynb)

Playground to try the [Azure AI Studio Prompt Flow](https://learn.microsoft.com/azure/ai-studio/how-to/prompt-flow) with Azure API Management.

Expand All @@ -192,7 +190,7 @@ Playground to try the [Azure AI Studio Prompt Flow](https://learn.microsoft.com/
[🦾 Bicep](labs/prompt-flow/main.bicep)[⚙️ Policy](labs/prompt-flow/policy.xml)[🧾 Notebook](labs/prompt-flow/prompt-flow.ipynb)

<!-- Content Filtering -->
#### [**🧪 Content Filtering**](labs/content-filtering/content-filtering.ipynb)
### [**🧪 Content Filtering**](labs/content-filtering/content-filtering.ipynb)

Playground to try integrating Azure API Management with [Azure AI Content Safety](https://learn.microsoft.com/azure/ai-services/content-safety/overview) to filter potentially offensive, risky, or undesirable content.

Expand All @@ -201,28 +199,24 @@ Playground to try integrating Azure API Management with [Azure AI Content Safety
[🦾 Bicep](labs/content-filtering/main.bicep)[⚙️ Policy](labs/content-filtering/content-filtering-policy.xml)[🧾 Notebook](labs/content-filtering/content-filtering.ipynb)

<!-- Prompt Shielding -->
#### [**🧪 Prompt Shielding**](labs/content-filtering/prompt-shielding.ipynb)
### [**🧪 Prompt Shielding**](labs/content-filtering/prompt-shielding.ipynb)

Playground to try Prompt Shields from Azure AI Content Safety service that analyzes LLM inputs and detects User Prompt attacks and Document attacks, which are two common types of adversarial inputs.

[<img src="images/content-filtering-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">](labs/content-filtering/prompt-shielding.ipynb)

[🦾 Bicep](labs/content-filtering/main.bicep)[⚙️ Policy](labs/content-filtering/prompt-shield-policy.xml)[🧾 Notebook](labs/content-filtering/prompt-shielding.ipynb)

### Deprecated Labs

These labs are no longer applicable. If you have implemented logic from these labs, please consider updating.

<!-- Advanced load balancing -->
#### [**🧪 Advanced load balancing**](labs/advanced-load-balancing/advanced-load-balancing.ipynb) (custom)
#### [**🧪 Load balancing with policy expressions**](labs/advanced-load-balancing/advanced-load-balancing.ipynb) (consider to use [backend pool load balancing instead](labs/backend-pool-load-balancing/backend-pool-load-balancing.ipynb))

Playground to try the advanced load balancing (based on a custom [Azure API Management policy](https://learn.microsoft.com/azure/api-management/api-management-howto-policies)) to either a list of Azure OpenAI endpoints or mock servers.
Playground to try the load balancing (based on a custom [Azure API Management policy](https://learn.microsoft.com/azure/api-management/api-management-howto-policies)) to either a list of Azure OpenAI endpoints or mock servers.

[<img src="images/advanced-load-balancing-small.gif" alt="flow" style="width: 437px; display: inline-block;" data-target="animated-image.originalImage">](labs/advanced-load-balancing/advanced-load-balancing.ipynb)

[🦾 Bicep](labs/advanced-load-balancing/main.bicep)[⚙️ Policy](labs/advanced-load-balancing/policy.xml)[🧾 Notebook](labs/advanced-load-balancing/advanced-load-balancing.ipynb)

### Backlog of Labs
## Backlog of Labs

This is a list of potential future labs to be developed.

Expand Down
38 changes: 38 additions & 0 deletions labs/ai-foundry-deepseek/README.MD
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
# APIM ❤️ OpenAI

## [Token Metrics Emitting lab](token-metrics-emitting.ipynb)

[![flow](../../images/token-metrics-emitting.gif)](token-metrics-emitting.ipynb)

Playground to try the [emit token metric policy](https://learn.microsoft.com/azure/api-management/azure-openai-emit-token-metric-policy). The policy sends metrics to Application Insights about consumption of large language model tokens through Azure OpenAI Service APIs.

Notes:

- Token count metrics include: Total Tokens, Prompt Tokens, and Completion Tokens.
- This policy supports OpenAI response streaming! Use the [streaming tool](../../tools/streaming.ipynb) to test and troubleshoot response streaming.
- Use the [tracing tool](../../tools/tracing.ipynb) to track the behavior and troubleshoot the [policy](policy.xml).

[View policy configuration](policy.xml)

### Result

![result](result.png)

### Prerequisites

- [Python 3.12 or later version](https://www.python.org/) installed
- [Pandas Library](https://pandas.pydata.org) installed
- [VS Code](https://code.visualstudio.com/) installed with the [Jupyter notebook extension](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter) enabled
- [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli) installed
- [An Azure Subscription](https://azure.microsoft.com/free/) with Contributor permissions
- [Access granted to Azure OpenAI](https://aka.ms/oai/access)
- [Sign in to Azure with Azure CLI](https://learn.microsoft.com/cli/azure/authenticate-azure-cli-interactively)

### 🚀 Get started

Proceed by opening the [Jupyter notebook](token-metrics-emitting.ipynb), and follow the steps provided.

### 🗑️ Clean up resources

When you're finished with the lab, you should remove all your deployed resources from Azure to avoid extra charges and keep your Azure subscription uncluttered.
Use the [clean-up-resources notebook](clean-up-resources.ipynb) for that.
Loading

0 comments on commit 60dec8e

Please sign in to comment.