Let's explore how you can perform inference with Phi-3-mini on Android devices. Phi-3-mini is a 3.8B-parameter model in Microsoft's new Phi-3 family, small enough to bring Large Language Model (LLM) capabilities to edge and IoT devices.
Semantic Kernel is an application framework that allows you to create applications compatible with Azure OpenAI Service, OpenAI models, and even local models. If you are new to Semantic Kernel, we suggest you look at the Semantic Kernel Cookbook.
You can pair Phi-3-mini with the Hugging Face Connector in Semantic Kernel; refer to this Sample Code.
By default, the connector resolves the model via its Hugging Face model ID. However, you can also point it at a locally built Phi-3-mini model server.
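The referenced sample targets Semantic Kernel directly; as a minimal illustration of the second option, the Python sketch below talks to a local Phi-3-mini server through an OpenAI-compatible endpoint. The base URL, API key, and model name are assumptions that depend on how you serve the model (Ollama, for example, exposes such an endpoint on its default port, 11434).

```python
# A minimal sketch, assuming your local Phi-3-mini server exposes an
# OpenAI-compatible API. The base URL, API key, and model name below are
# placeholders; adjust them to match your setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # assumed local endpoint (Ollama default)
    api_key="not-needed-locally",          # local servers typically ignore this
)

response = client.chat.completions.create(
    model="phi3",  # assumed name the server registers the model under
    messages=[{"role": "user", "content": "Summarize edge AI in one sentence."}],
)
print(response.choices[0].message.content)
```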
Many users prefer quantized models for running inference locally. Ollama and LlamaEdge let individual users run different quantized models:
You can run Phi-3 directly with `ollama run phi3`, or configure it for offline use by creating a `Modelfile` that points to your local `.gguf` file:
```
FROM {Add your gguf file path}
TEMPLATE """<|user|> {{ .Prompt }}<|end|> <|assistant|>"""
PARAMETER stop <|end|>
PARAMETER num_ctx 4096
```
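Once the `Modelfile` is ready, register and run the model with `ollama create <name> -f Modelfile` followed by `ollama run <name>`. For programmatic access, here is a minimal sketch against Ollama's local REST API; it assumes the default port (11434) and a hypothetical model name `phi3-local`:

```python
# Minimal sketch: call a locally served, quantized Phi-3 model through
# Ollama's REST API. Assumes Ollama is running on its default port and
# that a model named "phi3-local" (hypothetical) was created as above.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "phi3-local",
        "prompt": "Explain quantization in one sentence.",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```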
If you want to use `.gguf` files in the cloud and on edge devices at the same time, LlamaEdge is a great choice. You can refer to this sample code to get started.
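Once LlamaEdge's API server is running with your `.gguf` model loaded, it also speaks an OpenAI-compatible protocol over HTTP. The sketch below is a hedged example of querying it; the port (LlamaEdge's api-server defaults to 8080) and the model name are assumptions to adjust to your setup.

```python
# Minimal sketch: query a running LlamaEdge api-server over its
# OpenAI-compatible chat endpoint. Port and model name are assumptions;
# match them to the flags you started the server with.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "phi-3-mini",  # hypothetical; use the name you configured
        "messages": [
            {"role": "user", "content": "Why run LLMs on edge devices?"},
        ],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```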
To try Phi-3-mini directly on an Android phone, you can use the MLC Chat app:

- Download the MLC Chat app (free) for Android phones.
- Download the APK file (148 MB) and install it on your device.
- Launch the MLC Chat app. You'll see a list of AI models, including Phi-3-mini.
In summary, Phi-3-mini opens up exciting possibilities for generative AI on edge devices, and you can start exploring its capabilities on Android.