This project involves the deployment of a text-to-image generation backend powered by Stable Diffusion 3. It leverages modern cloud-native tools and architectures, including Amazon Elastic Kubernetes Service (EKS), Istio, and KServe, along with monitoring and observability tools like Kiali, Prometheus, and Grafana. The project uses TorchScript to integrate the model, with model weights stored in Amazon S3 and fetched during the bootstrap process.
1. Amazon EKS (Elastic Kubernetes Service)
- Role: Provides a managed Kubernetes environment to host and orchestrate containerized applications, ensuring scalability, reliability, and ease of management.
- Why Used: EKS simplifies Kubernetes cluster management and integrates seamlessly with AWS services like S3, IAM, and CloudWatch, making it ideal for hosting cloud-native workloads.
2. eksctl
- Role: CLI tool for creating and managing EKS clusters.
- Why Used: Simplifies the process of creating and configuring EKS clusters, reducing setup complexity.
3. Istio
- Role: Service mesh that provides traffic management, security, and observability for microservices.
- Why Used: Istio ensures secure, controlled communication between services and enhances traffic observability and resilience in the system.
4. KServe
- Role: Model serving platform for deploying and managing machine learning models on Kubernetes.
- Why Used: KServe provides high-performance, scalable, and easy-to-use inference-serving capabilities with native Kubernetes support. It simplifies serving the Stable Diffusion 3 model and integrates well with Istio.
5. TorchScript
- Role: Converts PyTorch models into a format suitable for deployment.
- Why Used: Facilitates model deployment by making the model portable and efficient for serving.
6. Amazon S3
- Role: Object storage service used to store the model weights.
- Why Used: Reliable and scalable storage solution that integrates seamlessly with AWS, enabling efficient access to large files during bootstrap.
7. Monitoring and Observability Tools
Kiali
- Role: Visualizes the service mesh topology and provides insights into Istio's components.
- Why Used: Helps in understanding and managing service-to-service communication.
Prometheus
- Role: Monitoring and alerting toolkit for collecting and querying metrics.
- Why Used: Provides detailed performance metrics of the system.
Grafana
- Role: Visualizes data through dashboards and graphs.
- Why Used: Creates intuitive dashboards to analyze metrics collected by Prometheus.

Step - 1
First we need to download the required model from Hugging Face: download_model.py
python download_model.py
This will download the model weights and store them in the sd3-model directory.
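download_model.py is not reproduced above, but a minimal sketch could look like the following. The repo id stabilityai/stable-diffusion-3-medium-diffusers and the use of huggingface_hub are assumptions; the actual script may differ, and access to the gated SD3 repository requires a Hugging Face token.

# download_model.py - minimal sketch (assumed implementation)
# Downloads the Stable Diffusion 3 weights from the Hugging Face Hub
# into the local sd3-model directory.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="stabilityai/stable-diffusion-3-medium-diffusers",  # assumed SD3 repo id
    local_dir="sd3-model",                                      # matches the directory mentioned above
)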
Step - 2
Now we need to create a .mar file using the model weights. Before we do that, we need to create a handler file - sd3_handler
The handler does the following (see the sketch after this list):
1. Initialize - Pull model weights from S3 and load the model
2. Pre-process the input to make it acceptable for inference
3. Model inference
4. Post-process the output before returning it
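The handler file itself is not shown here; a minimal sketch of sd3_handler.py built on TorchServe's BaseHandler could look like the following. The use of diffusers' StableDiffusion3Pipeline, the request/response fields, and the assumption that the weights are already in the model directory (placed there from S3 by the KServe storage initializer) are illustrative, not the exact implementation.

# sd3_handler.py - minimal sketch (assumed implementation)
import base64
import io

import torch
from diffusers import StableDiffusion3Pipeline
from ts.torch_handler.base_handler import BaseHandler


class SD3Handler(BaseHandler):
    def initialize(self, context):
        # Load the pipeline from the model directory that was populated
        # with the weights fetched from S3 during bootstrap.
        model_dir = context.system_properties.get("model_dir")
        self.pipe = StableDiffusion3Pipeline.from_pretrained(
            model_dir, torch_dtype=torch.float16
        ).to("cuda")
        self.initialized = True

    def preprocess(self, data):
        # Extract the text prompt from the request body (assumed format).
        body = data[0].get("body") or data[0].get("data")
        if isinstance(body, (bytes, bytearray)):
            return body.decode("utf-8")
        if isinstance(body, dict):
            return body.get("prompt", "")
        return str(body)

    def inference(self, prompt):
        # Run the diffusion pipeline and return the generated image.
        return self.pipe(prompt, num_inference_steps=28).images[0]

    def postprocess(self, image):
        # Encode the image as base64 so it can be returned as JSON.
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        return [base64.b64encode(buf.getvalue()).decode("utf-8")]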
Step - 3
Now that we have the handler ready, we can go ahead and create the .mar file - create_mar.sh
#!/bin/bash
torch-model-archiver \
--model-name sd3 \
--version 1.0 \
--handler sd3_handler.py \
--requirements-file requirements.txt \
-f \
--export-path ./model-store
echo "MAR file created at ../model-store/sd3.mar"
Here we specify the model name, the handler, and requirements.txt. The output is a .mar file stored in the model-store directory. Note that the model weights themselves are not packaged into the archive; they are fetched from S3 at startup, as described in the handler above.
Step - 4 Once the .mar file is ready, we can upload it to an AWS S3 bucket - upload_to_s3.sh. The config file used is config.properties.
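upload_to_s3.sh is not shown here; an equivalent minimal sketch in Python with boto3 is given below. The bucket name sd3-kserve is taken from the storageUri used later in the InferenceService, while the model-store/ and config/ key layout follows the usual KServe TorchServe convention and is an assumption about this setup.

# upload_to_s3.py - minimal sketch (assumed equivalent of upload_to_s3.sh)
# Uploads the model archive and TorchServe config to the S3 prefix that the
# InferenceService's storageUri (s3://sd3-kserve/sd3/) points at.
import boto3

s3 = boto3.client("s3")
bucket = "sd3-kserve"  # bucket name taken from the storageUri below

s3.upload_file("model-store/sd3.mar", bucket, "sd3/model-store/sd3.mar")
s3.upload_file("config.properties", bucket, "sd3/config/config.properties")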
Now the steps to build the TorchScript model are completed. We move on to deployment and inference.
Step - 1 We will create a yaml file to deploy our model - sd3-isvc.yaml
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "torchserve-sd3"
spec:
  predictor:
    serviceAccountName: s3-read-only
    pytorch:
      protocolVersion: v1
      storageUri: "s3://sd3-kserve/sd3/"
      image: pytorch/torchserve-kfs:0.12.0-gpu
      resources:
        limits:
          cpu: "8"
          memory: 16Gi
          nvidia.com/gpu: "1"
      env:
        - name: TS_DISABLE_TOKEN_AUTHORIZATION
          value: "true"
Run the command kubectl apply -f sd3-isvc.yaml to create the deployment.
Step - 2 We can track our deployment using the commands below -
# Check pods status
kubectl get pods
kubectl describe pod <pod-name>
# Check deployment status
kubectl get deployment
kubectl describe deployment <deployment-name>
# Check pod logs
kubectl logs -f <pod-name>
The above commands will help us understand whether our model is ready for inference.
Step - 3 In order to perform inference, we will need to fetch the details below -
# fetch host
kubectl get isvc
# output - torchserve-sd3-default.example.com
# fetch endpoint (Istio ingress gateway LoadBalancer)
kubectl get svc -n istio-system
# output - k8s-istioing-istioing-163c0111d9-c1e5608c48727220.elb.ap-south-1.amazonaws.com
Step - 4 Now, using the details from Step 3, we perform inference - test_inference.py
python test_inference.py
Logs of the Kubernetes resources are captured in all_deployments.yaml
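test_inference.py is not shown above; a minimal sketch could look like the following. The prompt payload, the model name sd3, and the base64-encoded image in the response are assumptions based on the handler sketch earlier and the KServe v1 protocol declared in the InferenceService; the host and endpoint values come from Step 3.

# test_inference.py - minimal sketch (assumed implementation)
# Sends a prompt to the KServe/TorchServe endpoint through the Istio ingress,
# using the InferenceService host from `kubectl get isvc` as the Host header.
import base64
import requests

HOST = "torchserve-sd3-default.example.com"       # from `kubectl get isvc`
ENDPOINT = "http://<istio-ingress-elb-hostname>"  # from `kubectl get svc -n istio-system`

resp = requests.post(
    f"{ENDPOINT}/v1/models/sd3:predict",          # model name matches the .mar archive
    headers={"Host": HOST},
    json={"instances": [{"prompt": "a photo of an astronaut riding a horse"}]},
    timeout=300,
)
resp.raise_for_status()

# Assuming the handler returns a base64-encoded PNG (see sd3_handler sketch).
image_b64 = resp.json()["predictions"][0]
with open("output.png", "wb") as f:
    f.write(base64.b64decode(image_b64))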

Sample outputs: Inference 1-5 (generated images).