Learn how to monitor LLM inference in production using Prometheus and Grafana. Track p95 latency, tokens/sec, queue duration, and KV cache usage across vLLM, TGI, and llama.cpp. Includes PromQL examples, dashboards, alerts, and Docker & Kubernetes setups.
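As a taste of the kind of query covered, here is a minimal PromQL sketch for p95 request latency, assuming vLLM's Prometheus histogram `vllm:e2e_request_latency_seconds_bucket` (other backends expose differently named metrics):

```promql
# p95 end-to-end request latency over a 5-minute window,
# aggregated across instances by histogram bucket boundary (le)
histogram_quantile(
  0.95,
  sum by (le) (rate(vllm:e2e_request_latency_seconds_bucket[5m]))
)
```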
#Monitoring #Hosting #Self-Hosting #LLM #AI #DevOps #Docker #K8S #Prometheus #Grafana #Observability #vLLM
https://www.glukhov.org/observability/monitoring-llm-inference-prometheus-grafana/