Learn how to monitor LLM inference in production using Prometheus and Grafana. Track p95 latency, tokens/sec, queue duration, and KV cache usage across vLLM, TGI, and llama.cpp. Includes PromQL examples, dashboards, alerts, and Docker & Kubernetes setups.
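As a taste of the kind of query covered, here is a minimal PromQL sketch for p95 request latency, assuming vLLM's Prometheus histogram `vllm:e2e_request_latency_seconds_bucket` (other backends expose differently named metrics):

```promql
# p95 end-to-end request latency over a 5-minute window,
# aggregated across instances by histogram bucket boundary (le)
histogram_quantile(
  0.95,
  sum by (le) (rate(vllm:e2e_request_latency_seconds_bucket[5m]))
)
```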
#Monitoring #Hosting #Self-Hosting #LLM #AI #DevOps #Docker #K8S #Prometheus #Grafana #Observability #vLLM
https://www.glukhov.org/observability/monitoring-llm-inference-prometheus-grafana/