Rost Glukhov
@ros@techhub.social · 23 hours ago

Learn how to monitor LLM inference in production using Prometheus and Grafana. Track p95 latency, tokens/sec, queue duration, and KV cache usage across vLLM, TGI, and llama.cpp. Includes PromQL examples, dashboards, alerts, Docker & Kubernetes setups.

#Monitoring #Hosting #Self-Hosting #LLM #AI #DevOps #Docker #K8S #Prometheus #Grafana #observability #kubernetes #vllm

https://www.glukhov.org/observability/monitoring-llm-inference-prometheus-grafana/
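As a taste of the PromQL the article walks through, here is a minimal sketch of p95 latency, throughput, and KV cache queries. The metric names assume vLLM's default Prometheus exporter and will differ by engine and version (TGI and llama.cpp expose their own names), so check your server's /metrics endpoint before copying:

  # p95 end-to-end request latency over the last 5 minutes
  histogram_quantile(0.95,
    sum(rate(vllm:e2e_request_latency_seconds_bucket[5m])) by (le))

  # generated tokens per second across all replicas
  sum(rate(vllm:generation_tokens_total[5m]))

  # KV cache usage (gauge, 0-1)
  vllm:gpu_cache_usage_perc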
