Monitoring Grafana
Monitoring your Grafana deployment ensures optimal performance and helps you identify issues before they impact users. This guide covers the available monitoring endpoints, key metrics, and health check mechanisms.
Health check endpoints
Grafana exposes several HTTP endpoints to monitor instance health.
Liveness probe
The /livez endpoint returns 200 OK if the Grafana HTTP server is running.
See pkg/server/health.go:36 for implementation details.
Readiness probe
The /readyz endpoint returns 200 OK when Grafana has completed initialization and is ready to serve requests. Until then, it returns 503 Service Unavailable.
See pkg/server/health.go:45 for implementation details.
Database health
The /api/health endpoint checks both web server and database connectivity.
See pkg/api/health.go:10 for implementation.
Legacy health check
The /healthz endpoint provides a simple 200 OK response if the web server is running.
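The four endpoints described above can be polled from a script. The following is a minimal Python sketch, not an official client: the helper names (`fetch_status`, `check_instance`) and any base URL you pass in are illustrative assumptions, and only the endpoint paths come from this guide.

```python
from urllib import error, request

# Endpoint paths as listed in this guide.
HEALTH_PATHS = ("/livez", "/readyz", "/api/health", "/healthz")


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except error.HTTPError as exc:
        # Non-2xx answers (for example 503) arrive as HTTPError; keep the code.
        return exc.code
    except (error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def check_instance(base_url, fetch=fetch_status):
    """Map each health path to True when it answers 200, else False."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) == 200 for path in HEALTH_PATHS}
```

Calling `check_instance("http://localhost:3000")` (the host and port here are an assumption about your deployment) returns a dictionary such as `{"/livez": True, "/readyz": False, ...}`. The `fetch` parameter exists so the logic can be exercised without a running server.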
Prometheus metrics
Grafana exposes Prometheus-compatible metrics at the /metrics endpoint for comprehensive monitoring.
Enabling metrics
Metrics are disabled by default. Enable them in your configuration, and replace the <PASSWORD> placeholder with a secure password.
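The configuration listing for this step is not reproduced here, so the sketch below uses an assumed fragment. It relies on the `[metrics]` section keys `enabled`, `basic_auth_username`, and `basic_auth_password`; verify these against the configuration reference for your Grafana version. The Python helper `metrics_settings` is hypothetical and only reports what a fragment says.

```python
import configparser

# Assumed example fragment; the user name "metrics" is illustrative only.
SAMPLE_INI = """\
[metrics]
enabled = true
basic_auth_username = metrics
basic_auth_password = <PASSWORD>
"""


def metrics_settings(ini_text):
    """Report whether a config fragment enables metrics and sets basic auth.

    "enabled" is None when the fragment does not set it, in which case
    Grafana's built-in default applies.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("metrics"):
        return {"enabled": None, "basic_auth": False}
    section = parser["metrics"]
    user = section.get("basic_auth_username", fallback="")
    password = section.get("basic_auth_password", fallback="")
    return {
        "enabled": section.getboolean("enabled", fallback=None),
        "basic_auth": bool(user and password),
    }
```

This is meant for checking a fragment, not a complete configuration file: `configparser` rejects keys that appear before the first section header, which a full grafana.ini may contain.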
Accessing metrics
Once enabled, metrics are available at the /metrics endpoint.
See pkg/api/http_server.go:736 for the metrics handler implementation.
Key metrics
Grafana exposes metrics under the grafana_ namespace. Here are the most important metrics for operational monitoring:
Instance metrics
- grafana_instance_start_total: Counter of instance starts
- grafana_build_info: Build information with version, revision, and edition labels
- grafana_environment_info: Environment information from operator-provided labels
HTTP metrics
- grafana_api_response_status_total: API HTTP response status codes
- grafana_page_response_status_total: Page HTTP response status codes
- grafana_proxy_response_status_total: Proxy HTTP response status codes
Database metrics
- grafana_stat_total_dashboards: Total number of dashboards
- grafana_stat_total_users: Total number of users
- grafana_stat_total_orgs: Total number of organizations
- grafana_stat_totals_datasource: Total datasources by plugin ID
Performance metrics
- grafana_api_dashboard_search_milliseconds: Dashboard search duration
- grafana_api_dataproxy_request_all_milliseconds: Data proxy request duration
- grafana_rendering_request_duration_milliseconds: Image rendering duration
Alerting metrics
- grafana_alerting_active_alerts: Number of active alerts
- grafana_alerting_result_total: Alert execution results by state
- grafana_alerting_notification_sent_total: Alert notifications sent by type
- grafana_alerting_notification_failed_total: Failed alert notifications by type
- grafana_alerting_execution_time_milliseconds: Alert execution duration
Access control metrics
- grafana_access_evaluation_count: Number of permission evaluations
- grafana_access_evaluation_duration: Permission evaluation duration
- grafana_access_permissions_cache_usage: Cache hit/miss for permissions
See pkg/infra/metrics/metrics.go for the complete metric definitions.
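To inspect these metrics outside Prometheus, you can parse the text returned by the /metrics endpoint. The sketch below is a simplified reader for the Prometheus text exposition format, written under two assumptions: sample lines carry no trailing timestamp, and the sample values and the `code` label shown are made up for illustration. The function names are hypothetical.

```python
from collections import defaultdict


def parse_samples(text, prefix="grafana_"):
    """Yield (name, labels, value) for sample lines whose name starts with prefix."""
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field (no timestamp assumed).
        series, _, value = line.rpartition(" ")
        name, brace, labels = series.partition("{")
        if name.startswith(prefix):
            yield name, brace + labels, float(value)


def totals_by_name(text):
    """Sum sample values per metric name, across all label sets."""
    totals = defaultdict(float)
    for name, _labels, value in parse_samples(text):
        totals[name] += value
    return dict(totals)


# Illustrative payload, not real output from a Grafana instance.
SAMPLE = """\
# TYPE grafana_api_response_status_total counter
grafana_api_response_status_total{code="200"} 120
grafana_api_response_status_total{code="500"} 3
grafana_stat_total_dashboards 42
go_goroutines 57
"""
```

With the sample payload, `totals_by_name(SAMPLE)` keeps only the two grafana_ metrics and drops `go_goroutines`. For production use, a maintained Prometheus client library is a safer choice than a hand-written parser.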
Application logging
Grafana writes structured logs that you can use for monitoring and troubleshooting.
Log configuration
Configure logging in grafana.ini.
Log levels
- debug: Detailed debugging information
- info: General operational messages
- warn: Warning messages for potential issues
- error: Error messages for failures
- critical: Critical failures requiring immediate attention
File logging
When file mode is enabled, logs are written to the directory specified in the paths section.
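Once logs are written to a file, a tally by level gives a quick signal of instance health. The sketch below assumes key=value style log lines with a `level=` (or `lvl=`) field and the full level names listed above; check both against the actual log output of your Grafana version. The sample lines are invented for illustration.

```python
import re
from collections import Counter

# Matches the first level=... or lvl=... field on a line (assumed log layout).
LEVEL_RE = re.compile(r"\b(?:level|lvl)=(\w+)")
KNOWN_LEVELS = ("debug", "info", "warn", "error", "critical")


def count_levels(lines):
    """Count log lines per level, ignoring lines without a recognized level."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if match and match.group(1).lower() in KNOWN_LEVELS:
            counts[match.group(1).lower()] += 1
    return dict(counts)


# Illustrative lines, not captured from a real instance.
SAMPLE_LINES = [
    'logger=server t=2024-01-01T00:00:00Z level=info msg="starting"',
    'logger=sqlstore t=2024-01-01T00:00:01Z level=error msg="query failed"',
    'logger=server t=2024-01-01T00:00:02Z level=info msg="listening"',
    "a line without a level field",
]
```

To run this against a real file, pass an open file object, for example `count_levels(open(path))`, where `path` points into the log directory you configured.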
Performance monitoring
Monitor these key performance indicators for Grafana health:
Database connections
- Monitor database connection pool usage
- Check for connection exhaustion
- Review slow query logs
Memory usage
- Track Grafana process memory consumption
- Monitor for memory leaks over time
- Set appropriate resource limits in container environments
Request latency
- Monitor dashboard load times
- Track data source query durations
- Review API response times
Error rates
- Track HTTP 5xx error rates
- Monitor data source query failures
- Review alert evaluation errors
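As one worked example of the error-rate indicator, the share of HTTP 5xx responses can be computed from two successive readings of a per-status-code counter such as grafana_api_response_status_total. This is a minimal sketch: the input dictionaries (status code string to counter value) and the function names are assumptions, and a counter that decreases is treated as a process restart.

```python
def counter_delta(before, after):
    """Increase between two readings; a drop is treated as a restart from zero."""
    return after - before if after >= before else after


def five_xx_ratio(before, after):
    """Fraction of responses with a 5xx status between two counter readings.

    before and after map status code strings (for example "200") to
    cumulative counter values.
    """
    total = 0.0
    errors = 0.0
    for code, current in after.items():
        delta = counter_delta(before.get(code, 0.0), current)
        total += delta
        if code.startswith("5"):
            errors += delta
    # No traffic in the interval means there is no error rate to report.
    return errors / total if total else 0.0
```

For instance, readings of `{"200": 900, "500": 10}` followed by `{"200": 1880, "500": 25, "503": 5}` describe 1000 new responses, 20 of them 5xx. In a Prometheus-based setup, the same calculation is normally expressed as a query over the scraped counters rather than in application code.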
Example monitoring setup
You can monitor Grafana using Grafana itself by configuring Prometheus to scrape the metrics endpoint. Once Prometheus is collecting the metrics, build dashboards that track:
- Request rates and latencies
- Error rates by endpoint
- Database query performance
- Active user sessions
- Alert evaluation metrics
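The Prometheus side of this setup can be sketched as a small generator for a scrape configuration. The job name `grafana`, the default target `localhost:3000`, and the function name are assumptions for illustration. The emitted keys (`scrape_configs`, `job_name`, `metrics_path`, `static_configs`, `basic_auth`) are standard Prometheus configuration fields, but review the result against your own Prometheus setup before using it.

```python
import json


def render_scrape_config(target="localhost:3000", username=None, password=None):
    """Return a minimal Prometheus scrape configuration as YAML text.

    String values are quoted with json.dumps, whose output is also a valid
    YAML double-quoted scalar.
    """
    lines = [
        "scrape_configs:",
        "  - job_name: grafana",
        "    metrics_path: /metrics",
        "    static_configs:",
        f"      - targets: [{json.dumps(target)}]",
    ]
    # Only emit credentials when both parts are provided.
    if username and password:
        lines += [
            "    basic_auth:",
            f"      username: {json.dumps(username)}",
            f"      password: {json.dumps(password)}",
        ]
    return "\n".join(lines) + "\n"
```

Printing `render_scrape_config("grafana:3000", "metrics", "<PASSWORD>")` yields a fragment you can merge into prometheus.yml. Supply the same credentials you set for the metrics endpoint.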
Next steps
- Set up backup and restore procedures
- Configure high availability for production deployments
- Review the upgrading guide for version updates