Monitoring & Telemetry
The platform includes an observability stack to collect resource consumption, HTTP request latency, and gateway health metrics.
1. Observability Stack
- Metric Scraper: Prometheus (`prom/prometheus:v2.51.1`)
- Visualization Dashboard: Grafana (`grafana/grafana:10.4.2`)
- Config Renderer: an Alpine image that uses `envsubst` to inject environment variables into `prometheus.yml`.
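Prometheus does not, in general, expand environment variables inside `prometheus.yml`, which is why a separate rendering step is useful. The following is a minimal TypeScript sketch of `envsubst`-style substitution. It approximates GNU `envsubst` for the common cases (`$NAME` and `${NAME}` are replaced, unset variables become empty strings) and is not the tool itself. The function name `renderTemplate` and the `PROXY_HOST` variable are hypothetical.

```typescript
// Minimal approximation of GNU envsubst (not the real tool): replaces
// $NAME and ${NAME} with values from `env`; unset names become "".
type Env = Record<string, string | undefined>;

// A variable name starts with a letter or underscore, followed by letters,
// digits or underscores, so references such as `$1` are left untouched.
const VAR_REFERENCE = /\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)/g;

function renderTemplate(template: string, env: Env): string {
  return template.replace(
    VAR_REFERENCE,
    (_match: string, braced?: string, bare?: string): string => {
      // Exactly one of the two capture groups is set for each match.
      const name = braced ?? bare ?? "";
      return env[name] ?? "";
    },
  );
}

// Example with a hypothetical PROXY_HOST variable:
// renderTemplate("- targets: ['${PROXY_HOST}:3000']", { PROXY_HOST: "proxy-server" })
//   returns "- targets: ['proxy-server:3000']"
```

One consequence to keep in mind: `envsubst` called without an explicit variable list substitutes every `$NAME` reference it finds, so a template should not contain literal dollar-prefixed words unless they are meant to be replaced.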
2. Scraping Flow & Telemetry Formats
The system gathers data using Prometheus pull-based scraping:
1. The Proxy Server records resource usage (CPU and memory) and HTTP request latency using the `prom-client` library.
2. It exposes these metrics in OpenMetrics format at the authenticated `/metrics` endpoint.
3. Prometheus scrapes this endpoint every 15 seconds.
4. Grafana queries Prometheus and visualizes the scraped metrics.
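To show what a scrape actually returns, here is a sketch of a hypothetical request-latency histogram: cumulative bucket counters plus `_sum` and `_count` series, one sample per line. The metric name `http_request_duration_seconds`, the help text and the bucket bounds are illustrative assumptions. In the proxy itself, `prom-client` handles this (for example through its `Histogram` class, and `collectDefaultMetrics()` for CPU and memory), so treat the class below as a sketch of the mechanics, not the proxy's code.

```typescript
// Hypothetical latency histogram showing what a Prometheus scrape returns.
// In the proxy, prom-client does this bookkeeping and rendering.
class LatencyHistogram {
  private readonly name: string;
  private readonly help: string;
  private readonly buckets: number[]; // upper bounds in seconds, ascending
  private readonly counts: number[]; // cumulative count per bucket
  private sum = 0;
  private count = 0;

  constructor(name: string, help: string, buckets: number[]) {
    this.name = name;
    this.help = help;
    this.buckets = buckets;
    this.counts = buckets.map(() => 0);
  }

  // Buckets are cumulative: an observation increments every bucket whose
  // upper bound is greater than or equal to it.
  observe(seconds: number): void {
    for (let i = 0; i < this.buckets.length; i++) {
      if (seconds <= this.buckets[i]) this.counts[i] += 1;
    }
    this.sum += seconds;
    this.count += 1;
  }

  // Text exposition: metadata lines, one line per bucket, then the +Inf
  // bucket (equal to the total count), _sum and _count.
  render(): string {
    const n = this.name;
    const lines: string[] = [`# HELP ${n} ${this.help}`, `# TYPE ${n} histogram`];
    for (let i = 0; i < this.buckets.length; i++) {
      lines.push(`${n}_bucket{le="${this.buckets[i]}"} ${this.counts[i]}`);
    }
    lines.push(`${n}_bucket{le="+Inf"} ${this.count}`);
    lines.push(`${n}_sum ${this.sum}`);
    lines.push(`${n}_count ${this.count}`);
    return lines.join("\n") + "\n";
  }
}
```

For instance, observing 0.25 s, 0.5 s and 2 s with bounds `[0.1, 0.5, 1]` yields bucket counts of 0, 2 and 2, a `+Inf` count of 3, and a `_sum` of 2.75. This rendering follows the Prometheus text exposition layout; an OpenMetrics payload uses the same basic sample lines and ends with a `# EOF` marker.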
Prometheus Configuration
The Prometheus server configuration is generated dynamically from the template file `prometheus.yml.template`:
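```yaml
global:
  scrape_interval: 15s  # how often every target is scraped

scrape_configs:
  - job_name: 'intelligence-cloud-proxy'
    metrics_path: '/metrics'  # path requested on each target
    static_configs:
      - targets: ['proxy-server:3000']  # host:port of the proxy server
```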
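As a sketch of how Prometheus interprets this scrape configuration, the function below expands each static target into the URL it pulls and the `job` and `instance` labels it attaches to every scraped series. The TypeScript types and the `expandTargets` name are hypothetical; the fallbacks for `scheme` (`http`) and `metrics_path` (`/metrics`) follow Prometheus's documented defaults.

```typescript
// Hypothetical types mirroring the prometheus.yml fields used above.
interface StaticConfig {
  targets: string[];
}

interface ScrapeConfig {
  job_name: string;
  metrics_path?: string; // Prometheus default: "/metrics"
  scheme?: string; // Prometheus default: "http"
  static_configs: StaticConfig[];
}

interface ScrapeTarget {
  url: string;
  labels: { job: string; instance: string };
}

// Expand each static target into the URL Prometheus requests and the
// job/instance labels it attaches to every series scraped from it.
function expandTargets(cfg: ScrapeConfig): ScrapeTarget[] {
  const scheme = cfg.scheme ?? "http";
  const path = cfg.metrics_path ?? "/metrics";
  const result: ScrapeTarget[] = [];
  for (const sc of cfg.static_configs) {
    for (const address of sc.targets) {
      result.push({
        url: `${scheme}://${address}${path}`,
        // By default `instance` is the target address itself.
        labels: { job: cfg.job_name, instance: address },
      });
    }
  }
  return result;
}
```

Applied to the configuration above, this yields a single target, `http://proxy-server:3000/metrics`, with `job="intelligence-cloud-proxy"` and `instance="proxy-server:3000"`, which are the labels to filter on in Grafana queries.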
3. Data Management
- Data Retention: Prometheus keeps metrics for 15 days, with on-disk storage capped at 10 GB (`--storage.tsdb.retention.size=10GB`). When both a time limit and a size limit are set, whichever is reached first applies.
- Grafana Provisioning: Dashboard JSON definitions and default data sources are pre-configured in the `grafana/provisioning` directory, so dashboards load automatically on startup.
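Which limit takes effect first depends on ingestion volume. The sketch below applies the rough sizing rule from the Prometheus storage documentation (disk ≈ retention seconds × ingested samples per second × bytes per sample, with about 1 to 2 bytes per sample). The active-series counts and the 2-byte figure are assumptions for illustration, not measurements of this platform, and `estimateRetention` is a hypothetical helper.

```typescript
// Rough sizing rule from the Prometheus storage docs:
//   needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample
// All inputs here are assumptions supplied by the caller.
const GiB = 1024 * 1024 * 1024; // Prometheus documents size units as powers of two

interface RetentionEstimate {
  samplesPerSecond: number;
  bytesAtTimeLimit: number; // projected disk use when the time limit is reached
  firstLimit: "time" | "size";
}

function estimateRetention(
  activeSeries: number,
  scrapeIntervalSeconds: number,
  retentionDays: number,
  sizeLimitBytes: number,
  bytesPerSample: number = 2, // upper end of the documented ~1-2 bytes per sample
): RetentionEstimate {
  // Each active series contributes one sample per scrape.
  const samplesPerSecond = activeSeries / scrapeIntervalSeconds;
  const retentionSeconds = retentionDays * 24 * 60 * 60;
  const bytesAtTimeLimit = retentionSeconds * samplesPerSecond * bytesPerSample;
  return {
    samplesPerSecond,
    bytesAtTimeLimit,
    // If the data fits under the cap for the whole period, the time limit wins.
    firstLimit: bytesAtTimeLimit < sizeLimitBytes ? "time" : "size",
  };
}
```

Under these assumptions, 1,500 active series scraped every 15 seconds give 100 samples/s and about 259 MB (259,200,000 bytes) over 15 days, so the 15-day limit applies first. Each series accounts for roughly 172,800 bytes over the period (86,400 samples at 2 bytes each), so the 10 GiB cap would only be reached first above roughly 62,000 active series.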