Metrics Module

The Metrics module is the monitoring and observability system of the Comdeall platform. It provides real-time application performance monitoring, request tracking, and operational insights. It integrates with Prometheus for metrics collection and with Grafana for visualization, and it implements automated alerting on critical performance indicators. The module tracks API performance, user behavior patterns, and system health, and it provides detailed analytics for optimization and troubleshooting.

Table of Contents

  1. Module Structure
  2. Metrics Endpoints
  3. Core Features
  4. Prometheus Integration
  5. Metrics Collection
  6. Performance Monitoring
  7. User Analytics
  8. Alerting System
  9. Integration Points
  10. Technical Implementation
  11. Best Practices
  12. Conclusion

Module Structure

The Metrics module follows a middleware-based architecture with Prometheus integration:

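```typescript
import { Module } from '@nestjs/common';
// Assumed package: @willsoto/nestjs-prometheus, whose register() accepts these options
import { PrometheusModule } from '@willsoto/nestjs-prometheus';
import { MetricsController } from './metrics.controller';
import { MetricsService } from './metrics.service';
// RouteNames: the project's route-name constants (import path not shown in the original)

@Module({
  imports: [
    PrometheusModule.register({
      defaultMetrics: {
        enabled: true, // collect Node.js process/runtime metrics
      },
      path: `${RouteNames.METRICS}`, // route on which metrics are exposed
    }),
  ],
  providers: [MetricsService],
  controllers: [MetricsController],
  exports: [MetricsService], // lets other modules inject MetricsService
})
export class MetricsModule {}
```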

Core Components:

  1. Controller Layer (metrics.controller.ts): Exposes the Prometheus metrics endpoint for scraping

  2. Service Layer (metrics.service.ts): Implements custom metrics collection with counters, gauges, and histograms

  3. Middleware Layer (metrics.middleware.ts): Automatically collects request metrics for all API endpoints

  4. APM Configuration (apm/): Prometheus configuration, alerting rules, and Grafana dashboards

  5. Integration Layer: Hooks the module into the NestJS application lifecycle and the external monitoring infrastructure

Metrics Endpoints

| Endpoint | Method | Description | Auth Type |
|---|---|---|---|
| /metrics | GET | Prometheus metrics scraping endpoint | None (public) |

Metrics Endpoint Features:

  • Content-Type: text/plain, the Prometheus text exposition format
  • Real-time Data: Each scrape returns the current value of every registered metric
  • Performance Optimized: Efficient metrics generation without blocking requests
  • Comprehensive Coverage: Application, system, and custom metrics
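To make the endpoint's output concrete, here is a hand-rolled sketch of the Prometheus text exposition format that a scrape of /metrics returns. It is illustrative only: the real module relies on the Prometheus client library to render this, and the sample labels (including the /users route and the help text) are invented.

```typescript
// Illustrative only: renders one counter in the Prometheus text exposition format.
type Sample = { labels: Record<string, string>; value: number };

// Content type for version 0.0.4 of the text format.
const CONTENT_TYPE = "text/plain; version=0.0.4; charset=utf-8";

// Label values must escape backslash, double quote and newline.
function escapeLabelValue(v: string): string {
  return v.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

function renderCounter(name: string, help: string, samples: Sample[]): string {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const s of samples) {
    const labels = Object.entries(s.labels)
      .map(([k, v]) => `${k}="${escapeLabelValue(v)}"`)
      .join(",");
    lines.push(labels ? `${name}{${labels}} ${s.value}` : `${name} ${s.value}`);
  }
  return lines.join("\n") + "\n";
}

const body = renderCounter("api_requests_total", "Total API requests", [
  { labels: { method: "GET", route: "/users", status: "200" }, value: 42 },
]);
console.log(`Content-Type: ${CONTENT_TYPE}`);
console.log(body);
```

Each labeled series is one line, so every distinct label combination adds a line to the scrape response.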

Core Features

Application Performance Monitoring

The module provides comprehensive application performance tracking with detailed request analytics:

Request Metrics:

  • Total Requests: Complete count of all HTTP requests processed
  • Request Duration: Histogram-based latency tracking with percentile calculations
  • Concurrent Requests: Real-time gauge of active request processing
  • Error Rates: Failure tracking with detailed error categorization

Performance Analytics:

  • Response Time Tracking: End-to-end request processing duration
  • Throughput Monitoring: Requests per second and volume analytics
  • Resource Utilization: System resource consumption patterns
  • Bottleneck Identification: Slowest endpoints and performance optimization insights

User Behavior Analytics

Advanced user behavior tracking provides insights into application usage patterns:

User Agent Analysis:

  • Browser Detection: Automatic browser family identification using UAParser
  • Platform Tracking: Mobile vs web request categorization
  • Device Analytics: User agent string analysis for device insights

Traffic Source Analysis:

  • Referer Tracking: Source domain identification for traffic analytics
  • Direct vs Referred Traffic: User acquisition channel analysis
  • Invalid URL Handling: Graceful error handling for malformed referer data

System Health Monitoring

Real-time system health tracking helps detect performance problems early:

System Metrics:

  • Active Users: Unique IP address tracking for user activity monitoring
  • Request Load: Peak traffic identification and capacity planning
  • Error Distribution: Error pattern analysis across different endpoints
  • Performance Trends: Historical data analysis for trend identification

Prometheus Integration

Metrics Collection Configuration

The module integrates with Prometheus using industry-standard metrics types:

Prometheus Metrics Types:

  • Counters: Monotonically increasing values for totals (requests, errors)
  • Gauges: Current values that can increase or decrease (concurrent requests, active users)
  • Histograms: Distribution tracking with configurable buckets (request duration)
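The practical difference between counters and gauges is which operations they allow. The classes below are in-memory stand-ins written only to show those semantics; `SimpleCounter` and `SimpleGauge` are hypothetical names, not the client library's implementation.

```typescript
// Hypothetical stand-ins illustrating counter vs gauge semantics.
class SimpleCounter {
  private value = 0;
  inc(by = 1): void {
    // Counters may only go up; a negative increment is a programming error.
    if (by < 0) throw new Error("counter cannot decrease");
    this.value += by;
  }
  get(): number {
    return this.value;
  }
}

class SimpleGauge {
  private value = 0;
  inc(by = 1): void {
    this.value += by;
  }
  dec(by = 1): void {
    this.value -= by;
  }
  get(): number {
    return this.value;
  }
}

const requestsTotal = new SimpleCounter(); // like api_requests_total
const inFlight = new SimpleGauge();        // like concurrent_http_requests

// Two requests start, one of them finishes.
requestsTotal.inc(); inFlight.inc();
requestsTotal.inc(); inFlight.inc();
inFlight.dec();

console.log(requestsTotal.get(), inFlight.get()); // 2 1
```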

Default Metrics Integration:

```typescript
PrometheusModule.register({
  defaultMetrics: {
    enabled: true, // Node.js runtime metrics
  },
  path: '/metrics', // Scraping endpoint
})
```

Custom Metrics Implementation

The service implements custom business logic metrics:

Business Metrics:

  • api_requests_total: Total API requests with method, route, and status labels
  • api_request_duration_seconds: Request processing time distribution
  • api_request_errors_total: Error tracking with detailed categorization
  • concurrent_http_requests: Real-time concurrent request monitoring

User Analytics Metrics:

  • api_requests_by_user_agent: Browser and device usage analytics
  • api_requests_by_referer: Traffic source and referral analytics
  • total_mobile_requests: Mobile platform usage tracking
  • total_web_requests: Web platform usage monitoring

Scraping Configuration

Prometheus scraping is configured for optimal performance and reliability:

Scraping Parameters:

  • Scrape Interval: 15 seconds for application metrics
  • Scrape Timeout: 10 seconds, so a slow or unresponsive target cannot stall a scrape
  • Metrics Path: /api/metrics on the NestJS application
  • Target Configuration: host.docker.internal, which lets Prometheus running in a Docker container reach the application on the host machine

Metrics Collection

Automatic Request Tracking

The MetricsMiddleware automatically collects comprehensive request data:

Request Lifecycle Tracking:

  1. Request Start: Performance timer initiation and concurrent request increment
  2. Request Processing: User agent, referer, and mobile detection
  3. Request Completion: Duration calculation, status tracking, and metrics recording
  4. Error Handling: Automatic error categorization and failure tracking

Middleware Implementation:

```typescript
// Key tracking points (excerpt from MetricsMiddleware)
this.metricsService.incrementHttpRequests();
this.metricsService.observeRequestDuration(method, route, status, duration);
this.metricsService.incrementApiRequestCounter(method, route, status);
```
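The lifecycle steps above can be sketched without any framework. This is a hypothetical, dependency-free version: the `Recorder` object stands in for `MetricsService`, steps 2 and 4 (header analysis and error categorization) are left out, and listening for `close` as well as `finish` is a design choice of this sketch that keeps the in-flight count correct when a client disconnects early. It is not necessarily what `MetricsMiddleware` does.

```typescript
import { EventEmitter } from "node:events";
import { performance } from "node:perf_hooks";

// Hypothetical stand-in for MetricsService: just enough state to show the lifecycle.
interface Recorder {
  inFlight: number;
  observations: { method: string; route: string; status: number; seconds: number }[];
}

// Minimal response shape: Node's http.ServerResponse is an EventEmitter with a
// statusCode; it emits "finish" when the response is sent and "close" when the
// connection is done (including early client disconnects).
interface ResponseLike extends EventEmitter {
  statusCode: number;
}

function trackRequest(rec: Recorder, method: string, route: string, res: ResponseLike): void {
  // 1. Request start: start the timer and count the request as in flight.
  const start = performance.now();
  rec.inFlight++;

  let recorded = false;
  const complete = (): void => {
    // "close" usually follows "finish", so record only once.
    if (recorded) return;
    recorded = true;
    // 3. Request completion: compute the duration and record it with its labels.
    const seconds = (performance.now() - start) / 1000;
    rec.inFlight--;
    rec.observations.push({ method, route, status: res.statusCode, seconds });
  };
  res.once("finish", complete);
  res.once("close", complete);
}

// Usage with a fake response object.
const rec: Recorder = { inFlight: 0, observations: [] };
const res = Object.assign(new EventEmitter(), { statusCode: 200 });
trackRequest(rec, "GET", "/users/:id", res);
console.log(rec.inFlight); // 1
res.emit("finish");
res.emit("close");
console.log(rec.inFlight, rec.observations.length); // 0 1
```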

User Agent Processing

Advanced user agent analysis provides detailed client insights:

Browser Detection:

  • UAParser Integration: Accurate browser family identification
  • Fallback Handling: Missing or unrecognized user agents are handled gracefully
  • Mobile Detection: Device type categorization for analytics
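The fallback behaviour can be sketched independently of the parser. This version injects the parser as a function so it runs without dependencies. The `ParsedUA` shape is an assumption (ua-parser-js reports comparable data through `getBrowser().name` and `getDevice().type`), and treating only `deviceType === "mobile"` as mobile is a simplification; the module may categorize platforms differently.

```typescript
// Assumed parser output shape (hypothetical).
interface ParsedUA {
  browserName?: string;
  deviceType?: string; // e.g. "mobile" or "tablet"; undefined for desktop browsers
}

interface ClientInfo {
  browserFamily: string;
  isMobile: boolean;
}

function classifyClient(
  userAgent: string | undefined,
  parse: (ua: string) => ParsedUA,
): ClientInfo {
  // No User-Agent header: return a fixed fallback instead of an empty label.
  if (!userAgent) return { browserFamily: "Unknown", isMobile: false };

  let parsed: ParsedUA;
  try {
    parsed = parse(userAgent);
  } catch {
    parsed = {}; // a parser failure should never break request handling
  }
  return {
    browserFamily: parsed.browserName ?? "Unknown",
    isMobile: parsed.deviceType === "mobile",
  };
}

// Usage with a stub parser in place of a real UA library.
const stubParser = (ua: string): ParsedUA =>
  ua.includes("Mobile") ? { browserName: "Chrome", deviceType: "mobile" } : {};

const phone = classifyClient("Mozilla/5.0 (Linux; Android 14) Chrome/120.0 Mobile Safari/537.36", stubParser);
const noHeader = classifyClient(undefined, stubParser);
console.log(phone.browserFamily, phone.isMobile);       // Chrome true
console.log(noHeader.browserFamily, noHeader.isMobile); // Unknown false
```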

Referer Analysis:

  • URL Parsing: Domain extraction from referer headers
  • Error Handling: Malformed referer URLs are handled gracefully instead of raising errors
  • Unknown Source Tracking: Requests without a referer are identified as direct traffic
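Referer handling reduces a full URL to a domain label. Below is a minimal sketch using the WHATWG `URL` constructor, which throws on input it cannot parse; the fallback label value `unknown` is an assumption, not confirmed from the module's code.

```typescript
// Reduce a Referer header to a bounded domain label.
// "unknown" (an assumed label value) covers both direct traffic and malformed URLs.
function refererDomain(referer: string | undefined): string {
  if (!referer) return "unknown"; // no header: direct traffic
  try {
    // new URL() throws a TypeError for strings it cannot parse.
    return new URL(referer).hostname || "unknown";
  } catch {
    return "unknown";
  }
}

console.log(refererDomain("https://www.google.com/search?q=comdeall")); // www.google.com
console.log(refererDomain("not a url"));                                // unknown
console.log(refererDomain(undefined));                                  // unknown
```

Keeping only the hostname, rather than the full URL with its path and query string, also keeps the `referer_domain` label from growing without bound.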

Performance Monitoring

Request Duration Analysis

Histogram-based request duration tracking provides detailed performance insights:

Duration Buckets:

```typescript
buckets: [0.1, 0.3, 0.5, 1, 1.5, 2, 5, 10], // upper bounds, in seconds
```
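With these boundaries, a Prometheus histogram keeps one cumulative counter per bucket (observations less than or equal to the bound), an implicit `+Inf` bucket, and a running `_sum` and `_count`. The class below is an in-memory sketch of that bookkeeping, not the client library's code; `SketchHistogram` is a hypothetical name.

```typescript
// In-memory sketch of histogram bookkeeping for the bucket boundaries above.
const bucketBounds = [0.1, 0.3, 0.5, 1, 1.5, 2, 5, 10]; // seconds

class SketchHistogram {
  // One cumulative count per bound, plus a final slot for the +Inf bucket.
  readonly counts: number[] = new Array<number>(bucketBounds.length + 1).fill(0);
  sum = 0;   // corresponds to api_request_duration_seconds_sum
  count = 0; // corresponds to api_request_duration_seconds_count

  observe(seconds: number): void {
    bucketBounds.forEach((bound, i) => {
      if (seconds <= bound) this.counts[i]++; // cumulative: every bucket with bound >= value
    });
    this.counts[bucketBounds.length]++; // +Inf matches every observation
    this.sum += seconds;
    this.count++;
  }
}

const h = new SketchHistogram();
[0.05, 0.2, 0.7, 3].forEach((s) => h.observe(s));
console.log(h.counts.join(",")); // 1,2,2,3,3,3,4,4,4
console.log(h.count);            // 4
```

Because the buckets are cumulative, the 3-second request is counted in the 5, 10, and +Inf buckets; this layout is what `histogram_quantile` reads from `api_request_duration_seconds_bucket`.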

Performance Calculations:

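  • Average Latency: rate(api_request_duration_seconds_sum[5m]) / rate(api_request_duration_seconds_count[5m])
  • P99 Latency: histogram_quantile(0.99, rate(api_request_duration_seconds_bucket[5m])), which yields one value per method/route/status series; histogram_quantile(0.99, sum by (le) (rate(api_request_duration_seconds_bucket[5m]))) gives a single service-wide P99
  • Slowest APIs: topk(5, rate(api_request_duration_seconds_sum[5m]) / rate(api_request_duration_seconds_count[5m]))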
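The arithmetic behind the first two queries can be reproduced on two scrapes of one histogram series. This is a simplified sketch: it uses a shortened bucket list, skips extrapolation and counter-reset handling, and the snapshot numbers are invented. The average divides the increase in `_sum` by the increase in `_count` (the window length cancels in the `rate()` ratio); the quantile estimate finds the bucket containing the requested rank and interpolates linearly inside it, as `histogram_quantile` does.

```typescript
// Buckets are [upper bound, cumulative count]; the last bound is +Inf.
interface HistogramSnapshot {
  sum: number;
  count: number;
  buckets: [number, number][];
}

// rate(x_sum[5m]) / rate(x_count[5m]): the window length cancels out.
function averageLatency(prev: HistogramSnapshot, cur: HistogramSnapshot): number {
  const dCount = cur.count - prev.count;
  return dCount === 0 ? NaN : (cur.sum - prev.sum) / dCount;
}

// histogram_quantile-style estimate over cumulative bucket counts.
function quantile(q: number, cumulative: [number, number][]): number {
  const total = cumulative[cumulative.length - 1][1];
  if (total === 0) return NaN;
  const rank = q * total;
  let lowerBound = 0; // the first bucket is assumed to start at 0
  let lowerCount = 0;
  for (const [bound, count] of cumulative) {
    if (count >= rank) {
      // Rank in the +Inf bucket: report the highest finite bound instead.
      if (bound === Infinity) return lowerBound;
      if (count === lowerCount) return bound;
      // Linear interpolation, assuming observations are spread evenly in the bucket.
      return lowerBound + (bound - lowerBound) * ((rank - lowerCount) / (count - lowerCount));
    }
    lowerBound = bound;
    lowerCount = count;
  }
  return NaN;
}

// Two scrapes of one series, 5 minutes apart.
const prev: HistogramSnapshot = {
  sum: 10, count: 40,
  buckets: [[0.1, 20], [0.3, 30], [0.5, 35], [1, 40], [Infinity, 40]],
};
const cur: HistogramSnapshot = {
  sum: 40, count: 140,
  buckets: [[0.1, 70], [0.3, 110], [0.5, 130], [1, 140], [Infinity, 140]],
};
// Per-bucket increase over the window (rate() differs only by a constant factor).
const deltas = cur.buckets.map(([le, c], i): [number, number] => [le, c - prev.buckets[i][1]]);

console.log(averageLatency(prev, cur)); // 0.3
console.log(quantile(0.99, deltas));    // 0.9
```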

Error Rate Monitoring

Comprehensive error tracking enables proactive issue resolution:

Error Metrics:

  • Error Rate Percentage: (sum(rate(api_request_errors_total[5m])) / sum(rate(api_requests_total[5m]))) * 100
  • Error Distribution: Error categorization by endpoint, method, and status code
  • Error Trends: Historical error pattern analysis for trend identification
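The error-rate expression sums the per-series rates before dividing. A small sketch of that arithmetic with made-up per-second rates; the comparison against 5 mirrors the HighErrorRate threshold, although the alert itself evaluates the `nestjs:api_error_rate_percent` recording rule.

```typescript
// (sum(rate(errors)) / sum(rate(requests))) * 100, applied to per-series rates.
function errorRatePercent(errorRates: number[], requestRates: number[]): number {
  const errors = errorRates.reduce((a, b) => a + b, 0);
  const total = requestRates.reduce((a, b) => a + b, 0);
  // With zero traffic the ratio is 0/0, i.e. NaN; since NaN > 5 is false,
  // an idle service does not trigger a threshold check.
  return total === 0 ? NaN : (errors / total) * 100;
}

const pct = errorRatePercent([0.5, 0.25], [4, 3, 3]);
console.log(pct.toFixed(1)); // 7.5
console.log(pct > 5);        // true: above a 5% threshold
```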

Throughput Analytics

Request volume monitoring supports capacity planning and optimization:

Throughput Metrics:

  • Requests Per Second: rate(api_requests_total[1m])
  • Smoothed Request Rate: rate(api_requests_total[5m]), the per-second rate averaged over 5 minutes
  • Peak Load Analysis: Maximum concurrent request tracking
  • Traffic Patterns: User activity pattern identification
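Because counters reset to zero when the NestJS process restarts, `rate()` treats any decrease as a reset rather than as negative traffic. Below is a simplified sketch of that behaviour on raw scrape samples; real Prometheus additionally extrapolates to the window boundaries, and the sample values are invented, spaced at the 15-second scrape interval.

```typescript
interface CounterSample {
  t: number; // unix seconds
  v: number; // counter value
}

// Simplified rate(): total increase across the samples divided by elapsed time.
function simpleRate(samples: CounterSample[]): number {
  if (samples.length < 2) return NaN;
  let increase = 0;
  for (let i = 1; i < samples.length; i++) {
    const delta = samples[i].v - samples[i - 1].v;
    // After a reset the counter restarts from 0, so the new value is the increase.
    increase += delta >= 0 ? delta : samples[i].v;
  }
  const elapsed = samples[samples.length - 1].t - samples[0].t;
  return elapsed > 0 ? increase / elapsed : NaN;
}

// The process restarts between the 3rd and 4th scrape.
const scrapes: CounterSample[] = [
  { t: 0, v: 1000 },
  { t: 15, v: 1150 },
  { t: 30, v: 1300 },
  { t: 45, v: 150 },
];
console.log(simpleRate(scrapes)); // 10 requests per second
```

`increase()` reports the same quantity before the division by elapsed time, which makes it the better fit when an absolute request count for a window is wanted.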

User Analytics

Platform Usage Analysis

Detailed platform usage analytics support product decision-making:

Platform Metrics:

  • Mobile vs Web Ratio: (sum(rate(total_mobile_requests[1m])) / sum(rate(total_web_requests[1m])))
  • Browser Usage: topk(5, api_requests_by_user_agent)
  • Device Distribution: Mobile and web platform usage patterns

Traffic Source Analytics

Comprehensive traffic source analysis provides marketing insights:

Source Analytics:

  • Top Referers: topk(5, api_requests_by_referer)
  • Direct Traffic: Users accessing application directly
  • Referral Traffic: External website traffic analysis
  • Traffic Quality: User engagement patterns by source
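The `topk` queries above rank individual series. If these counters carry labels besides the one of interest, summing by that label first gives a true per-domain (or per-browser) total. A sketch of `topk(2, sum by (referer_domain) (...))` on invented series; the extra `route` label on the sample data is hypothetical.

```typescript
// Sketch of topk(k, sum by (label) (...)) over in-memory series.
type Series = { labels: Record<string, string>; value: number };

function sumBy(series: Series[], label: string): Map<string, number> {
  const grouped = new Map<string, number>();
  for (const s of series) {
    const key = s.labels[label] ?? ""; // series without the label group under ""
    grouped.set(key, (grouped.get(key) ?? 0) + s.value);
  }
  return grouped;
}

function topk(k: number, grouped: Map<string, number>): [string, number][] {
  // Largest values first, then keep the first k.
  return Array.from(grouped.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, k);
}

const refererSeries: Series[] = [
  { labels: { referer_domain: "www.google.com", route: "/a" }, value: 30 },
  { labels: { referer_domain: "unknown", route: "/a" }, value: 50 },
  { labels: { referer_domain: "www.google.com", route: "/b" }, value: 25 },
  { labels: { referer_domain: "t.co", route: "/a" }, value: 5 },
];
const topReferers = topk(2, sumBy(refererSeries, "referer_domain"));
console.log(topReferers.map(([d, v]) => `${d}=${v}`).join(" ")); // www.google.com=55 unknown=50
```

Ranking the raw series instead would put `unknown` (50) ahead of either Google series (30 and 25), even though Google sends more traffic in total.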

Alerting System

Critical Performance Alerts

Automated alerting ensures rapid response to performance degradation:

High Error Rate Alert:

```yaml
# Rule entry inside a rule group's `rules:` list
- alert: HighErrorRate
  expr: nestjs:api_error_rate_percent > 5
  for: 5m
  labels:
    severity: critical
```

High Latency Alert:

```yaml
- alert: HighP99Latency
  expr: nestjs:api_latency_p99 > 2   # seconds
  for: 5m
  labels:
    severity: warning
```

System Health Alerts

System resource monitoring prevents service degradation:

Concurrent Request Alert:

```yaml
- alert: HighConcurrentRequests
  expr: concurrent_http_requests > 500
  for: 2m
  labels:
    severity: warning
```
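The `for` field in each rule means the expression must stay true on every evaluation for that long before the alert fires; until then it is pending, and a single false evaluation resets it. Below is a simplified model of that state machine applied to the HighConcurrentRequests threshold, with an assumed 30-second evaluation interval and invented readings; `RuleEvaluator` is a hypothetical name.

```typescript
// Simplified model of an alerting rule with a "for" duration.
type AlertState = "inactive" | "pending" | "firing";

class RuleEvaluator {
  private readonly forSeconds: number;
  private activeSince: number | null = null;

  constructor(forSeconds: number) {
    this.forSeconds = forSeconds;
  }

  // Called once per evaluation with the expression's result at time `now` (seconds).
  evaluate(conditionTrue: boolean, now: number): AlertState {
    if (!conditionTrue) {
      this.activeSince = null; // any false evaluation resets the rule
      return "inactive";
    }
    if (this.activeSince === null) this.activeSince = now;
    return now - this.activeSince >= this.forSeconds ? "firing" : "pending";
  }
}

// concurrent_http_requests > 500 for 2m, evaluated every 30 s (assumed).
const rule = new RuleEvaluator(120);
const readings = [600, 650, 480, 700, 720, 710, 690, 705];
const states = readings.map((value, i) => rule.evaluate(value > 500, i * 30));
console.log(states.join(" "));
// pending pending inactive pending pending pending pending firing
```

The short dip to 480 restarts the two-minute clock, which is how `for` filters out brief spikes.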

Alert Integration:

  • Severity Levels: Critical, warning, and informational alerts
  • Team Assignment: Backend team notification and escalation
  • Detailed Descriptions: Actionable alert messages with context

Integration Points

Application Middleware Integration

MetricsMiddleware plugs into the NestJS middleware pipeline and is applied to every route:

Middleware Registration:

```typescript
// Global middleware application, inside a module's configure(consumer: MiddlewareConsumer) method
consumer.apply(MetricsMiddleware).forRoutes('*');
```

Performance Considerations:

  • Low Overhead: Metric updates are synchronous in-memory operations with no I/O, so they add little latency to request processing
  • Efficient Processing: Optimized metric calculation and storage
  • Memory Management: Proper metric lifecycle and cleanup

External Monitoring Integration

Integration with external monitoring and alerting systems:

Prometheus Integration:

  • Standard Metrics Format: Industry-standard Prometheus exposition format
  • Label-based Querying: Flexible metric filtering and aggregation
  • Time Series Storage: Historical data retention and analysis

Grafana Dashboard Integration:

  • Real-time Visualization: Live metric dashboard and alerting
  • Custom Dashboards: Business-specific metric visualization
  • Alert Management: Visual alert status and management interface

Technical Implementation

Metric Types and Usage

Strategic use of different Prometheus metric types:

Counter Implementation:

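```typescript
// Counter: monotonically increasing values
private readonly apiRequestCounter: Counter<string>; // field declared on MetricsService
// Inside a service method, once per completed request:
this.apiRequestCounter.labels(method, route, status).inc();
```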

Gauge Implementation:

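```typescript
// Gauge: current-state values that can go up or down
private readonly concurrentRequests: Gauge<string>; // field declared on MetricsService
this.concurrentRequests.inc(); // a request starts
this.concurrentRequests.dec(); // a request finishes
```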

Histogram Implementation:

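```typescript
// Histogram: distribution tracking
private readonly apiRequestDuration: Histogram<string>; // field declared on MetricsService
// Inside a service method, with duration in seconds:
this.apiRequestDuration.labels(method, route, status).observe(duration);
```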

Label Strategy

Comprehensive labeling strategy for flexible querying:

Request Labels:

  • method: HTTP method (GET, POST, PUT, DELETE)
  • route: API endpoint path for specific endpoint analysis
  • status: HTTP status code for error categorization

User Analytics Labels:

  • browser_family: Browser type for client analysis
  • referer_domain: Traffic source for marketing analytics
  • mobile_request/web_request: Platform categorization
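Each distinct combination of label values becomes its own time series, so the `route` label should hold a bounded set of values. With illustrative numbers, 5 methods × 40 routes × 10 status codes allow at most 2,000 series for `api_requests_total`, while a raw path containing user or record IDs has no such bound. Where a matched route template is not available, a hypothetical normalizer such as this one (not part of the documented module) keeps the label bounded:

```typescript
// Hypothetical helper: collapse ID-like path segments so the route label stays bounded.
const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const NUMERIC = /^\d+$/;

function normalizeRoute(path: string): string {
  const withoutQuery = path.split("?")[0]; // query strings never belong in a label
  return withoutQuery
    .split("/")
    .map((seg) => (NUMERIC.test(seg) || UUID.test(seg) ? ":id" : seg))
    .join("/");
}

console.log(normalizeRoute("/api/users/42/sessions?page=2")); // /api/users/:id/sessions
```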

Performance Optimization

Metrics collection is optimized for minimal performance impact:

Optimization Strategies:

  • Deferred Recording: Duration and status metrics are recorded when the response completes
  • Efficient Labeling: Strategic label usage to prevent cardinality explosion
  • Memory Management: Proper metric cleanup and resource management
  • Sampling Strategy: Configurable sampling for high-traffic scenarios

Best Practices

Metric Design Principles

Strategic metric design ensures actionable insights:

Metric Selection:

  • Business Relevance: Metrics aligned with business objectives and SLAs
  • Actionable Data: Metrics that drive operational decisions and improvements
  • Performance Impact: Minimal overhead metric collection and processing
  • Cardinality Management: Controlled label combinations to prevent storage issues

Monitoring Strategy

Comprehensive monitoring approach for proactive system management:

Monitoring Hierarchy:

  • Real-time Alerts: Critical issues requiring immediate attention
  • Trend Analysis: Long-term performance pattern identification
  • Capacity Planning: Resource utilization forecasting and optimization
  • User Experience: Client-side performance impact assessment

Alert Management

Effective alerting strategy prevents alert fatigue and ensures rapid response:

Alert Design:

  • Threshold Tuning: Data-driven alert threshold configuration
  • Alert Grouping: Related alert consolidation to prevent notification flooding
  • Escalation Policies: Clear escalation paths for different severity levels
  • Documentation: Detailed runbooks for alert response and resolution

Conclusion

The Metrics module provides a comprehensive monitoring and observability foundation for the Comdeall platform. Key strengths include:

Comprehensive Coverage:

  • Application Performance: End-to-end request tracking with detailed analytics
  • User Behavior: Client usage patterns and traffic source analysis
  • System Health: Real-time system resource and performance monitoring
  • Business Metrics: Custom metrics aligned with business objectives

Production-Ready Integration:

  • Prometheus Compatibility: Industry-standard metrics format and collection
  • Grafana Dashboard Support: Real-time visualization and alerting capabilities
  • Automated Alerting: Proactive issue detection and notification systems
  • Performance Optimized: Minimal overhead metric collection and processing

Operational Excellence:

  • Real-time Insights: Live performance monitoring and issue detection
  • Historical Analysis: Trend identification and capacity planning support
  • Actionable Alerts: Targeted notifications with clear resolution guidance
  • Scalable Architecture: Supports high-traffic scenarios with efficient processing

Together, these capabilities support data-driven decision-making, proactive system management, and a reliable user experience, all of which are essential for a production-grade application such as Comdeall, the child development and therapy management platform.