Monitoring and Logging Utilities
Overview
The IUDX-Novo platform provides monitoring and logging utilities that cover the AI Sandbox and its associated platform components.
The observability stack supports both:
Developer-defined application logs
Runtime system and service-level metrics
Together, these capabilities enable effective monitoring, debugging, performance analysis, and operational governance.
Monitoring Capabilities
The platform uses Prometheus-based monitoring to collect metrics at multiple layers of the system.
Infrastructure Metrics
Prometheus collects virtual machine and node-level metrics, including:
CPU utilization
Memory usage
Network throughput and latency
Other system-level resource statistics
These metrics provide visibility into infrastructure health and capacity.
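As an illustration of how such node-level metrics might be read back, the sketch below builds request URLs for Prometheus's instant-query HTTP endpoint (`/api/v1/query`). The server address is hypothetical, and the metric names assume the commonly used node_exporter naming, which this document does not confirm for IUDX-Novo.

```python
from urllib.parse import urlencode

# Hypothetical Prometheus address; the real one is deployment-specific.
PROMETHEUS_URL = "http://prometheus.example.internal:9090"

# PromQL expressions using node_exporter-style metric names (an assumption).
QUERIES = {
    "cpu_utilization_pct": (
        '100 * (1 - avg by (instance) '
        '(rate(node_cpu_seconds_total{mode="idle"}[5m])))'
    ),
    "memory_used_pct": (
        "100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)"
    ),
    "network_rx_bytes_per_s": "rate(node_network_receive_bytes_total[5m])",
}


def instant_query_url(base_url: str, promql: str) -> str:
    """Return the URL of an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    # Print one ready-to-request URL per infrastructure metric.
    for name, expr in QUERIES.items():
        print(name, instant_query_url(PROMETHEUS_URL, expr))
```

The URLs can be fetched with any HTTP client; the sketch stops short of sending requests so that it runs without a live Prometheus server.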
Service and Application Metrics
Prometheus also collects service-level and application metrics, such as:
API call counts and request latencies
Event bus and messaging metrics
JVM performance metrics (where applicable)
Other operational indicators exposed by platform services
These metrics enable fine-grained monitoring of platform behavior and performance.
Logging Capabilities
Centralized Log Aggregation
The platform uses Loki-based log aggregation to collect and manage logs generated by platform services and user workloads.
Capabilities include:
Collection of developer-defined log messages
Centralized storage and indexing of logs
Support for runtime debugging and validation
Efficient diagnosis of operational issues and failures
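A developer-defined log message could be emitted as structured JSON so that an aggregator can index its fields. The sketch below uses only Python's standard `logging` module. It assumes that workload output written to stdout is picked up by the platform's log collector, which this document does not specify, and the logger name and fields are hypothetical.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(entry)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` (stdout by default)."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]   # replace any previously attached handlers
    logger.setLevel(logging.INFO)
    logger.propagate = False      # avoid duplicate output via the root logger
    return logger


if __name__ == "__main__":
    log = build_logger("sandbox.job")  # hypothetical logger name
    log.info("processed %d records", 128)
```

One JSON object per line keeps each message self-describing, which tends to make later filtering by level or logger name simpler.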
Logs can be correlated with metrics to provide deeper insight into system behavior.
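One simple way to correlate the two signals is to query metrics and logs over the same time window. The sketch below builds a Prometheus `query_range` URL and a Loki `query_range` URL that share start and end times. The server addresses, the PromQL expression, and the LogQL selector are hypothetical; only the endpoint paths and parameter names come from the public Prometheus and Loki HTTP APIs.

```python
from urllib.parse import urlencode

# Hypothetical addresses; the real ones are deployment-specific.
PROMETHEUS_URL = "http://prometheus.example.internal:9090"
LOKI_URL = "http://loki.example.internal:3100"


def correlated_queries(promql: str, logql: str, start_s: int, end_s: int,
                       step_s: int = 15) -> tuple:
    """Return (prometheus_url, loki_url) covering the same time window.

    Prometheus takes Unix timestamps in seconds; Loki accepts Unix
    timestamps in nanoseconds, so the window is converted for Loki.
    """
    prom = f"{PROMETHEUS_URL}/api/v1/query_range?" + urlencode(
        {"query": promql, "start": start_s, "end": end_s, "step": step_s}
    )
    loki = f"{LOKI_URL}/loki/api/v1/query_range?" + urlencode(
        {"query": logql, "start": start_s * 10**9, "end": end_s * 10**9}
    )
    return prom, loki


if __name__ == "__main__":
    prom_url, loki_url = correlated_queries(
        promql="rate(api_requests_total[5m])",   # hypothetical metric name
        logql='{app="sandbox"} |= "error"',      # hypothetical label and filter
        start_s=1_700_000_000,
        end_s=1_700_000_600,
    )
    print(prom_url)
    print(loki_url)
```

Fetching both URLs for the window around a latency spike would return the metric samples alongside the error logs written during that period.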
Observability Outcomes
Together, the monitoring and logging utilities provide:
End-to-end visibility into system behavior and performance
Early detection of failures and anomalies
Support for debugging, auditing, and root-cause analysis
Improved operational reliability and platform health
Role in the Platform
Within the platform, the observability stack:
Enables proactive system monitoring
Supports operational excellence and SRE practices
Complements security, auditing, and compliance mechanisms