Analytics and Sandbox Technology Stack
Overview
This section describes the technology stack used for the Analytics and Sandbox environment of the IUDX-Novo platform.
The stack is organized into concentric functional layers, where each layer plays a distinct role in the data science and machine learning (ML) workflow. Together, these layers enable scalable, reproducible, and resource-efficient analytical and AI workloads.

Figure: Analytics and Sandbox Technology Stack (diagram to be updated or added)
Core Layer
The core layer provides foundational runtime and storage capabilities.
Docker
Provides containerization for packaging applications along with their dependencies
Ensures portability and reproducibility across environments
NVIDIA
Enables GPU acceleration for deep learning and compute-intensive AI workloads
Supports training and inference for advanced ML models
MinIO
Provides S3-compatible object storage
Used for managing:
Large datasets
Intermediate processing artefacts
Model artefacts
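The three artefact categories above could map onto key prefixes within a bucket. The following is a minimal sketch, not the platform's actual layout: the prefix names, the `sandbox` bucket, and the `MINIO_ENDPOINT`, `MINIO_ACCESS_KEY`, and `MINIO_SECRET_KEY` environment variables are all assumptions made for illustration. Because MinIO is S3-compatible, the upload goes through the standard `boto3` S3 client pointed at a custom endpoint.

```python
import os
from pathlib import PurePosixPath

# Hypothetical prefixes, one per artefact category listed above.
PREFIXES = {
    "dataset": "datasets",
    "intermediate": "intermediate",
    "model": "models",
}


def object_key(kind: str, project: str, filename: str) -> str:
    """Build an object key such as 'models/traffic/v1.pt'."""
    if kind not in PREFIXES:
        raise ValueError(f"unknown artefact kind: {kind!r}")
    return str(PurePosixPath(PREFIXES[kind]) / project / filename)


def upload(local_path: str, kind: str, project: str, bucket: str = "sandbox") -> str:
    """Upload a local file to MinIO through its S3-compatible API."""
    # Third-party dependency, imported lazily so object_key() needs only the stdlib.
    import boto3

    client = boto3.client(
        "s3",
        # Assumed environment variable names; adjust to the deployment.
        endpoint_url=os.environ.get("MINIO_ENDPOINT", "http://localhost:9000"),
        aws_access_key_id=os.environ["MINIO_ACCESS_KEY"],
        aws_secret_access_key=os.environ["MINIO_SECRET_KEY"],
    )
    key = object_key(kind, project, os.path.basename(local_path))
    client.upload_file(local_path, bucket, key)
    return key
```

Keeping the key-building logic separate from the client call means notebooks, pipelines, and training jobs can agree on where artefacts live without each one hard-coding paths.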
Orchestrator Layer
This layer manages containerized workloads and workflow execution.
Kubernetes
Serves as the container orchestration platform
Manages workload scheduling, scaling, and deployment
Enables workload isolation and resource governance (for example, through namespaces and resource requests and limits)
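Resource governance in Kubernetes is commonly expressed as per-container requests and limits. The sketch below builds a Pod manifest as a plain Python dictionary. The namespace, Pod name, and CPU and memory figures are hypothetical, and the `nvidia.com/gpu` resource is only available if the NVIDIA device plugin is installed on the cluster, which is assumed here rather than stated in this document.

```python
import json


def training_pod(name: str, image: str, namespace: str, gpus: int = 0) -> dict:
    """Return a Pod manifest with explicit CPU/memory (and optional GPU) limits."""
    limits = {"cpu": "4", "memory": "16Gi"}
    if gpus > 0:
        # GPUs are specified under limits; Kubernetes uses the limit as the
        # request when no separate request is given.
        limits["nvidia.com/gpu"] = gpus
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "main",
                    "image": image,
                    "resources": {
                        "requests": {"cpu": "2", "memory": "8Gi"},
                        "limits": limits,
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Example values only; the namespace and image are placeholders.
    pod = training_pod("train-demo", "pytorch/pytorch:latest", "sandbox-user-a", gpus=1)
    print(json.dumps(pod, indent=2))
```

The printed JSON can be saved to a file and passed to `kubectl create -f`, since Kubernetes accepts JSON manifests as well as YAML.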
Argo
Provides workflow orchestration on Kubernetes
Supports execution and management of:
ETL workflows
ML pipelines
Batch and scheduled jobs
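To make the workflow types above concrete, here is a minimal sketch of an Argo `Workflow` manifest whose DAG chains an ETL stage into a training stage. It is generated from Python so the structure is easy to inspect. The step names, the `python:3.11-slim` image, and the `extract.py`, `transform.py`, and `train.py` scripts are placeholders, not part of the platform.

```python
import json


def step(name: str, image: str, command: list) -> dict:
    """A container template: one runnable unit of the workflow."""
    return {"name": name, "container": {"image": image, "command": command}}


def etl_then_train_workflow() -> dict:
    """Return an Argo Workflow manifest whose DAG runs extract -> transform -> train."""
    image = "python:3.11-slim"  # placeholder image
    tasks = [
        {"name": "extract", "template": "extract"},
        {"name": "transform", "template": "transform", "dependencies": ["extract"]},
        {"name": "train", "template": "train", "dependencies": ["transform"]},
    ]
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "etl-train-"},
        "spec": {
            "entrypoint": "pipeline",
            "templates": [
                {"name": "pipeline", "dag": {"tasks": tasks}},
                step("extract", image, ["python", "extract.py"]),
                step("transform", image, ["python", "transform.py"]),
                step("train", image, ["python", "train.py"]),
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(etl_then_train_workflow(), indent=2))
```

For scheduled jobs, Argo provides a `CronWorkflow` kind that wraps a workflow specification like this one together with a cron schedule.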
Executors Layer
The executors layer provides distributed and scalable compute engines.
PyTorch
A widely used deep learning framework
Supports model training, evaluation, and inference
Dask
A parallel computing library for scaling Python-based data science workloads
Enables distributed processing of large datasets
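The idea behind distributed processing of a large dataset can be sketched as follows: reduce each chunk independently, then combine the partial results. The function names are illustrative, and the example computes a simple mean. The `mean_with_dask` variant assumes the `dask` package is installed and uses `dask.delayed` to schedule one task per chunk.

```python
def partial_stats(chunk):
    """Reduce one chunk to (sum, count) so chunks can be processed independently."""
    return sum(chunk), len(chunk)


def combine(parts):
    """Merge per-chunk (sum, count) pairs into a global mean."""
    parts = list(parts)
    total = sum(s for s, _ in parts)
    count = sum(n for _, n in parts)
    return total / count


def mean_local(chunks):
    """Sequential reference implementation, useful for checking results."""
    return combine(partial_stats(c) for c in chunks)


def mean_with_dask(chunks):
    """Same computation, with each chunk scheduled as a Dask task."""
    import dask  # third-party; imported lazily so the helpers above need only the stdlib

    tasks = [dask.delayed(partial_stats)(c) for c in chunks]
    # dask.compute returns a tuple with one result per task.
    return combine(dask.compute(*tasks))


if __name__ == "__main__":
    print(mean_local([[1, 2, 3], [4, 5], [6]]))
```

By default the tasks run on Dask's local scheduler. Connecting a `dask.distributed.Client` to a cluster scheduler would spread the same tasks across workers without changing the functions.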
Ray
A distributed computing framework for scalable ML workloads
Supports:
Distributed model training
Hyperparameter tuning
Large-scale parallel task execution
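As an illustration of hyperparameter tuning with Ray, the sketch below expands a search space into a grid and evaluates each configuration as a Ray task. The objective function is a stand-in that trains no model, and the parameter names `lr` and `batch_size` are examples only. Ray also ships a dedicated tuning library, Ray Tune. Plain tasks are used here only to keep the mechanism visible.

```python
import itertools


def grid(space: dict) -> list:
    """Expand {'lr': [...], 'batch_size': [...]} into a list of config dicts."""
    keys = sorted(space)
    combos = itertools.product(*(space[k] for k in keys))
    return [dict(zip(keys, values)) for values in combos]


def evaluate(config: dict) -> float:
    """Placeholder objective (lower is better); a real one would train and validate a model."""
    return (config["lr"] - 0.01) ** 2 + abs(config["batch_size"] - 64) / 1000


def best_local(space: dict) -> dict:
    """Sequential search, useful as a reference."""
    return min(grid(space), key=evaluate)


def best_with_ray(space: dict) -> dict:
    """Evaluate every configuration as a separate Ray task."""
    import ray  # third-party; imported lazily so the helpers above need only the stdlib

    ray.init(ignore_reinit_error=True)
    remote_evaluate = ray.remote(evaluate)
    configs = grid(space)
    scores = ray.get([remote_evaluate.remote(c) for c in configs])
    best_index = min(range(len(configs)), key=scores.__getitem__)
    return configs[best_index]


if __name__ == "__main__":
    print(best_local({"lr": [0.1, 0.01], "batch_size": [32, 64]}))
```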
Application Layer
The application layer provides user-facing tools for development and experimentation.
JupyterHub
Enables multi-user interactive notebook environments
Serves as the primary interface for data science and ML development
Kubeflow
Provides a Kubernetes-native ML toolkit
Supports:
Scalable model training
Model deployment and serving
Workflow automation and lifecycle management
Summary
The Analytics and Sandbox technology stack is designed to support advanced AI, ML, and data analytics workflows by ensuring:
Scalability through containerization and orchestration
Reproducibility via standardized runtime environments
Efficient resource utilization using distributed computing frameworks
The combination of Docker, NVIDIA GPU acceleration, MinIO, Kubernetes, Argo, PyTorch, Dask, Ray, JupyterHub, and Kubeflow provides a robust and flexible foundation for data-driven applications within the sandbox environment.