Analytics and Sandbox Technology Stack

Overview

This section describes the technology stack used for the Analytics and Sandbox environment of the IUDX-Novo platform.

The stack is organized into concentric functional layers, where each layer plays a distinct role in the data science and machine learning (ML) workflow. Together, these layers enable scalable, reproducible, and resource-efficient analytical and AI workloads.
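
As a reading aid, the layering described here can be written down as a small data structure. The sketch below is illustrative only: the layer and component names come from this section, while `STACK_LAYERS` and `layer_of` are hypothetical names introduced for the example and are not part of the platform.

```python
# Illustrative model of the stack, listing layers in the order
# in which this section presents them.
STACK_LAYERS = {
    "Core": ["Docker", "NVIDIA", "MinIO"],
    "Orchestrator": ["Kubernetes", "Argo"],
    "Executors": ["PyTorch", "Dask", "Ray"],
    "Application": ["JupyterHub", "Kubeflow"],
}


def layer_of(component: str) -> str:
    """Return the layer a component belongs to (hypothetical helper)."""
    for layer, components in STACK_LAYERS.items():
        if component in components:
            return layer
    raise KeyError(f"unknown component: {component}")


if __name__ == "__main__":
    for layer, components in STACK_LAYERS.items():
        print(f"{layer}: {', '.join(components)}")
```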

Figure: Analytics and Sandbox Technology Stack (diagram to be updated or added)

Core Layer

The core layer provides foundational runtime and storage capabilities.

Docker

- Provides containerization for packaging applications together with their dependencies
- Ensures portability and reproducibility across environments

NVIDIA

- Enables GPU acceleration for deep learning and compute-intensive AI workloads
- Supports training and inference for advanced ML models

MinIO

- Provides S3-compatible object storage
- Used for managing:
  - Large datasets
  - Intermediate processing artefacts
  - Model artefacts
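
One simple way to keep these three kinds of objects apart in an S3-compatible store is a key-prefix convention. The following is a minimal sketch under that assumption; the prefixes, the project name, and the `object_key` helper are all hypothetical and do not describe the platform's actual bucket layout.

```python
from pathlib import PurePosixPath

# Hypothetical prefixes, one per kind of object listed above.
PREFIXES = {
    "dataset": "datasets",
    "intermediate": "intermediate",
    "model": "models",
}


def object_key(kind: str, project: str, filename: str) -> str:
    """Build an S3-style key of the form '<prefix>/<project>/<filename>'."""
    if kind not in PREFIXES:
        raise ValueError(f"unknown kind: {kind!r}")
    return str(PurePosixPath(PREFIXES[kind]) / project / filename)


if __name__ == "__main__":
    print(object_key("dataset", "traffic-demo", "raw.parquet"))
    print(object_key("model", "traffic-demo", "model-v1.pt"))
```

Keys built this way could be passed to any S3-compatible client; the client and bucket configuration are deliberately left out of the sketch.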

Orchestrator Layer

This layer manages containerized workloads and workflow execution.

Kubernetes

- Serves as the container orchestration platform
- Manages workload scheduling, scaling, and deployment
- Enables isolation and resource governance
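
Resource governance in Kubernetes is usually expressed as per-container requests and limits. The sketch below builds a Pod manifest as a plain Python dictionary to show the shape of such a declaration. The namespace, image, and CPU and memory figures are hypothetical, and the `nvidia.com/gpu` resource is only schedulable on clusters where the NVIDIA device plugin is installed, which is assumed here rather than stated by this section.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Return a Pod manifest (as a dict) with explicit resource limits."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        # The namespace is a hypothetical example of an isolation boundary.
        "metadata": {"name": name, "namespace": "sandbox"},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "main",
                    "image": image,
                    "resources": {
                        # Requests guide scheduling; limits cap consumption.
                        "requests": {"cpu": "2", "memory": "8Gi"},
                        "limits": {
                            "cpu": "4",
                            "memory": "16Gi",
                            "nvidia.com/gpu": gpus,
                        },
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest("train-job", "example.org/sandbox/trainer:latest")
    print(json.dumps(manifest, indent=2))
```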

Argo

- Provides workflow orchestration on Kubernetes
- Supports execution and management of:
  - ETL workflows
  - ML pipelines
  - Batch and scheduled jobs
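
To make the ETL case concrete, the sketch below assembles an Argo Workflow manifest with three dependent DAG tasks (extract, transform, load) as a Python dictionary. It is a minimal illustration, not the platform's pipeline definition: the container image and the `python -m pipeline` command are hypothetical placeholders.

```python
import json


def etl_workflow(image: str) -> dict:
    """Return an Argo Workflow manifest with three dependent DAG tasks."""

    def task(name, depends_on=()):
        entry = {
            "name": name,
            "template": "run-step",
            "arguments": {"parameters": [{"name": "step", "value": name}]},
        }
        if depends_on:
            entry["dependencies"] = list(depends_on)
        return entry

    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "etl-"},
        "spec": {
            "entrypoint": "etl",
            "templates": [
                {
                    # DAG template: each task runs once its dependencies finish.
                    "name": "etl",
                    "dag": {
                        "tasks": [
                            task("extract"),
                            task("transform", ["extract"]),
                            task("load", ["transform"]),
                        ]
                    },
                },
                {
                    # Container template shared by all three steps.
                    "name": "run-step",
                    "inputs": {"parameters": [{"name": "step"}]},
                    "container": {
                        "image": image,
                        "command": ["python", "-m", "pipeline"],  # hypothetical
                        "args": ["{{inputs.parameters.step}}"],
                    },
                },
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(etl_workflow("example.org/sandbox/etl:latest"), indent=2))
```

Scheduled jobs would typically wrap a similar specification in a CronWorkflow resource, which is not shown here.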

Executors Layer

The executors layer provides distributed and scalable compute engines.

PyTorch

- A widely used deep learning framework
- Supports model training, evaluation, and inference

Dask

- A parallel computing library for scaling Python-based data science workloads
- Enables distributed processing of large datasets
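
The idea behind this kind of processing is to split a dataset into partitions, compute a partial result per partition in parallel, and then combine the partial results. The sketch below illustrates that pattern using only the Python standard library (`concurrent.futures`) as a stand-in; it does not use Dask itself, and the function names are hypothetical. With Dask, the per-partition work would be scheduled across workers rather than local threads.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(values, n_parts):
    """Split a list into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(values), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def partial_stats(chunk):
    """Per-partition work: return (sum, count) for one chunk."""
    return sum(chunk), len(chunk)


def partitioned_mean(values, n_parts=4):
    """Compute a mean by combining per-partition partial results."""
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(partial_stats, partition(values, n_parts)))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


if __name__ == "__main__":
    print(partitioned_mean(list(range(1, 101))))
```

Returning a (sum, count) pair per partition, rather than a per-partition mean, keeps the combined result exact even when partitions differ in size.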

Ray

- A distributed computing framework for scalable ML workloads
- Supports:
  - Distributed model training
  - Hyperparameter tuning
  - Large-scale execution

Application Layer

The application layer provides user-facing tools for development and experimentation.

JupyterHub

- Enables multi-user interactive notebook environments
- Serves as the primary interface for data science and ML development

Kubeflow

- Provides a Kubernetes-native ML toolkit
- Supports:
  - Scalable model training
  - Model deployment and serving
  - Workflow automation and lifecycle management

Summary

The Analytics and Sandbox technology stack is designed to support advanced AI, ML, and data analytics workflows by ensuring:

- Scalability through containerization and orchestration
- Reproducibility via standardized runtime environments
- Efficient resource utilization using distributed computing frameworks

Together, Docker, NVIDIA GPU acceleration, MinIO, Kubernetes, Argo, PyTorch, Dask, Ray, JupyterHub, and Kubeflow provide the runtime, storage, orchestration, compute, and user-facing tooling for data-driven applications within the sandbox environment.


