Sandbox Layer

Overview

The Sandbox Layer is a central component of the IUDX-Novo platform, providing users with a controlled, scalable, and secure environment for data science and machine learning activities.

It serves as the primary interaction layer for analysts, data scientists, and researchers, enabling experimentation, model development, and analytical workflows through a notebook-driven interface.

Core Capabilities

The Sandbox Layer enables users to:

  • Spawn data analytics notebooks as the primary mechanism for interactive analysis

  • Load datasets catalogued in the Data Exchange using simple UI utilities within the notebook environment

  • Load models catalogued in the Data Exchange using similar discovery and access mechanisms

  • Interact with the Analytics Layer to perform data processing, experimentation, and model development

  • Create visualizations and dashboards that can be published or shared

  • Use MLOps utilities to:

    • Run training jobs, including long-running workloads

    • Host models for inference applications

Architectural Foundation

Figure 6: Internal Architecture of the MLOps Platform (Kubeflow). Diagram to be updated or added for IUDX-Novo.

The Sandbox Layer is built on the open-source MLOps framework Kubeflow, with additional platform-specific integrations.

Core Kubeflow Components Used

The following Kubeflow components are utilized directly within the Sandbox Layer:

Kubeflow Pipelines

  • Enables the definition of ETL pipelines using declarative specifications

  • Translates pipelines into Argo Workflows for execution
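As a rough illustration of how a declarative pipeline could map onto an Argo Workflow, the sketch below renders a small step mapping into a Workflow manifest with a DAG template. The step names, the container images, and the helper to_argo_workflow are hypothetical and are not part of IUDX-Novo or the Kubeflow Pipelines compiler, whose real output is considerably more elaborate. Only the manifest field names follow the Argo Workflows schema.

```python
import json

# Hypothetical declarative pipeline: step name -> (container image, upstream steps).
# The images are placeholders, not real IUDX-Novo artifacts.
PIPELINE = {
    "extract": ("example.org/etl/extract:latest", []),
    "transform": ("example.org/etl/transform:latest", ["extract"]),
    "load": ("example.org/etl/load:latest", ["transform"]),
}


def to_argo_workflow(name, steps):
    """Render a step mapping as an Argo Workflow manifest that uses a DAG template."""
    # One container template per step.
    container_templates = [
        {"name": step, "container": {"image": image}}
        for step, (image, _) in steps.items()
    ]
    # One DAG task per step; dependencies are listed only when a step has upstreams.
    dag_tasks = []
    for step, (_, upstream) in steps.items():
        task = {"name": step, "template": step}
        if upstream:
            task["dependencies"] = list(upstream)
        dag_tasks.append(task)
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": f"{name}-"},
        "spec": {
            "entrypoint": "main",
            "templates": [
                {"name": "main", "dag": {"tasks": dag_tasks}},
                *container_templates,
            ],
        },
    }


workflow = to_argo_workflow("etl", PIPELINE)
print(json.dumps(workflow, indent=2))
```

In this sketch each step becomes one container template plus one DAG task, and Argo would derive the execution order from the dependencies lists.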

Katib

  • Provides built-in hyperparameter tuning capabilities

  • Supports automated experimentation and optimization workflows
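To make the idea of automated hyperparameter tuning concrete, here is a minimal local sketch of random search, one of the search strategies Katib supports. It does not use the Katib API: the search space, the objective function (a stand-in for a real training run), and the helper random_search are all assumptions made for illustration. In Katib, each trial would run as a separate Kubernetes workload rather than a loop iteration.

```python
import random

# Hypothetical search space: parameter name -> (lower bound, upper bound).
SEARCH_SPACE = {"learning_rate": (0.001, 0.1), "dropout": (0.0, 0.5)}


def objective(params):
    """Stand-in for a training run; returns a loss value to minimise."""
    return (params["learning_rate"] - 0.01) ** 2 + (params["dropout"] - 0.2) ** 2


def random_search(space, trials, seed=0):
    """Sample `trials` random configurations and keep the one with the lowest loss."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    best_params, best_loss = None, float("inf")
    for _ in range(trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(SEARCH_SPACE, trials=50)
print(best_params, best_loss)
```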

Knative

  • Offers Kubernetes-based model hosting and serving

  • Supports scalable and production-ready inference endpoints

Notebooks

  • Implements JupyterHub-based notebook environments

  • Enables interactive analysis and experimentation

Dex

  • Provides internal authentication and authorization for notebook spawning

  • Integrates with platform identity services

Istio

  • Acts as the service and network management layer

  • Supplies a service mesh to manage communication between notebooks and controller components

Platform-Specific Integrations

In addition to native Kubeflow components, the Sandbox Layer includes several integrations to meet IUDX-Novo platform requirements:

  • Identity, authorization, and permissioning integrations between the Data Exchange and Dex

  • Model and dataset discovery and download integrated with the Data Exchange and MinIO

  • Downstream connectivity to data pipelines and analytical workflows

  • Job submission and management integrations with the Analytics Layer

  • Model hosting and inference integrations with the Analytics Layer

  • Compute credit management integrations with the Data Exchange

Scalability and Resource Management

The Sandbox Layer leverages Kubernetes to dynamically provision and manage compute resources, providing the following advantages:

  • On-demand spawning of notebook environments for users

  • Dynamic provisioning of executor clusters such as Ray or Dask

  • Enforcement of user-specific compute restrictions through Kubernetes namespaces

  • Isolation of resource usage and cost across users and workloads

  • Automatic scaling of compute resources based on workload demand and peak usage

  • High availability for all deployed system components
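One common way to enforce per-user compute limits with Kubernetes namespaces is a ResourceQuota object in each user's namespace. The sketch below builds such a manifest as a plain dictionary. The namespace name, the quota values, and the helper user_quota are illustrative assumptions, and the source does not state that IUDX-Novo uses ResourceQuota specifically. The manifest keys themselves follow the Kubernetes core v1 API.

```python
import json


def user_quota(namespace, cpu, memory, gpus=0):
    """Build a Kubernetes ResourceQuota manifest capping a user's namespace."""
    hard = {
        "requests.cpu": str(cpu),
        "requests.memory": memory,
        "limits.cpu": str(cpu),
        "limits.memory": memory,
    }
    if gpus:
        # Extended resources such as GPUs are quota'd via the "requests." prefix.
        hard["requests.nvidia.com/gpu"] = str(gpus)
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "compute-quota", "namespace": namespace},
        "spec": {"hard": hard},
    }


# Hypothetical user namespace and limits.
quota = user_quota("user-alice", cpu=8, memory="32Gi", gpus=1)
print(json.dumps(quota, indent=2))
```

Applied to a namespace, a quota like this would cause Kubernetes to reject new pods once the combined requests or limits of that user's notebooks and jobs exceed the stated totals.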

Role in the Platform

  • Acts as the primary user-facing execution environment for analytics and ML

  • Bridges the Data Exchange and Analytics Layer

  • Enables secure, scalable, and governed experimentation and model deployment

