SMUS CI/CD Pipeline CLI
[Preview] Amazon SageMaker Unified Studio CI/CD CLI is currently in preview and subject to change. Commands, configuration formats, and APIs may evolve based on customer feedback. We recommend evaluating this tool in non-production environments during the preview. For feedback and bug reports, please open an issue at https://github.com/aws/CICD-for-SageMakerUnifiedStudio/issues
[IAM + IdC Domains] This CLI supports both IAM-based and IAM Identity Center (IdC)-based SMUS domains. For IdC domains, additional setup (VPC networking, Lake Formation permissions, inline IAM policies) may be required — see the setup scripts in each example directory.
Automate deployment of data applications across SageMaker Unified Studio environments
Deploy Airflow DAGs, Jupyter notebooks, and ML workflows from development to production with confidence. Built for data scientists, data engineers, ML engineers, and GenAI app developers working with DevOps teams.
Works with your deployment strategy: Whether you use git branches (branch-based), versioned artifacts (bundle-based), git tags (tag-based), or direct deployment - this CLI supports your workflow. Define your application once, deploy it your way.
Why SMUS CI/CD CLI?
✅ AWS Abstraction Layer - CLI encapsulates all AWS analytics, ML, and SMUS complexity - DevOps teams never call AWS APIs directly
✅ Separation of Concerns - Data teams define WHAT to deploy (manifest.yaml), DevOps teams define HOW and WHEN (CI/CD workflows)
✅ Generic CI/CD Workflows - Same workflow works for Glue, SageMaker, Bedrock, QuickSight, or any AWS service combination
✅ Deploy with Confidence - Pre-deployment dry-run validation and automated testing before production
✅ Multi-Environment Management - Test → Prod with environment-specific configuration
✅ Infrastructure as Code - Version-controlled application manifests and reproducible deployments
✅ Event-Driven Workflows - Trigger workflows automatically via EventBridge on deployment
Quick Start
Install:
pip install aws-smus-cicd-cli
Deploy your first application:
# Validate configuration
aws-smus-cicd-cli describe --manifest manifest.yaml --connect
# Create deployment bundle (optional)
aws-smus-cicd-cli bundle --manifest manifest.yaml
# Preview deployment (dry run)
aws-smus-cicd-cli deploy --targets test --manifest manifest.yaml --dry-run
# Deploy to test environment
aws-smus-cicd-cli deploy --targets test --manifest manifest.yaml
# Run validation tests
aws-smus-cicd-cli test --manifest manifest.yaml --targets test
Who Is This For?
👨💻 Data Teams (Data Scientists, Data Engineers, GenAI App Developers)
You focus on: Your application - what to deploy, where to deploy, and how it runs
You define: Application manifest (manifest.yaml) with your code, workflows, and configurations
You don't need to know: CI/CD pipelines, GitHub Actions, deployment automation
→ Quick Start Guide - Deploy your first application in 10 minutes
🔧 DevOps Teams
You focus on: CI/CD best practices, security, compliance, and deployment automation
You define: Workflow templates that enforce testing, approvals, and promotion policies
You don't need to know: Application-specific details, AWS services used, DataZone APIs, SMUS project structures, or business logic
→ Admin Guide - Configure infrastructure and pipelines in 15 minutes → GitHub Workflow Templates - Generic, reusable workflow templates for automated deployment
The CLI is your abstraction layer: You just call aws-smus-cicd-cli deploy - the CLI handles all AWS service interactions (DataZone, Glue, Athena, SageMaker, MWAA, S3, IAM, etc.). Your workflows stay simple and generic.
Core Concepts
Separation of Concerns: The Key Design Principle
The Problem: Traditional deployment approaches force DevOps teams to learn AWS analytics services (Glue, Athena, DataZone, SageMaker, MWAA, etc.) and understand SMUS project structures, or force data teams to become CI/CD experts.
The Solution: SMUS CI/CD CLI is the abstraction layer that encapsulates all AWS and SMUS complexity.
Example workflow:
1. DevOps Team - defines the PROCESS and the INFRASTRUCTURE:
   - Process: test on merge, approval for prod, security scans, notification rules
   - Infrastructure: account & region, IAM roles, resources
2. Data Team - defines the CONTENT:
   - Glue jobs, SageMaker training, Athena queries, file structure
3. SMUS CI/CD CLI (the abstraction) - the workflow simply calls:
   aws-smus-cicd-cli deploy --manifest manifest.yaml
   and the CLI handles ALL AWS complexity: DataZone APIs, Glue/Athena/SageMaker APIs, MWAA deployment, S3 management, IAM configuration, infrastructure provisioning.
   Works for ANY app - no ML, analytics, or GenAI service knowledge needed!
DevOps teams focus on:
CI/CD best practices (testing, approvals, notifications)
Security and compliance gates
Deployment orchestration
Monitoring and alerting
SMUS CI/CD CLI handles ALL AWS complexity:
DataZone domain and project management
AWS Glue, Athena, SageMaker, MWAA APIs
S3 storage and artifact management
IAM roles and permissions
Connection configurations
Catalog asset subscriptions
Workflow deployment to Airflow
Infrastructure provisioning
Testing and validation
Data teams focus on:
Application code and workflows
Which AWS services to use (Glue, Athena, SageMaker, etc.)
Environment configurations
Business logic
Result:
DevOps teams never call AWS APIs directly - they just call aws-smus-cicd-cli deploy
CI/CD workflows are generic - same workflow works for Glue apps, SageMaker apps, or Bedrock apps
Data teams never touch CI/CD configs
Both teams work independently using their expertise
Application Manifest
A declarative YAML file (manifest.yaml) that defines your data application:
Application details - Name, version, description
Content - Code from git repositories, data/models from storage, QuickSight dashboards
Workflows - Airflow DAGs for orchestration and automation
Stages - Where to deploy (dev, test, prod environments)
Configuration - Environment-specific settings, connections, and bootstrap actions
Created and owned by data teams. Defines what to deploy and where. No CI/CD knowledge required.
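To make the manifest sections concrete, here is a minimal sketch of the shape such a file might take, modeled as a plain Python dict. Every field name below is an assumption for illustration only, not the CLI's actual schema; consult the manifest reference for the real format.

```python
# Illustrative sketch of the sections a manifest.yaml might declare.
# All field names here are hypothetical, not the CLI's schema.
manifest = {
    "application": {"name": "sales-dashboard", "version": "1.0.0"},
    "content": {"git": {"repository": "my-org/sales-app", "path": "src/"}},
    "workflows": [{"name": "daily_etl", "engine": "MWAA"}],
    "stages": {
        "test": {"domain": "dev-domain", "region": "us-east-1", "project": "sales-test"},
        "prod": {"domain": "prod-domain", "region": "eu-west-1", "project": "sales-prod"},
    },
}

REQUIRED = ("application", "content", "workflows", "stages")

def missing_sections(m: dict) -> list:
    """Return any required top-level sections absent from the manifest."""
    return [s for s in REQUIRED if s not in m]

assert missing_sections(manifest) == []
```

The key idea is that each stage carries its own domain, region, and project, which is what enables the multi-domain and multi-region deployments described below.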
Application
Your data/analytics workload being deployed:
Airflow DAGs and Python scripts
Jupyter notebooks and data files
ML models and training code
ETL pipelines and transformations
GenAI agents and MCP servers
Foundation model configurations
Stage
A deployment environment (dev, test, prod) mapped to a SageMaker Unified Studio project:
Domain and region configuration
Project name and settings
Resource connections (S3, Airflow, Athena, Glue)
Environment-specific parameters
Optional branch mapping for git-based deployments
Stage-to-Project Mapping
Each application stage deploys to a dedicated SageMaker Unified Studio (SMUS) project. A project can host a single application or multiple applications depending on your architecture and CI/CD methodology. Stage projects are independent entities with their own governance:
Ownership & Access: Each stage project has its own set of owners and contributors, which may differ from the development project. Production projects typically have restricted access compared to development environments.
Multi-Domain & Multi-Region: Stage projects can belong to different SMUS domains, AWS accounts, and regions. For example, your dev stage might deploy to a development domain in us-east-1, while prod deploys to a production domain in eu-west-1.
Flexible Architecture: Organizations can choose between dedicated projects per application (isolation) or shared projects hosting multiple applications (consolidation), based on security, compliance, and operational requirements.
This separation enables true environment isolation with independent access controls, compliance boundaries, and regional data residency requirements.
Workflow
Orchestration logic that executes your application. Workflows serve two purposes:
1. Deployment-time: Create required AWS resources during deployment
Provision infrastructure (S3 buckets, databases, IAM roles)
2. Runtime: Execute ongoing data and ML pipelines
Workflows are defined as Airflow DAGs (Directed Acyclic Graphs) in YAML format. Supports MWAA (Managed Workflows for Apache Airflow) and Amazon MWAA Serverless.
CI/CD Automation
GitHub Actions workflows (or other CI/CD systems) that automate deployment:
Created and owned by DevOps teams
Defines how and when to deploy
Runs tests and quality gates
Manages promotion across targets
Enforces security and compliance policies
Example: .github/workflows/deploy.yml
Key insight: DevOps teams create generic, reusable workflows that work for ANY application. They don’t need to know if the app uses Glue, SageMaker, or Bedrock - the CLI handles all AWS service interactions. The workflow just calls aws-smus-cicd-cli deploy and the CLI does the rest.
Deployment Modes
Bundle-based (Artifact): Create versioned archive → deploy archive to stages
Good for: audit trails, rollback capability, compliance
Command: aws-smus-cicd-cli bundle then aws-smus-cicd-cli deploy --manifest app.tar.gz
Direct (Git-based): Deploy directly from sources without intermediate artifacts
Good for: simpler workflows, rapid iteration, git as source of truth
Command: aws-smus-cicd-cli deploy --manifest manifest.yaml --targets test
Both modes work with any combination of storage and git content sources.
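The essence of bundle-based mode is packing the application into a versioned archive before deployment. A minimal sketch of that archive step, assuming a simple directory layout (this is not the CLI's actual bundling logic):

```python
import tarfile
import tempfile
from pathlib import Path

def make_bundle(app_dir: Path, out_path: Path) -> Path:
    """Pack an application directory into a versioned .tar.gz bundle."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(app_dir, arcname=app_dir.name)
    return out_path

# Demo against a throwaway directory standing in for a real application.
root = Path(tempfile.mkdtemp())
app = root / "app"
app.mkdir()
(app / "manifest.yaml").write_text("application:\n  name: demo\n")
bundle = make_bundle(app, root / "app-1.0.0.tar.gz")
assert tarfile.is_tarfile(bundle)
```

The resulting archive is what `aws-smus-cicd-cli deploy --manifest app.tar.gz` consumes in bundle-based mode, giving each deployment an immutable artifact for audit and rollback.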
Example Applications
Real-world examples showing how to deploy different workloads with SMUS CI/CD.
📊 Analytics - QuickSight Dashboard
Deploy interactive BI dashboards with automated Glue ETL pipelines for data preparation. Uses QuickSight asset bundles, Athena queries, and GitHub dataset integration with environment-specific configurations.
AWS Services: QuickSight • Glue • Athena • S3 • MWAA Serverless
GitHub Workflow: analytic-dashboard-glue-quicksight.yml
What happens during deployment: Application code is deployed to S3, Glue jobs and Airflow workflows are created and executed, QuickSight dashboard/data source/dataset are created, and QuickSight ingestion is initiated to refresh the dashboard with latest data.
📓 Data Engineering - Notebooks
Deploy Jupyter notebooks with parallel execution orchestration for data analysis and ETL workflows. Demonstrates notebook deployment with MLflow integration for experiment tracking.
AWS Services: SageMaker Notebooks • MLflow • S3 • MWAA Serverless
GitHub Workflow: analytic-data-notebooks.yml
What happens during deployment: Notebooks and workflow definitions are uploaded to S3, Airflow DAG is created for parallel notebook execution, MLflow connection is provisioned for experiment tracking, and notebooks are ready to run on-demand or scheduled.
🤖 Machine Learning - Training
Train ML models with SageMaker using the SageMaker SDK and SageMaker Distribution images. Track experiments with MLflow and automate training pipelines with environment-specific configurations.
AWS Services: SageMaker Training • MLflow • S3 • MWAA Serverless
GitHub Workflow: analytic-ml-training.yml
What happens during deployment: Training code and workflow definitions are uploaded to S3 with compression, Airflow DAG is created for training orchestration, MLflow connection is provisioned for experiment tracking, and SageMaker training jobs are created and executed using SageMaker Distribution images.
🤖 Machine Learning - Deployment
Deploy trained ML models as SageMaker real-time inference endpoints. Uses SageMaker SDK for endpoint configuration and SageMaker Distribution images for serving.
AWS Services: SageMaker Endpoints • S3 • MWAA Serverless
GitHub Workflow: analytic-ml-deployment.yml
What happens during deployment: Model artifacts, deployment code, and workflow definitions are uploaded to S3, Airflow DAG is created for endpoint deployment orchestration, SageMaker endpoint configuration and model are created, and the inference endpoint is deployed and ready to serve predictions.
🧠 Generative AI
Deploy GenAI applications with Bedrock agents and knowledge bases. Demonstrates RAG (Retrieval Augmented Generation) workflows with automated agent deployment and testing.
AWS Services: Amazon Bedrock • S3 • MWAA Serverless
GitHub Workflow: analytic-genai-workflow.yml
What happens during deployment: Agent configuration and workflow definitions are uploaded to S3, Airflow DAG is created for agent deployment orchestration, Bedrock agents and knowledge bases are configured, and the GenAI application is ready for inference and testing.
🔐 IdC Domain Setup
The examples above support both IAM-based and IAM Identity Center (IdC)-based domains. IdC domains require additional one-time setup due to VpcOnly networking and tag-based IAM policies. Each example includes a setup script:
ML Training: MLflow tracking server access, CloudWatch Logs permissions
ML Deployment: uses the same project role as ML Training; no additional setup beyond ML Training
# Run setup for data-notebooks (IdC domain)
TEST_DOMAIN_REGION=us-east-1 python examples/analytic-workflow/data-notebooks/idc_domain_project_setup.py
# Run setup for ML training (IdC domain)
TEST_DOMAIN_REGION=us-east-1 python examples/analytic-workflow/ml/training/idc_domain_project_setup.py
# Dry run to preview changes
python examples/analytic-workflow/data-notebooks/idc_domain_project_setup.py --dry-run
All setup scripts are idempotent and safe to run multiple times. Use --dry-run to preview changes before applying.
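The idempotent, dry-run-aware behavior of the setup scripts follows a common pattern: compute the difference between the desired and existing configuration, and only act on what is missing. A small sketch of that pattern (the action names are hypothetical, and this is not the scripts' actual code):

```python
def plan_and_apply(desired: set, existing: set, dry_run: bool) -> list:
    """Idempotent setup pattern: only the missing pieces become actions,
    so re-running against an already-configured project is a no-op."""
    actions = sorted(desired - existing)
    verb = "would apply" if dry_run else "applying"
    print(f"{verb}: {actions}")
    return actions

# First run: one grant is missing and gets applied (names are hypothetical).
assert plan_and_apply({"lf-grant", "inline-policy"}, {"inline-policy"}, dry_run=False) == ["lf-grant"]
# Second run: everything already exists, so nothing happens.
assert plan_and_apply({"lf-grant", "inline-policy"}, {"lf-grant", "inline-policy"}, dry_run=False) == []
```

With `dry_run=True`, the same plan is printed but nothing is applied, which is exactly what `--dry-run` previews.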
Automated Deployment - Define your application content, workflows, and deployment targets in YAML. Bundle-based (artifact) or direct (git-based) deployment modes. Deploy to test and prod with a single command. Dynamic configuration using ${VAR} substitution. Track deployments in S3 or git for deployment history.
Environment Variables & Dynamic Configuration - Flexible configuration for any environment using variable substitution. Environment-specific settings with validation and connection management.
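The ${VAR} placeholder syntax matches the shell-style convention that Python's string.Template implements, so the substitution behavior can be sketched in a few lines (the CLI's exact substitution rules may differ):

```python
import os
from string import Template

def expand(text: str, env=None) -> str:
    """Expand ${VAR} placeholders from a mapping (defaulting to os.environ)."""
    return Template(text).substitute(os.environ if env is None else env)

line = "project: sales-${STAGE} in ${AWS_REGION}"
assert expand(line, {"STAGE": "test", "AWS_REGION": "us-east-1"}) == "project: sales-test in us-east-1"
```

An undefined variable raises an error rather than silently expanding to an empty string, which is the safer behavior for deployment configuration.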
Deploy Any AWS Service - Airflow DAGs, Jupyter notebooks, Glue ETL jobs, Athena queries, SageMaker training and endpoints, QuickSight dashboards, Bedrock agents, Lambda functions, EMR jobs, and Redshift queries.
Automated Workflow Execution & Event-Driven Workflows - Trigger workflows automatically during deployment with workflow.run (use trailLogs: true to stream logs and wait for completion). Fetch workflow logs for validation and debugging with workflow.logs. Automatically refresh QuickSight dashboards after ETL deployment with quicksight.refresh_dataset. Emit custom events for downstream automation and CI/CD orchestration with eventbridge.put_events. Provision MLflow and other DataZone connections during deployment. Actions run in order during aws-smus-cicd-cli deploy for reliable initialization and validation.
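An eventbridge.put_events action ultimately produces an entry in the shape the EventBridge PutEvents API expects. The sketch below builds such an entry without calling AWS; the Source and DetailType values are hypothetical, while the entry fields (Source, DetailType, Detail, EventBusName) follow the real PutEvents API. In a live action the entry would be handed to boto3's events client via put_events(Entries=[entry]).

```python
import json

def deployment_event(app: str, target: str, status: str) -> dict:
    """Build a PutEvents entry announcing a deployment status change.
    Source and DetailType values here are hypothetical examples."""
    return {
        "Source": "custom.smus.cicd",
        "DetailType": "DeploymentStatusChange",
        "Detail": json.dumps({"application": app, "target": target, "status": status}),
        "EventBusName": "default",
    }

entry = deployment_event("sales-dashboard", "test", "SUCCEEDED")
assert json.loads(entry["Detail"])["status"] == "SUCCEEDED"
```

Downstream automation (EventBridge rules targeting Lambda, Step Functions, SNS, etc.) can then match on the Source and DetailType fields.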
Pre-built CI/CD Pipeline Workflows - GitHub Actions, GitLab CI, Azure DevOps, and Jenkins support for automated deployment. Flexible configuration for any CI/CD platform. Trigger deployments from external events with webhook support.
Automated Tests & Quality Gates - Run validation tests before promoting to production. Block deployments if tests fail. Track execution status and logs. Verify deployment correctness with health checks.
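A quality gate in a CI step reduces to: run the validation command, and abort the pipeline on a non-zero exit code. A minimal sketch (the commented invocation mirrors the Quick Start test command; the stand-in command just lets the sketch run anywhere):

```python
import subprocess
import sys

def gate(cmd: list) -> None:
    """Run a validation command and abort promotion on a non-zero exit code."""
    if subprocess.run(cmd).returncode != 0:
        sys.exit("validation failed - blocking promotion")

# A real pipeline would gate on something like:
#   gate(["aws-smus-cicd-cli", "test", "--manifest", "manifest.yaml", "--targets", "test"])
# Stand-in command so the sketch runs anywhere:
gate([sys.executable, "-c", "print('checks passed')"])
```

Because sys.exit propagates a failure status to the CI runner, the subsequent deploy-to-prod step never executes when tests fail.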
What Can You Deploy?
📊 Analytics & BI
🤖 Machine Learning
🧠 Generative AI
📓 Code & Workflows
💾 Data & Storage
Supported AWS Services
Deploy workflows using these AWS services through Airflow YAML syntax:
🎯 Analytics & Data
Amazon Athena • AWS Glue • Amazon EMR • Amazon Redshift • Amazon QuickSight • Lake Formation
🤖 Machine Learning
SageMaker Training • SageMaker Pipelines • Feature Store • Model Registry • Batch Transform
🧠 Generative AI
Amazon Bedrock • Bedrock Agents • Bedrock Knowledge Bases • Guardrails
📊 Additional Services
S3 • Lambda • Step Functions • DynamoDB • RDS • SNS/SQS • Batch
See complete list: Airflow AWS Operators Reference
Documentation
Getting Started
Guides
Reference
Examples
Development
Support
Security Notice
Always install from the official AWS PyPI package or source code.
License
This project is licensed under the MIT-0 License. See LICENSE for details.