The primary service you would use for periodic, automated execution of an ML workflow within the SageMaker ecosystem is SageMaker Pipelines or, for simpler scheduling, a combination of Amazon EventBridge (the scheduler) and SageMaker Processing Jobs or Training Jobs (the executor).
Here is a breakdown of the three best ways to achieve this, from simplest to most robust:
1. ⚙️ Simplest Approach: EventBridge + Training/Processing Jobs
This is the easiest way to schedule a single script execution.
- SageMaker Training Job: Used if your Python script’s primary function is to train an ML model (i.e., it defines a model, loads data, and runs a fitting process).
- SageMaker Processing Job: Used if your Python script’s primary function is data preparation, feature engineering, or model evaluation (tasks that don’t involve training/fitting a model).
- Amazon EventBridge (formerly CloudWatch Events): This acts as the scheduler.
How it works:
- Package Your Script: Ensure your notebook’s Python code is saved as a clean Python script (a `.py` file).
- Upload to S3: Upload your script and any required data to an Amazon S3 bucket.
- Create an EventBridge Rule:
  - Set the Schedule: Use a cron expression (e.g., `cron(0 12 * * ? *)` for noon UTC daily) or a fixed rate (e.g., `rate(1 day)`).
  - Set the Target: Point the rule to invoke the SageMaker service, specifically starting a Training Job or Processing Job.
  - Pass Parameters: In the EventBridge rule’s input transformer, specify the necessary parameters for the SageMaker job, such as the path to your script in S3, the instance type, and the output location.
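The steps above can be sketched with EventBridge Scheduler’s “universal target,” which calls the SageMaker API directly on a schedule. This is a minimal sketch: the account ID, role ARN, image URI, and bucket names are placeholders you would replace with your own resources.

```python
import json

# Hypothetical resources -- replace with your own account, role, image, and bucket.
ROLE_ARN = "arn:aws:iam::123456789012:role/SageMakerSchedulerRole"
IMAGE_URI = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training:latest"

def training_job_request(job_name: str) -> dict:
    """Parameters for SageMaker's CreateTrainingJob API (required fields only)."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {"TrainingImage": IMAGE_URI, "TrainingInputMode": "File"},
        "RoleArn": ROLE_ARN,
        "OutputDataConfig": {"S3OutputPath": "s3://my-ml-bucket/output/"},
        "ResourceConfig": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1, "VolumeSizeInGB": 30},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }

def schedule_request(job_name: str) -> dict:
    """Parameters for EventBridge Scheduler's CreateSchedule API."""
    return {
        "Name": "daily-training",
        "ScheduleExpression": "cron(0 12 * * ? *)",  # noon UTC, every day
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {
            # "Universal target" ARN: Scheduler calls the SageMaker API directly.
            "Arn": "arn:aws:scheduler:::aws-sdk:sagemaker:createTrainingJob",
            "RoleArn": ROLE_ARN,
            "Input": json.dumps(training_job_request(job_name)),
        },
    }

# With AWS credentials configured, create the schedule:
# import boto3
# boto3.client("scheduler").create_schedule(**schedule_request("nightly-train"))
```

One caveat: training job names must be unique, and a schedule with static input reuses the same name on every run. Working around that (e.g., appending a timestamp) is one reason the Lambda approach in section 3 is often preferred.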
2. 🧱 Best for MLOps: SageMaker Pipelines
For full MLOps automation, repeatability, and tracking, SageMaker Pipelines is the recommended service. It allows you to define a multi-step workflow.
How it works:
- Define the Pipeline: You define your workflow (e.g., Data Prep → Training → Model Evaluation → Conditional Deployment) programmatically using the SageMaker Python SDK. Each step is a SageMaker construct (e.g., a `ProcessingStep` for data prep, a `TrainingStep` for model training).
- Upload the Definition: The compiled pipeline definition is uploaded to SageMaker.
- Schedule the Pipeline: You use the exact same scheduling method as above: Amazon EventBridge.
- Set the Schedule (cron/rate).
- Set the Target: Point the rule to the SageMaker service and specify the `StartPipelineExecution` action for your defined pipeline.
Benefit: This provides a repeatable, traceable workflow where you can track artifacts (models, datasets) and easily revert to previous runs.
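The EventBridge wiring can be sketched as two API requests: `PutRule` for the schedule and `PutTargets` to point the rule at the pipeline. EventBridge then calls `StartPipelineExecution` on your behalf. The pipeline name, ARNs, parameter name, and S3 path below are hypothetical placeholders.

```python
# Hypothetical ARNs -- substitute your own pipeline and IAM role.
PIPELINE_ARN = "arn:aws:sagemaker:us-east-1:123456789012:pipeline/my-ml-pipeline"
ROLE_ARN = "arn:aws:iam::123456789012:role/EventBridgeSageMakerRole"

def rule_request() -> dict:
    """Parameters for EventBridge's PutRule API: the schedule itself."""
    return {
        "Name": "weekly-pipeline",
        "ScheduleExpression": "rate(7 days)",
        "State": "ENABLED",
    }

def targets_request() -> dict:
    """Parameters for PutTargets: EventBridge invokes StartPipelineExecution."""
    return {
        "Rule": "weekly-pipeline",
        "Targets": [{
            "Id": "sagemaker-pipeline",
            "Arn": PIPELINE_ARN,
            "RoleArn": ROLE_ARN,
            # Pipeline parameters passed to each execution (names are examples).
            "SageMakerPipelineParameters": {
                "PipelineParameterList": [
                    {"Name": "InputDataS3Uri", "Value": "s3://my-ml-bucket/data/"},
                ],
            },
        }],
    }

# With AWS credentials configured:
# import boto3
# events = boto3.client("events")
# events.put_rule(**rule_request())
# events.put_targets(**targets_request())
```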
3. 🌐 Alternative: Lambda + SageMaker
You can use a simple AWS Lambda function as an intermediary to give you more control.
How it works:
- Schedule Lambda: Use Amazon EventBridge to schedule the AWS Lambda function periodically.
- Lambda’s Role: The Lambda function, written in Python, uses the AWS SDK (Boto3) to programmatically call the SageMaker API (
sagemaker.create_training_job()orsagemaker.create_processing_job()). - Start Job: The Lambda execution triggers the creation and start of the desired SageMaker job.
Benefit: This gives you maximum flexibility to perform pre-run checks, dynamic parameter selection, or complex logging before launching the SageMaker job.
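A minimal Lambda handler along these lines might look like the sketch below. The request is built by a pure function so it is easy to test; the image URI, entrypoint script, and role environment variable are assumed placeholders.

```python
import os
import time

def job_request(timestamp: int, role_arn: str) -> dict:
    """Build CreateProcessingJob parameters (pure function, easy to unit-test)."""
    return {
        # Timestamp suffix keeps each scheduled run's job name unique.
        "ProcessingJobName": f"scheduled-prep-{timestamp}",
        "AppSpecification": {
            # Hypothetical container image and script location.
            "ImageUri": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-prep:latest",
            "ContainerEntrypoint": ["python3", "/opt/ml/processing/input/code/prepare.py"],
        },
        "ProcessingResources": {
            "ClusterConfig": {
                "InstanceCount": 1,
                "InstanceType": "ml.m5.xlarge",
                "VolumeSizeInGB": 30,
            },
        },
        "RoleArn": role_arn,
    }

def handler(event, context):
    """EventBridge-invoked entry point: launch the SageMaker job via Boto3."""
    import boto3  # Lambda's Python runtimes bundle boto3
    params = job_request(int(time.time()), os.environ["SAGEMAKER_ROLE_ARN"])
    boto3.client("sagemaker").create_processing_job(**params)
    return {"started": params["ProcessingJobName"]}
```

Because the handler runs arbitrary Python before calling SageMaker, this is where you would add pre-run checks (e.g., skip the job if new data has not arrived) or compute parameters dynamically.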

