AWS SageMaker Step Functions Lambda Python 3.10 ✓ Production Ready

Data Labeling & Preprocessing Pipeline

A serverless ML pipeline that automates image labeling with SageMaker Ground Truth and preprocessing with AWS Lambda, orchestrated by Step Functions.

🖼️ Images Processed: 30
🏷️ Label Classes: 3
⚡ Execution Time: <2s
✅ Pipeline Status: SUCCESS

1. Pipeline Architecture

✓ 📁 Raw Images (S3 Bucket)
→ ✓ 🏷️ Ground Truth (SageMaker)
→ ✓ ⚡ Step Functions (Orchestration)
→ ✓ 🔧 Lambda (Preprocessing)
→ ✓ 📊 ML-Ready Data (S3 Output)
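
The hand-off from the raw bucket to Ground Truth goes through an input manifest: a JSON Lines file with one `source-ref` entry per image. The following is a minimal sketch of how such a manifest could be generated, not the project's actual script. The function name `build_input_manifest` and the object keys passed to it are hypothetical; only the bucket name comes from this page.

```python
import json

# Bucket name taken from this page; the object keys are supplied by the caller.
RAW_BUCKET = "ml-labeling-raw-ap-south-1"


def build_input_manifest(bucket: str, keys: list[str]) -> str:
    """Return a Ground Truth input manifest (JSON Lines) for the JPEG keys.

    Hypothetical helper: non-JPEG keys are skipped, and the remaining keys
    are sorted so the manifest is reproducible between runs.
    """
    image_keys = sorted(k for k in keys if k.lower().endswith((".jpg", ".jpeg")))
    lines = [json.dumps({"source-ref": f"s3://{bucket}/{key}"}) for key in image_keys]
    # One JSON object per line, each terminated by a newline.
    return "".join(line + "\n" for line in lines)
```

In practice the key list would come from listing the raw bucket, and the resulting text would be uploaded to S3 before the labeling job is created.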

2. AWS Resources

🪣 S3 Buckets (Data Storage)
Raw Data: ml-labeling-raw-ap-south-1
Labeled: ml-labeling-labeled-ap-south-1
Processed: ml-labeling-processed-ap-south-1

λ Lambda Function (Serverless Compute)
Name: preprocess-labeled-data
Runtime: Python 3.10
Memory: 256 MB

⚡ Step Functions (Orchestration)
Name: labeling-preprocessing-pipeline
Type: Standard
Status: ACTIVE

🧠 Ground Truth Job (SageMaker Labeling)
Job Name: image-classification-job-v1
Labels: Cat, Dog, Bird
Status: COMPLETED
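
To make the orchestration concrete, here is a sketch of what a single-task state machine definition for this pipeline might look like in Amazon States Language, built as a Python dictionary. This is an assumption about the shape of the definition, not the deployed one: the state name `PreprocessLabeledData`, the retry settings, and the helper `build_definition` are all illustrative. Only the Lambda function name comes from the resource list above.

```python
import json

# Lambda name from the resource list; everything else below is illustrative.
LAMBDA_NAME = "preprocess-labeled-data"


def build_definition(function_name: str) -> dict:
    """Return an Amazon States Language definition with one Lambda task."""
    return {
        "Comment": "Invoke the preprocessing Lambda on labeled data",
        "StartAt": "PreprocessLabeledData",
        "States": {
            "PreprocessLabeledData": {
                "Type": "Task",
                # Optimized Step Functions integration for invoking Lambda.
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {
                    "FunctionName": function_name,
                    # Pass the whole execution input to the function.
                    "Payload.$": "$",
                },
                # Assumed retry policy for transient Lambda service errors.
                "Retry": [
                    {
                        "ErrorEquals": [
                            "Lambda.ServiceException",
                            "Lambda.TooManyRequestsException",
                        ],
                        "IntervalSeconds": 2,
                        "MaxAttempts": 3,
                        "BackoffRate": 2.0,
                    }
                ],
                "End": True,
            }
        },
    }


definition_json = json.dumps(build_definition(LAMBDA_NAME), indent=2)
```

The resulting `definition_json` string is the kind of document that would be supplied when creating a Standard state machine; the IAM role and the creation call itself are left out of this sketch.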

3. Data Transformation

📥 Input Manifest
{
  "source-ref": "s3://...dog1.jpg"
}
🏷️ Ground Truth Output
{
  "source-ref": "s3://...dog1.jpg",
  "image-classification-job-v1": 1,
  "...-metadata": {
    "class-name": "Dog"
  }
}
✅ Processed Output
{
  "image": "s3://...dog1.jpg",
  "label": "Dog"
}
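
The mapping from Ground Truth output to processed output can be sketched in a few lines of Python. This is a hedged illustration of the transformation shown above, not the Lambda's actual source. The function names are hypothetical, and because the metadata key is elided in the example, the code looks up the single key ending in `-metadata` rather than assuming its full name.

```python
import json


def to_training_record(line: str) -> dict:
    """Convert one Ground Truth output manifest line to {"image", "label"}."""
    record = json.loads(line)
    # The metadata key is elided in the example above, so find it by suffix.
    meta_keys = [key for key in record if key.endswith("-metadata")]
    if len(meta_keys) != 1:
        raise ValueError(f"expected exactly one metadata key, found {meta_keys}")
    class_name = record[meta_keys[0]]["class-name"]
    return {"image": record["source-ref"], "label": class_name}


def convert_manifest(text: str) -> str:
    """Convert a whole output manifest (JSON Lines), skipping blank lines."""
    rows = [to_training_record(line) for line in text.splitlines() if line.strip()]
    return "".join(json.dumps(row) + "\n" for row in rows)
```

Reading the human-readable `class-name` from the metadata, instead of the numeric label value, keeps the processed manifest independent of the order in which the label classes were defined.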

4. Execution Timeline

1. Images Uploaded to S3 (COMPLETED)
   30 JPG images (cats, dogs, birds) were uploaded to the raw bucket.
2. Ground Truth Labeling (30/30 LABELED)
   A private workforce labeled every image as Cat, Dog, or Bird.
3. Step Functions Triggered (SUCCEEDED)
   The state machine invoked the Lambda preprocessing function.
4. Lambda Preprocessing (<1 SECOND)
   The function extracted the labels and wrote a cleaned manifest for ML training.
✓ Pipeline Complete (SUCCESS)
   processed.manifest is ready for ML model training.
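
Before training on processed.manifest, a quick sanity check can confirm that the run matches the numbers reported here (30 records, three classes). The sketch below is one possible check and is not part of the described pipeline; `summarize_manifest` and its parameters are hypothetical names.

```python
import json
from collections import Counter
from typing import Optional

# Label classes listed for the Ground Truth job on this page.
ALLOWED_LABELS = {"Cat", "Dog", "Bird"}


def summarize_manifest(text: str, expected_count: Optional[int] = None) -> Counter:
    """Validate a processed manifest (JSON Lines) and count records per label."""
    counts: Counter = Counter()
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if not record["image"].startswith("s3://"):
            raise ValueError(f"line {number}: image is not an S3 URI")
        if record["label"] not in ALLOWED_LABELS:
            raise ValueError(f"line {number}: unexpected label {record['label']!r}")
        counts[record["label"]] += 1
    total = sum(counts.values())
    if expected_count is not None and total != expected_count:
        raise ValueError(f"expected {expected_count} records, found {total}")
    return counts
```

Calling `summarize_manifest(text, expected_count=30)` on the downloaded manifest would raise if any record is missing or carries a label outside the three classes.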