Secure ML Runner
Fast, secure ML experiment runner using Podman isolation with optimized package management.
🚀 Why Secure ML Runner?
⚡ Lightning Fast
- 6x faster package resolution than pip (using mamba)
- Binary packages - no compilation needed
- Smart caching - faster subsequent runs
🐍 Data Scientist Friendly
- Native environment - Isolated ML workspace
- Popular packages - PyTorch, scikit-learn, XGBoost, Jupyter
- Easy sharing - environment.yml for team collaboration
🛡️ Secure Isolation
- Rootless Podman - No daemon, no root privileges
- Network blocking - Prevents unsafe downloads
- Package filtering - Security policies enforced
- Non-root execution - Container runs as limited user
🧪 Automated Testing
The podman directory is now automatically managed by the test suite:
Workspace Management
- Automated Sync: make sync-examples automatically copies all example projects
- Clean Structure: Only contains synced example projects in workspace/
- No Manual Copying: Everything is handled by automated tests
Testing Integration
- Example Validation: make test-examples validates project structure
- Container Testing: make test-podman tests full workflow
- Consistency: Tests ensure workspace stays in sync with examples/
Workspace Contents
The workspace/ directory contains:
- standard_ml_project/ - Standard ML example
- sklearn_project/ - Scikit-learn example
- pytorch_project/ - PyTorch example
- tensorflow_project/ - TensorFlow example
- xgboost_project/ - XGBoost example
- statsmodels_project/ - Statsmodels example
Note: Do not manually modify files in workspace/. Use make sync-examples to update from the canonical examples in tests/examples/.
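For readers who want to see what such a sync amounts to, here is a minimal Python sketch. It is not the implementation behind make sync-examples, which this README does not show; the function name sync_examples and the ignore patterns are assumptions made for illustration.

```python
import shutil
from pathlib import Path

# Assumed junk patterns; the real sync target may filter differently.
IGNORED = shutil.ignore_patterns(".DS_Store", "__pycache__", "*.pyc")


def sync_examples(source: Path, workspace: Path) -> list:
    """Mirror every project directory under `source` into `workspace`."""
    workspace.mkdir(parents=True, exist_ok=True)
    synced = []
    for project in sorted(p for p in source.iterdir() if p.is_dir()):
        target = workspace / project.name
        # Remove the stale copy first so files deleted upstream do not linger.
        if target.exists():
            shutil.rmtree(target)
        shutil.copytree(project, target, ignore=IGNORED)
        synced.append(project.name)
    return synced
```

Calling sync_examples(Path("tests/examples"), Path("workspace")) would rebuild each project copy from scratch, which is why manual edits inside workspace/ would be lost on the next sync.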
🎯 Quick Start
1. Sync Examples (Required)
make sync-examples
2. Build the Container
make secure-build
3. Run an Experiment
make secure-run
4. Start Jupyter (Optional)
make secure-dev
5. Interactive Shell
make secure-shell
| Command | Description |
|---|---|
| make secure-build | Build secure ML runner |
| make secure-run | Run ML experiment securely |
| make secure-test | Test GPU access |
| make secure-dev | Start Jupyter notebook |
| make secure-shell | Open interactive shell |
📁 Configuration
Pre-installed Packages
# ML Frameworks
pytorch>=1.9.0
torchvision>=0.10.0
numpy>=1.21.0
pandas>=1.3.0
scikit-learn>=1.0.0
xgboost>=1.5.0
# Data Science Tools
matplotlib>=3.5.0
seaborn>=0.11.0
jupyter>=1.0.0
Security Policy
{
"allow_network": false,
"blocked_packages": ["requests", "urllib3", "httpx"],
"max_execution_time": 3600,
"gpu_devices": ["/dev/dri"],
"ml_env": "ml_env",
"package_manager": "mamba"
}
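As an illustration of how a policy like this could be applied, the sketch below checks the lines of a requirements.txt against blocked_packages. It is a hedged example, not the logic in secure_runner.py; the helper names requirement_name and blocked_requirements are hypothetical, and the policy is inlined rather than read from security_policy.json.

```python
import json
import re

# Subset of the policy shown above, inlined for the example.
POLICY_JSON = """
{
  "allow_network": false,
  "blocked_packages": ["requests", "urllib3", "httpx"],
  "max_execution_time": 3600
}
"""


def normalize(name: str) -> str:
    """Lower-case a package name and collapse -, _ and . into a single dash."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(line: str) -> str:
    """Extract the bare package name from one requirements.txt line."""
    line = line.split("#", 1)[0].strip()  # drop trailing comments
    # Cut at the first version specifier, extras bracket, marker or space.
    name = re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0]
    return normalize(name)


def blocked_requirements(requirements_text: str, policy: dict) -> list:
    """Return the blocked package names found, in file order."""
    blocked = {normalize(p) for p in policy["blocked_packages"]}
    found = []
    for line in requirements_text.splitlines():
        name = requirement_name(line)
        if name and name in blocked:
            found.append(name)
    return found
```

A wrapper could refuse to start the container whenever blocked_requirements(text, json.loads(POLICY_JSON)) returns a non-empty list.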
📁 Directory Structure
podman/
├── secure-ml-runner.podfile # Container definition
├── secure_runner.py # Security wrapper
├── environment.yml # Environment spec
├── security_policy.json # Security rules
├── workspace/ # Experiment files
│   ├── train.py # Training script
│   └── requirements.txt # Dependencies
└── results/ # Experiment outputs
    ├── execution_results.json
    ├── results.json
    └── pytorch_model.pth
🚀 Usage Examples
Run Custom Experiment
# Copy your files
cp ~/my_experiment/train.py workspace/
cp ~/my_experiment/requirements.txt workspace/
# Run securely
make secure-run
Use Jupyter
# Start notebook server
make secure-dev
# Access at http://localhost:8888
Interactive Development
# Get shell with environment activated
make secure-shell
# Inside container:
conda activate ml_env
python train.py --epochs 10
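A train.py compatible with the command above only needs to accept --epochs and write its metrics somewhere the runner can collect them. The skeleton below is a toy sketch under assumptions: the --output-dir flag, the metric names, and the stand-in gradient-descent loop are all hypothetical and use no ML framework.

```python
import argparse
import json
import time
from pathlib import Path


def train(epochs: int) -> dict:
    """Stand-in training loop: fits y = 2x with plain gradient descent."""
    weight, lr = 0.0, 0.05
    data = [(x, 2.0 * x) for x in range(1, 5)]
    loss = 0.0
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
        loss = sum((weight * x - y) ** 2 for x, y in data) / len(data)
    return {"epochs": epochs, "final_loss": loss, "weight": weight}


def main(argv=None) -> dict:
    parser = argparse.ArgumentParser(description="Toy training script")
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--output-dir", default="results")  # hypothetical flag
    args = parser.parse_args(argv)

    start = time.time()
    metrics = train(args.epochs)
    metrics["train_seconds"] = round(time.time() - start, 3)

    # Write metrics as results.json inside the output directory.
    out_dir = Path(args.output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "results.json").write_text(json.dumps(metrics, indent=2))
    return metrics
```

To run it as python train.py --epochs 10, call main() from an if __name__ == "__main__": guard at the bottom of the file.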
🛡️ Security Features
Container Security
- Rootless Podman - No daemon running as root
- Non-root user - Container runs as mlrunner
- No privileges - --cap-drop ALL
- Read-only filesystem - Immutable base image
Network Isolation
- No internet access - Prevents unsafe downloads
- Package filtering - Blocks dangerous packages
- Controlled execution - Time and memory limits
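The container and network settings listed above map onto standard podman run options. The sketch below assembles such a command line in Python. The flags themselves (--cap-drop, --network, --read-only, --user, --memory, --cpus, --volume) are real podman run options, but the image name, mount points, and the python entry command are assumptions; the actual invocation lives in the Makefile and secure_runner.py, which are not shown here. Time limits are not covered by this sketch.

```python
def build_podman_command(image, workspace, results,
                         script="train.py", memory="8g", cpus="2"):
    """Build a locked-down `podman run` argument list (illustrative only)."""
    return [
        "podman", "run", "--rm",
        "--cap-drop", "ALL",        # drop every Linux capability
        "--network", "none",        # no network access inside the container
        "--read-only",              # root filesystem is immutable
        "--user", "mlrunner",       # non-root user named in this README
        "--memory", memory,         # memory ceiling
        "--cpus", cpus,             # CPU ceiling
        "--volume", f"{workspace}:/workspace:ro",  # assumed mount point
        "--volume", f"{results}:/results:rw",      # assumed mount point
        image,
        "python", f"/workspace/{script}",          # assumed entry command
    ]
```

Returning a list rather than a shell string keeps the arguments safe to pass to subprocess.run without shell quoting.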
Package Safety
# Blocked packages (security)
requests, urllib3, httpx, aiohttp, socket, telnetlib, ftplib
# Allowed packages (pre-installed)
torch, numpy, pandas, scikit-learn, xgboost, matplotlib
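Several names on the blocked list (socket, telnetlib, ftplib) are standard-library modules, so a requirements check alone would not catch them. One possible approach, shown here as a sketch and not as the runner's actual mechanism, is to scan a script's imports statically with Python's ast module. The function name blocked_imports is hypothetical.

```python
import ast

# Blocked names taken from the list above.
BLOCKED_MODULES = {"requests", "urllib3", "httpx", "aiohttp",
                   "socket", "telnetlib", "ftplib"}


def blocked_imports(source: str, blocked=BLOCKED_MODULES) -> set:
    """Return the blocked top-level modules that `source` imports."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # absolute "from x import y" only
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            if top in blocked:
                hits.add(top)
    return hits
```

A static scan like this misses dynamic imports (for example via importlib), so it complements network isolation and does not replace it.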
📊 Performance
Speed Comparison
| Operation | Pip | Mamba | Improvement |
|---|---|---|---|
| Environment Setup | 45s | 10s | 4.5x faster |
| Package Resolution | 30s | 5s | 6x faster |
| Experiment Execution | 2.0s | 3.7s | Similar |
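The Improvement column is the pip time divided by the mamba time. A short sketch that recomputes it from the figures in the table (the timings are copied from above; nothing is measured here):

```python
# (pip seconds, mamba seconds) per operation, copied from the table.
TIMINGS = {
    "Environment Setup": (45.0, 10.0),
    "Package Resolution": (30.0, 5.0),
    "Experiment Execution": (2.0, 3.7),
}


def speedup(pip_seconds: float, mamba_seconds: float) -> float:
    """Ratio above 1 means mamba is faster; below 1 means it is slower."""
    return pip_seconds / mamba_seconds


for operation, (pip_s, mamba_s) in TIMINGS.items():
    print(f"{operation}: {speedup(pip_s, mamba_s):.2f}x")
```

The first two rows give 4.50x and 6.00x. The last row gives about 0.54x, so the gains reported here come from setup and resolution, not from experiment execution.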
Resource Usage
- Memory: ~8GB limit
- CPU: 2 cores limit
- Storage: ~2GB image size
- Network: Isolated (no internet)
Cross-Platform
Development (macOS)
# Works on macOS with Podman
make secure-build
make secure-run
Production (Rocky Linux)
# Same commands, GPU enabled
make secure-build
make secure-run # Auto-detects GPU
Storage (NAS/Debian)
# Lightweight version, no GPU
make secure-build
make secure-run
🎮 GPU Support
Detection
make secure-test
# Output: ✅ GPU access available (if present)
Usage
- Automatic detection - Uses GPU if available
- Fallback to CPU - Works without GPU
- CUDA support - Pre-installed in container
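Detection with CPU fallback can be as simple as checking whether the configured device paths exist on the host before passing them to the container. The sketch below assumes the paths come from gpu_devices in the security policy (for example /dev/dri) and uses podman's --device option; the function names are hypothetical and the runner's real detection logic is not shown in this README.

```python
import os


def gpu_accessible(device_paths) -> bool:
    """True if at least one configured GPU device path exists on the host."""
    return any(os.path.exists(path) for path in device_paths)


def podman_device_flags(device_paths) -> list:
    """Build --device flags only for paths that exist (CPU fallback otherwise)."""
    flags = []
    for path in device_paths:
        if os.path.exists(path):
            flags += ["--device", path]
    return flags
```

On a machine without the device, podman_device_flags(["/dev/dri"]) returns an empty list and the container simply runs on CPU.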
📝 Experiment Results
Output Files
{
"status": "success",
"execution_time": 3.7,
"container_type": "secure",
"ml_env": "ml_env",
"package_manager": "mamba",
"gpu_accessible": true,
"security_mode": "enabled"
}
Artifacts
- results.json - Training metrics
- pytorch_model.pth - Trained model
- execution_results.json - Execution metadata
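To consume the execution metadata from a script, the JSON can be loaded and condensed into a one-line summary. This is a small sketch that assumes execution_results.json has the fields shown under Output Files; summarize_execution is a hypothetical helper.

```python
import json


def summarize_execution(raw: str) -> str:
    """Condense execution metadata (JSON text) into a single status line."""
    data = json.loads(raw)
    device = "GPU" if data.get("gpu_accessible") else "CPU"
    return (f"{data['status']} in {data['execution_time']}s on {device} "
            f"(env={data.get('ml_env')}, security={data.get('security_mode')})")
```

In practice the input would be the contents of results/execution_results.json, read with Path(...).read_text().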
🛠️ Troubleshooting
Common Issues
# Check Podman status
podman info
# Rebuild container
make secure-build
# Clean up
podman system prune -f
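When checking Podman status, one thing worth confirming is that it really runs rootless. The sketch below parses the JSON form of podman info. It assumes the host.security.rootless key, which is where recent Podman versions report this; older versions may lay the output out differently, so treat the key path as an assumption.

```python
import json
import subprocess


def is_rootless(info: dict) -> bool:
    """Read the rootless flag from parsed `podman info` output."""
    return bool(info.get("host", {}).get("security", {}).get("rootless", False))


def podman_info() -> dict:
    """Run `podman info --format json` and parse the result."""
    completed = subprocess.run(
        ["podman", "info", "--format", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(completed.stdout)
```

is_rootless(podman_info()) returning False would suggest the runner is being started as root or against a rootful service.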
Debug Mode
# Interactive shell for debugging
make secure-shell
# Check environment
conda info --envs
conda list -n ml_env
🎯 Best Practices
For Data Scientists
- Use environment.yml - Share environments easily
- Leverage pre-installed packages - Skip installation time
- Use Jupyter - Interactive development
- Test locally - Use make secure-shell for debugging
For Production
- Security first - Keep network isolation
- Resource limits - Monitor CPU/memory usage
- GPU optimization - Enable on Rocky Linux servers
- Regular updates - Rebuild with latest packages
🎉 Conclusion
Secure ML Runner provides the perfect balance:
- ⚡ Speed - 6x faster package management
- 🐍 DS Experience - Native ML environment
- 🛡️ Security - Rootless isolation
- 🔄 Portability - Works across platforms
Perfect for data scientists who want speed without sacrificing security! 🚀