AutoML Hyperparameter Optimization
AutoML and hyperparameter optimization rules for Python ML projects using Ray Tune, Optuna, PyCaret, and time-series AutoML libraries.
# AutoML and Hyperparameter Optimization Rules
## Scope
- Use AutoML to accelerate model exploration, not to bypass problem framing, validation design, or explainability.
- Start with a simple baseline model and fixed metric before launching a search.
- Keep training, evaluation, feature generation, and search configuration separate.
- Record datasets, splits, metric definitions, random seeds, library versions, and search spaces for every run.
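
The record-keeping rule above could be sketched as a small manifest builder. This is a minimal illustration using only the standard library. The function name `build_run_manifest`, the field names, and the example values are hypothetical and do not belong to any tracker's API.

```python
import hashlib
import json
import platform
import sys
from importlib import metadata


def build_run_manifest(dataset_bytes, split_spec, metric, seed, search_space, packages):
    """Collect the facts needed to reproduce one search run (hypothetical schema)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "dataset_sha256": hashlib.sha256(dataset_bytes).hexdigest(),
        "split": split_spec,
        "metric": metric,
        "seed": seed,
        "search_space": search_space,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "library_versions": versions,
    }


# Example values are placeholders, not recommendations.
manifest = build_run_manifest(
    dataset_bytes=b"id,feature,target\n1,0.5,0\n2,0.7,1\n",
    split_spec={"scheme": "holdout", "test_fraction": 0.2},
    metric="roc_auc",
    seed=42,
    search_space={"learning_rate": {"low": 1e-4, "high": 1e-1, "log": True}},
    packages=["optuna", "ray"],
)
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
```

Writing `manifest_json` next to the run's artifacts, or logging it to the project's tracker, keeps the search configuration reviewable alongside the results.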
## Experiment Design
- Define the target metric before selecting tooling.
- Use nested cross-validation or a final untouched test split to support any model selection claim.
- Use time-aware splits for time-series problems; never shuffle across time boundaries.
- Prevent leakage by fitting preprocessing only on training folds.
- Include simple baselines such as linear models, random forests, or naive time-series forecasts.
- Use early stopping and resource limits for expensive searches.
- Prefer structured search spaces with domain-informed ranges over arbitrary broad grids.
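
As one way to picture the time-aware split and leakage rules above, the sketch below builds expanding-window folds and fits scaling statistics on the training fold only. It is a standard-library illustration under assumed names (`expanding_window_splits`, `standardize_with_train_stats`). A real project would more likely use its library's own splitter and preprocessing pipeline.

```python
from statistics import mean, pstdev


def expanding_window_splits(n_samples, n_folds, horizon):
    """Yield (train_idx, valid_idx) pairs where validation always follows training in time."""
    first_train_end = n_samples - n_folds * horizon
    if first_train_end <= 0:
        raise ValueError("not enough samples for the requested folds and horizon")
    for fold in range(n_folds):
        train_end = first_train_end + fold * horizon
        yield list(range(train_end)), list(range(train_end, train_end + horizon))


def standardize_with_train_stats(train_values, valid_values):
    """Fit scaling on the training fold only, then apply it to both folds."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # fall back to 1.0 on a constant fold

    def scale(values):
        return [(x - mu) / sigma for x in values]

    return scale(train_values), scale(valid_values)


series = [float(t) for t in range(12)]  # toy series, already sorted by time
splits = list(expanding_window_splits(len(series), n_folds=3, horizon=2))
scaled_folds = [
    standardize_with_train_stats([series[i] for i in tr], [series[i] for i in va])
    for tr, va in splits
]
```

Every validation index is strictly later than every training index in the same fold, and the validation values never influence `mu` or `sigma`.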
## Tooling
- Use Ray Tune or Optuna for custom training loops, distributed trials, pruning, and scheduler control.
- Use PyCaret for quick low-code comparisons when the dataset and metric are straightforward.
- Use AutoTS, Merlion, PyAF, or project-approved time-series tooling when forecast-specific validation, seasonality, and horizon handling matter.
- Store run metadata in MLflow, Weights & Biases, TensorBoard, or a project-approved tracker.
- Use `uv` or the existing project package manager for reproducible environments.
## Search Spaces
- Keep search spaces explicit and reviewed.
- Use log-scale sampling for learning rates, regularization, tree counts, and other scale-sensitive values.
- Constrain model complexity to avoid unrealistic training time or memory use.
- Include preprocessing choices only when they can be applied without leakage.
- Do not tune on the test set.
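
The log-scale rule above can be illustrated without any tuning library. The sketch below draws values whose logarithm is uniform over the range. The helper names and every range shown are hypothetical.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_config(rng):
    """Hypothetical search space: the ranges are illustrative, not recommendations."""
    return {
        "learning_rate": sample_log_uniform(rng, 1e-4, 1e-1),
        "l2_penalty": sample_log_uniform(rng, 1e-6, 1e-2),
        "n_estimators": round(sample_log_uniform(rng, 50, 2000)),
        "max_depth": rng.randint(2, 10),  # linear scale suits a narrow integer range
    }


rng = random.Random(0)  # fixed seed so the sampled configurations are reproducible
configs = [sample_config(rng) for _ in range(1000)]
share_below_1e_3 = sum(c["learning_rate"] < 1e-3 for c in configs) / len(configs)
```

With log-scale sampling, roughly a third of the learning rates fall in each decade of the range. Uniform sampling over the same interval would place under 1% of draws below `1e-3`.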
## Reporting
- Report the selected model, metric, confidence interval or variance, validation scheme, and final test result.
- Include the best parameters and the search budget.
- Compare the chosen model against the baseline and at least one non-AutoML alternative.
- Document operational constraints such as inference latency, memory use, retraining cost, and explainability.
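
A report covering the items above could be assembled roughly as follows. The per-fold scores, budget, and field names are made-up placeholders. The interval uses a normal approximation, which is rough for a handful of folds, and a t-based interval would be wider.

```python
from statistics import mean, stdev


def summarize_scores(scores, z=1.96):
    """Mean, sample std, and a normal-approximation 95% interval for per-fold scores."""
    m, s = mean(scores), stdev(scores)
    half_width = z * s / len(scores) ** 0.5
    return {"mean": m, "std": s, "ci_low": m - half_width, "ci_high": m + half_width}


# Hypothetical per-fold validation scores; real values come from your own runs.
baseline_scores = [0.71, 0.69, 0.72, 0.70, 0.68]
tuned_scores = [0.76, 0.74, 0.78, 0.75, 0.73]

report = {
    "metric": "roc_auc",
    "validation": "5-fold cross-validation",
    "search_budget": {"n_trials": 50, "timeout_s": 3600},
    "baseline": summarize_scores(baseline_scores),
    "selected": summarize_scores(tuned_scores),
    # Folds are paired, so the per-fold difference is the fairer comparison.
    "paired_fold_gain": mean(t - b for t, b in zip(tuned_scores, baseline_scores)),
}
```

The final test result, best parameters, and operational measurements such as latency and memory would be added to the same structure once they are available.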
## Common Mistakes
- Do not treat leaderboard rank as proof of production readiness.
- Do not mix train/test data during feature engineering.
- Do not run massive searches before validating labels and data quality.
- Do not ignore class imbalance, calibration, or business cost asymmetry.
- Do not deploy an AutoML model without reproducible training code and pinned dependencies.