Databricks MLOps Playbook: From MLflow to Production

This playbook distills a pragmatic MLOps path on Databricks, from data prep to robust deployment with guardrails.

Why another MLOps guide?

- Focus on operational reality: lineage, reproducibility, cost/latency, and KPI reliability.
- Reusable patterns you can drop into teams without heavy ceremony.

Reference architecture

```mermaid
flowchart TD
    A["Ingest: Batch/Streaming"] --> B["Bronze Delta"]
    B --> C["Curate: Features"]
    C --> D["`ML Training MLflow tracking`"]
    D --> E["`Registry Stages: Staging/Prod`"]
    E --> F["Serving/Batch Scoring"]
    F --> G["`Monitoring Drift, KPI, Cost`"]
```

Building blocks

- Delta Lake: schema evolution, Z-order, OPTIMIZE + VACUUM policies (maintenance sketch below).
- MLflow: experiment tracking, model registry, stage transitions with approvals.
- CI/CD: notebooks/jobs packaged via repo; tests for data contracts and model code.
- Observability: input DQ, feature coverage, drift monitors, KPI windows, cost budgets.

Sample: register and deploy

```python
import mlflow
from mlflow.tracking import MlflowClient

# Assumes an active MLflow run (e.g., inside mlflow.start_run()) and a trained `model`
run_id = mlflow.active_run().info.run_id
mlflow.sklearn.log_model(model, "model")

# Register the logged model and promote the new version to Staging
client = MlflowClient()
model_uri = f"runs:/{run_id}/model"
client.create_registered_model("churn_model")
version = client.create_model_version("churn_model", model_uri, run_id)
client.transition_model_version_stage("churn_model", version.version, stage="Staging")
```

Guardrails

- Promotion requires DQ + performance gates (gate sketch below); auto-revert on KPI regression.
- Cost envelopes by job cluster policy; latency SLOs per endpoint.

Takeaways

Ship small, measurable increments; automate checks; keep lineage and docs close to the code.
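To make the OPTIMIZE + VACUUM policy from the building blocks concrete, here is a minimal maintenance-job sketch. The table name, Z-order column, and 7-day retention window are illustrative assumptions, not fixed recommendations.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative scheduled maintenance job: compact small files and co-locate rows by a
# frequently filtered column, then remove stale files past the retention window.
# Table name, Z-order column, and retention period are assumptions for this sketch.
spark.sql("OPTIMIZE features.churn_features ZORDER BY (customer_id)")
spark.sql("VACUUM features.churn_features RETAIN 168 HOURS")  # 7-day retention
```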
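The promotion guardrail can likewise be sketched as a gate check before a stage transition. The metric keys, thresholds, and model name below are assumptions; the point is that promotion only happens when the DQ and performance gates pass.

```python
from mlflow.tracking import MlflowClient

# Illustrative promotion gate: move a version to Production only when data-quality and
# performance checks pass. Metric keys and thresholds are assumptions for this sketch.
def promote_if_gates_pass(name: str, version: str, min_auc: float = 0.80, min_dq: float = 0.99) -> bool:
    client = MlflowClient()
    mv = client.get_model_version(name, version)
    run = client.get_run(mv.run_id)
    auc = run.data.metrics.get("val_auc", 0.0)
    dq_pass_rate = run.data.metrics.get("dq_pass_rate", 0.0)
    if auc >= min_auc and dq_pass_rate >= min_dq:
        client.transition_model_version_stage(name, version, stage="Production")
        return True
    return False
```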

2025-09-24 · 1 min · rokorolev

Pre‑sales Patterns for Databricks Solutions Engineers

A compact playbook of motions and artifacts that convert technical credibility into business outcomes.

Core motions

- Discovery → value mapping → success criteria → lightweight PoV plan.
- Hands-on demo/POC: data onboarding, Delta reliability, MLflow, governance.
- Objection handling: performance, cost, migration, security/governance.

Workshop kit (reuse)

- Reference architectures (ingest, Lakehouse, MLOps, RAG) and checklists.
- Demo datasets and notebooks; pre-wired CI/CD and QA gates.
- Post‑POC artifacts: runbooks, sizing, adoption roadmap, risk register.

Example PoV success criteria

- Time‑to‑first Delta ingestion and DQ %.
- MLflow registered model with promotion gates.
- RAG: Recall@k and latency thresholds; data governance and PII controls.

SE hygiene

- Record assumptions, map stakeholders, and publish decisions.
- Measure outcomes; close gaps with targeted enablement.

2025-09-24 · 1 min · rokorolev

RAG on Databricks: Embeddings, Vector Search, and Cost/Latency Tuning

A practical guide to building Retrieval-Augmented Generation on Databricks.

Pipeline overview

```mermaid
flowchart LR
    A[Docs/Transcripts] --> B["Chunk & Clean"]
    B --> C["Embed GTE Large EN v1.5"]
    C --> D["Vector Index Vector Search"]
    D --> E["Retrieve"]
    E --> F["Compose Prompt"]
    F --> G["LLM Inference Hybrid"]
    G --> H["Post-process Policy/PII"]
```

Key choices

- Chunking: semantic + overlap; store offsets for citations (sketch below).
- Embeddings: GTE Large EN v1.5; evaluate coverage vs latency.
- Index: Delta-backed vector search; freshness vs cost trade-offs.
- Inference: hybrid (open + hosted) to balance latency and accuracy.

Example: embed and upsert

```python
from databricks.vector_search.client import VectorSearchClient

# Connect to an existing vector search index and upsert a document chunk
vsc = VectorSearchClient()
index = vsc.get_index("main", "transcripts_idx")
index.upsert([
    {"id": "doc:123#p5", "text": "...", "metadata": {"source": "call"}}
])
```

Evaluation & guardrails

- Offline: Recall@k, response faithfulness, toxicity/policy checks (sketch below).
- Online: user feedback, fallback/abstain behavior.

Cost/latency tips

- Batch embeddings; cache frequent queries; keep vector dimensions reasonable.
- Monitor token usage; pre-validate prompts; route by difficulty.
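To illustrate the chunking choice, here is a minimal sketch of overlapping chunks that keep character offsets for citations. A real pipeline would likely split on semantic boundaries rather than fixed windows; the sizes below are assumptions.

```python
def chunk_with_offsets(text: str, size: int = 800, overlap: int = 120):
    """Split text into overlapping chunks, keeping (start, end) offsets for citations."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + size, len(text))
        chunks.append({"text": text[start:end], "start": start, "end": end})
        if end == len(text):
            break
        start = end - overlap  # step back to create the overlap
    return chunks
```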
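The retrieve step is not shown above; here is a hedged sketch using the vector search client's similarity_search. The column names and result shape are assumptions to verify against your index definition, and you would pass a precomputed query_vector instead of query_text if the index has no managed embedding model attached.

```python
from databricks.vector_search.client import VectorSearchClient

# Illustrative retrieval step; column names and result shape are assumptions for this sketch.
vsc = VectorSearchClient()
index = vsc.get_index("main", "transcripts_idx")
results = index.similarity_search(
    query_text="What did the customer say about renewal?",
    columns=["id", "text"],
    num_results=5,
)
for row in results["result"]["data_array"]:
    print(row)  # one retrieved chunk per row
```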
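For the offline Recall@k check, a small self-contained sketch; the document IDs in the usage example are made up.

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int = 5) -> float:
    """Fraction of relevant documents that appear in the top-k retrieved results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

# Example: 1 of 2 relevant chunks retrieved in the top 5 -> recall@5 = 0.5
print(recall_at_k(["doc:123#p5", "doc:9#p1", "doc:7#p2"], {"doc:123#p5", "doc:123#p6"}, k=5))
```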

2025-09-24 · 1 min · rokorolev

SafetyCultureToDatabricks

This project is a data integration tool that collects information from the SafetyCulture API v1.0 and inserts it into Databricks Delta tables. It is designed to automate the extraction, transformation, and loading (ETL) of SafetyCulture data for analytics and reporting in Databricks. Project Link: https://rokorolev.gitlab.io/safety-culture-to-databricks/
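As an illustration of the ETL flow described above, here is a minimal sketch that pulls audits from the API and lands them in a Delta table. The endpoint, token handling, and table name are illustrative assumptions, not the project's actual code.

```python
import requests
from pyspark.sql import SparkSession

# Hypothetical example of the extract-and-load flow. Endpoint, parameters, and table name
# are assumptions; in practice the token would come from a Databricks secret scope.
API_TOKEN = "..."
resp = requests.get(
    "https://api.safetyculture.io/audits/search",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    params={"modified_after": "2025-01-01T00:00:00Z"},
)
resp.raise_for_status()
audits = resp.json().get("audits", [])

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(audits)  # one row per audit summary
df.write.format("delta").mode("append").saveAsTable("bronze.safetyculture_audits")
```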

2025-09-04 · 1 min · rokorolev

Sparqlin

sparqlin is a Spark SQL framework designed to simplify job creation and management in Databricks environments. It integrates with Spark SQL and PySpark for a streamlined development experience. The framework was created specifically to empower data analysts who may not have deep development skills: it offers a structured path to standard software development life cycles, letting analysts focus on working with data without having to master complex programming paradigms. By leveraging familiar tools like SQL scripts and YAML files, it simplifies tasks such as data configuration, transformation, and testing. ...

2025-09-04 · 1 min · rokorolev

TableauUsageToDatabricks

TableauUsageToDatabricks is a .NET application designed to extract Tableau usage data and upload it to Databricks in a structured format. It parses Tableau XML and JSON files, transforms them into models, and writes the results as Parquet files for analytics and reporting in Databricks. Project Link: https://rokorolev.gitlab.io/tableau-usage-to-databricks/

2025-09-04 · 1 min · rokorolev