Curriculum Vitae
Strategic data/ML leader and hands-on engineer with end-to-end impact across telecommunications, logistics & supply chain, energy, marketing, and finance. I build scalable data platforms (Spark/Delta), production ML (MLflow/feature curation), and reliable KPI ecosystems—while leading analysts/engineers through clear contracts, governance, and automation.
Career Snapshot
- 2025–Present: Analytics Lead (Optus) — CNPS improvement, Responsible Sales analytics, ML/LLM in production; data governance, PII redaction (Spark + Presidio), and BI scale (~700 views, ~300 workbooks).
- 2024–2025: Senior Data Engineer (Optus) — Databricks patterns (Spark/Delta, dbt, MLflow), retention/masking, quality gates, and CI/CD foundations.
- 2018–2024: WTG — Data Analyst → Senior Data Scientist (de‑facto lead); built ETL for Market Intelligence and Analytics; streaming (Kafka + Delta), deep‑learning NLP (PyTorch/Keras, Transformers), .NET backend/orchestrator (ASP.NET + EF + SQL Server), S3/Ceph.
- 2019–2020: Data Engineer (Dentsu) — MTA (Markov/MCMC), funnel & churn analytics; Databricks (Spark) + BigQuery; segmentation; presented technical architecture to Energy client.
- 2015–2018: Head of Analytics (Oil & Gas) — probabilistic modeling, portfolio risk, analytics organization leadership.
- 2001–2015: Finance & Risk — FX dealing, quant modeling, SDE numerics, Monte Carlo; academic (full / part‑time) in probability/statistics.
Core Competencies
Domain | Skills |
---|---|
Data Engineering | Spark (Scala/PySpark), Databricks, Delta Lake; streaming with Kafka; Parquet/partitioning, incremental/batch orchestration |
Analytics & BI | KPI governance, semantic modeling, usage telemetry, MTA/funnel/churn analytics, dashboard performance |
Architecture | Event/API ingestion, schema evolution, modular parsers, secure data zones, lineage/metadata awareness |
Governance & Privacy | Retention, classification, masking, PII redaction (Spark + Presidio), responsible analytics & compliance |
Leadership | People development, delivery coaching, cross-functional alignment, enablement/workshops |
ML / Advanced Analytics | Deep learning (PyTorch, TensorFlow/Keras); Transformers & LLMs (BERT family, RoBERTa, sentence-transformers, Llama 3.x, Claude, GPT‑OSS 120B, Gemini); RAG pipelines (vector search/embeddings), PEFT (LoRA/QLoRA), anomaly detection, predictive models |
Tooling & Enablement | dbt, MLflow, GitLab / GitHub CI/CD, Databricks, Spark ML, .NET (ASP.NET + EF), SQL Server, S3/Ceph, cost/perf optimization |
Technical Stack Snapshot
Languages: Scala, Python, SQL, C#, Go (tooling), R
Platforms: Databricks, Delta Lake, Spark; BigQuery, SQL Server; Azure/AWS/GCP (data services)
Frameworks / Libraries: PyTorch, TensorFlow/Keras, Hugging Face Transformers, Spark NLP, MLflow, dbt, Presidio, Catalyst expressions (custom), .NET data services
Tooling: GitLab / GitHub CI/CD, Apache Zeppelin, Streamlit, Hugo (docs), YAML config frameworks, API / REST ingestion, containerized runtimes, Kafka tooling, S3/Ceph integrations
Data Concerns: Schema governance & lineage/metadata, performance tuning (partitioning, caching), cost & latency optimization, quality gates & observability/SLAs, retention/masking, PII redaction pipelines.
Professional Experience
Optus – Melbourne, Australia (Telecommunications)
Analytics Lead (People Lead) | Jul 2025 – Present
Provide analytics engineering, governance, and people leadership within Customer & Data Analytics, driving CNPS, compliance, and operational efficiency.
Focus Areas: analytics leadership; governance; data engineering; customer analytics
Tech: Spark (Scala/PySpark), Databricks, Delta Lake, Python, SQL, dbt, MLflow, Spark NLP, PyTorch, Streamlit, Presidio (PII redaction), Azure data services, GitLab CI/CD, PaperMod/Hugo (internal docs)
Key Achievements:
- Integrate advanced analytics and ML solutions into customer operations, improving CNPS and reducing complaints and churn.
- Own Customer First initiatives (Responsible Sales, anomaly detection), aligning OPEX/CAPEX budgets and delivery to strategic KPIs.
- Built an unstructured‑text analytics pipeline for agent–customer call/chat transcripts to detect patterns evidencing Responsible Sales at stores: PySpark + Spark NLP (preprocessing/NER), weak‑supervision + PyTorch models (pattern classification), tracked with MLflow; dbt curations for features and a Streamlit review UI for compliance workflows.
- Led LLM‑driven applications on unstructured (agent–customer) transcripts delivering actionable insights and scores for contact centres and retail: complaint risk, churn propensity, summary highlights, root‑cause classification, and agent‑behaviour anomaly alerts.
- Deployed an in‑house Spark + Presidio RedactifyAI pipeline replacing the external Google DLP API for transcript PII redaction, cutting run‑rate cost ~70–80% (≈AUD 22K annual savings) and reducing end‑to‑end latency ~30%; see the redaction sketch after this list.
- Built statistical models for customer‑journey analytics (survival/longitudinal analysis), sales pattern prediction, promotion uplift, bundles take‑up, and store relocation impact.
- Scaled BI and analytics operations: supported ~700 Tableau views and ~300 workbooks across hundreds of data sources; automated governance, testing, and CI for dashboards; ensured KPI reliability (AOP, quality control, channels, contact centres).
- Drove architecture design of the Analytics Infrastructure; served as primary liaison with Databricks partner teams; led best practices for Customer Analytics administration, governance, and cost/performance optimization.
- Govern data retention, classification, and masking aligned with Data Office guidance.
- Instituted Jira Service Management workflows for automated issue creation, escalations, and runbooks; wired Git‑based automation for analytics pipelines and report deployments.
- Optimized cost/performance via transformer/embedding stacks (e.g., GTE Large EN v1.5) for retrieval, similarity, and scoring pipelines; deployed hybrid inference to balance latency and accuracy.
- Worked with frontier & open models (GPT‑OSS 120B, Llama 3.x 70B, Gemini 2.5 Pro, Claude), applying supervised fine‑tuning, RL alignment, and PEFT (LoRA/QLoRA) on domain transcripts; implemented RAG pipelines (vector search/embeddings) with guardrails (PII filtering, policy prompts) and evaluation harnesses (latency, factuality, safety).
- Lead and mentor analysts and cross-functional delivery teams; build an engaged high‑performing culture (learning sessions, recognition, development plans).
- Designed and shipped end‑to‑end ML/GenAI solutions on Databricks (data prep → feature curation → training → MLflow registry → deployment) with monitoring and governance.
- Led discovery and value mapping with business/technical stakeholders; translated pain points into Databricks‑aligned architectures and scoped short‑to‑medium engagements.
- Built and delivered hands‑on demos/POCs in Python/Scala with Spark/Delta/MLflow; authored reference architectures and how‑tos; ran workshops enabling adoption.
- Implemented MLOps frameworks (CI/CD, model versioning, lineage, drift checks) and productionized NLP/LLM workloads (Spark NLP + PyTorch; SFT/RL alignment on domain transcripts).
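For illustration, a minimal sketch of the Spark + Presidio redaction pattern referenced in the redaction bullet above; the table, column, and entity names are assumptions made for this sketch, not the production RedactifyAI code.

```python
# Minimal sketch: PII redaction over transcript text with Spark + Presidio.
# Table/column names (raw.call_transcripts, text) and the entity list are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

spark = SparkSession.builder.appName("pii-redaction-sketch").getOrCreate()

def redact(text: str) -> str:
    # In a real deployment the engines are created once per executor (e.g. via mapPartitions)
    # to avoid re-loading NLP models per row; kept inline here for brevity.
    analyzer = AnalyzerEngine()
    anonymizer = AnonymizerEngine()
    results = analyzer.analyze(text=text, language="en",
                               entities=["PERSON", "PHONE_NUMBER", "EMAIL_ADDRESS"])
    return anonymizer.anonymize(text=text, analyzer_results=results).text

redact_udf = F.udf(redact, StringType())

transcripts = spark.read.table("raw.call_transcripts")  # assumed source table
redacted = transcripts.withColumn("text_redacted", redact_udf("text")).drop("text")
redacted.write.format("delta").mode("overwrite").saveAsTable("curated.call_transcripts_redacted")
```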
Senior Data Engineer | Oct 2024 – Jun 2025
Scope overlapped with the later leadership role; focused on engineering and governance foundations.
Tech: Spark (Scala/PySpark), Databricks, Delta Lake, Python, SQL, dbt, MLflow, Spark NLP, Azure and Google data services, GitLab CI/CD, Presidio
Highlights: Delivered the same initiative set as above while establishing scalable patterns for retention, quality gating, and ML integration.
WTG – Melbourne & Sydney, Australia (Software Solutions / Logistics / Supply Chain)
Career progression: Data Analyst → Data Scientist → Senior Data Scientist (de‑facto team lead / lead developer).
Senior Data Scientist | Nov 2023 – Oct 2024
Led the data science and analytics function, delivering real-time supply chain insights and scaling team capability.
Tech: Python, SQL, Spark, Delta Lake, Ceph S3, Kafka, Databricks; PyTorch, TensorFlow/Keras; Hugging Face Transformers (BERT/DistilBERT/RoBERTa, sentence-transformers); Spark NLP; dashboarding/BI; Apache Zeppelin; GitHub CI; .NET (ASP.NET + Entity Framework), SQL Server
Highlights:
- Created and led, from scratch, the Carrier Performance Reports ETL development as part of Market Intelligence and Analytics (Spark/Scala, Kafka, modular sub‑systems); mentored 15+ rotators and owned the roadmap over ~4 years, acting as de‑facto team lead and lead software developer.
- Led 3+ cross-functional teams delivering predictive and anomaly detection capabilities, reducing processing errors by >25%.
- Built a customer analytics platform (ETA, carrier performance, anomaly detection), improving transit prediction accuracy by 15%.
- Advanced NLP / LLM prototypes for service efficiency via embeddings & iterative refinement.
- Established distributed pipelines & feature engineering with Spark + Delta Lake.
- Built and evaluated deep learning models for NLP and sequence tasks using PyTorch and TensorFlow/Keras: transformer encoders (BERT/DistilBERT/RoBERTa, sentence-transformers), CNN/LSTM/GRU architectures for sequence classification and time-series signals; leveraged HF hub and Spark NLP for scalable preprocessing/inference.
- Supported cross-team .NET solutions: a data pipeline orchestrator and the analytics website backend (ASP.NET + Entity Framework + SQL Server); integrated data landing/exports with AWS S3; contributed performance fixes and data contracts.
- Used Apache Zeppelin alongside notebooks for rapid exploration, operational runbooks, and production support playbooks.
- Delivered customer‑facing workshops and enablement; authored reference architectures and runbooks to scale onboarding and support.
- Built streaming ingestion with Kafka and Delta; implemented SLAs, observability, and cost/performance tuning (partitioning, caching, autoscale strategies); see the streaming sketch after this list.
- Ran pre‑sales style POCs and demos for prospects, showcasing Delta Lakehouse patterns and measurable time‑to‑value.
- Established KPI governance and data quality checks feeding BI products; improved dashboard performance and reliability at scale.
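For illustration, a minimal sketch of the Kafka-to-Delta streaming ingestion pattern referenced above; the topic name, schema, checkpoint path, and target table are assumptions, not the production pipeline.

```python
# Minimal sketch: Kafka -> Delta ingestion with Spark Structured Streaming.
# Topic, schema, checkpoint location, and table name are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-delta-sketch").getOrCreate()

event_schema = StructType([
    StructField("shipment_id", StringType()),
    StructField("carrier", StringType()),
    StructField("event_time", TimestampType()),
    StructField("status", StringType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "carrier-events")
       .option("startingOffsets", "latest")
       .load())

events = (raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
          .select("e.*")
          .withColumn("ingest_date", F.to_date("event_time")))

(events.writeStream
 .format("delta")
 .option("checkpointLocation", "s3a://lake/checkpoints/carrier_events")
 .partitionBy("ingest_date")
 .trigger(availableNow=True)   # incremental batch-style run; use processingTime for continuous
 .toTable("bronze.carrier_events"))
```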
Data Scientist | Mar 2020 – Nov 2023
Extended platform breadth; hardened anomaly detection and predictive layers. (Highlights consistent with the senior role above; earlier-phase delivery.)
Tech: Python, SQL, Spark, Delta Lake, Databricks; PyTorch, TensorFlow/Keras; Hugging Face Transformers (BERT/DistilBERT/RoBERTa, sentence-transformers); Spark NLP; Apache Zeppelin; SQL Server; Ceph S3; time-series & anomaly detection methods
Data Analyst | Oct 2018 – Apr 2019
Developed early ML prototypes and analytics foundations, proving the value of advanced analytics.
Tech: Python, SQL, notebooks (Jupyter), Deep Learning & visualization tooling
Highlights:
- Produced sentiment, classification, and time-estimation prototypes; seeded the data science practice.
- Designed scalable architectures enabling team growth from 2 → 30+.
Dentsu – Melbourne, Australia (Digital Marketing)
Data Engineer | May 2019 – Mar 2020
Built multi-cloud marketing analytics pipelines and predictive segmentation models.
Tech: BigQuery (GCP), Databricks (Spark), Python, R (R Shiny), SQL; multi-cloud data services (AWS/Azure/GCP)
Highlights:
- Delivered segmentation & churn models (BigQuery, R Shiny, Python) improving targeting efficiency.
- Implemented multi-touch attribution (MTA) using Markov chains and MCMC path simulation for customer journeys (impressions → clicks → conversions); produced channel removal effects and ROI insights; see the removal-effect sketch after this list.
- Built funnel analytics (reach → engagement → conversion) and journey diagnostics; complemented with churn analysis and customer segmentation models.
- Built data processing pipelines across AWS / Azure / GCP; leveraged Databricks (Spark) and BigQuery for scalable data preparation and modeling.
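For illustration, a minimal sketch of the Markov-chain removal-effect calculation behind the MTA work above; the channels and transition probabilities are toy assumptions.

```python
# Minimal sketch: first-order Markov attribution removal effects (toy data, numpy only).
import numpy as np

states = ["start", "display", "search", "email", "conversion", "null"]
# Row-stochastic transition matrix over journeys; the numbers are illustrative assumptions.
P = np.array([
    # start  display search email  conv  null
    [0.0,    0.4,    0.4,   0.2,   0.0,  0.0],   # start
    [0.0,    0.1,    0.3,   0.2,   0.2,  0.2],   # display
    [0.0,    0.1,    0.1,   0.2,   0.3,  0.3],   # search
    [0.0,    0.1,    0.2,   0.1,   0.3,  0.3],   # email
    [0.0,    0.0,    0.0,   0.0,   1.0,  0.0],   # conversion (absorbing)
    [0.0,    0.0,    0.0,   0.0,   0.0,  1.0],   # null (absorbing)
])

def conversion_prob(P, start=0, conv=4):
    """Absorption probability into 'conversion' when starting from 'start'."""
    transient = [i for i in range(len(P)) if P[i, i] < 1.0]   # non-absorbing states
    Q = P[np.ix_(transient, transient)]
    R = P[np.ix_(transient, [conv])]
    B = np.linalg.solve(np.eye(len(Q)) - Q, R)                # fundamental-matrix solve
    return B[transient.index(start), 0]

base = conversion_prob(P)
for ch in ("display", "search", "email"):
    i = states.index(ch)
    P_removed = P.copy()
    P_removed[i, :] = 0.0
    P_removed[i, states.index("null")] = 1.0                  # removed channel routes to null
    effect = 1 - conversion_prob(P_removed) / base
    print(f"{ch}: removal effect = {effect:.2f}")
```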
Dutch Oil & Gas Company – The Netherlands (Energy)
Head of Analytics Team (People Lead) | Sep 2015 – Jan 2018
Directed analytics strategy & probabilistic modeling for high-value petroleum projects.
Tech: .NET/C#, SQL Server, Python, R, Power BI, SAP, Stata
Highlights:
- Implemented integration pipelines & probabilistic valuation models rescuing underperforming assets.
- Established automated project evaluation, risk assessment & real-time reporting frameworks.
- Architected analytics solutions (.NET, SQL Server, Python, Power BI, R, SAP, Stata).
Earlier Roles & Appointments
Risk Modeller & Mathematician — Probabilistic Risk Modeling Platform | Sep 2013 – Sep 2015
Architected and led the design of a probabilistic risk modeling platform for operational/project risk and portfolio decisions.
Tech: .NET/C#, SQL Server, Python, R; Monte Carlo engines, Bayesian inference workflows; reporting/BI; probabilistic modeling and risk analytics toolkits; backtesting & time-series methods
Highlights:
- Defined modular model architecture (scenario library, risk taxonomy) supporting Monte Carlo, Bayesian updates, event/fault-tree analysis.
- Built parameterization and data contracts for expert elicitations, historical losses and external datasets with versioning and lineage.
- Implemented calibration/validation and explainability: backtesting, sensitivity/tornado, contribution analysis, P50/P90 and tail (VaR/ES) metrics; see the simulation sketch after this list.
- Automated simulation workflows and reproducible runs with audit trails, approvals and model lifecycle governance.
- Delivered decision support dashboards and reporting for scenario comparisons and risk-adjusted portfolio optimization.
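For illustration, a minimal Monte Carlo sketch of the P50/P90 and VaR/ES reporting referenced above; the frequency-severity model and its parameters are toy assumptions, not the platform's calibrated models.

```python
# Minimal sketch: Monte Carlo aggregate-loss simulation with P50/P90 and VaR/ES (toy assumptions).
import numpy as np

rng = np.random.default_rng(42)
n_sims = 100_000

# Frequency-severity model: Poisson event counts, lognormal severities (illustrative parameters).
counts = rng.poisson(lam=3.0, size=n_sims)
annual_loss = np.array([rng.lognormal(mean=12.0, sigma=1.2, size=n).sum() for n in counts])

# Note: some domains (e.g. reserves reporting) use the exceedance convention, where P90 is the low case.
p50, p90 = np.percentile(annual_loss, [50, 90])
var_95 = np.percentile(annual_loss, 95)                 # 95% Value-at-Risk
es_95 = annual_loss[annual_loss >= var_95].mean()       # Expected Shortfall beyond VaR

print(f"P50={p50:,.0f}  P90={p90:,.0f}  VaR95={var_95:,.0f}  ES95={es_95:,.0f}")
```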
PhD / Senior Lecturer / Associate Professor (Full / Part-time) | Sep 2007 – Jan 2018
Taught advanced probability, stochastic analysis & multivariate statistics; supervised theses & published research.
Tech: R, Python, Matlab, Octave, SQL, statistical computing environments; academic writing with LaTeX & reproducible research workflows
Highlights:
- Supervised Masters & PhD candidates (ML, big data, statistical optimization).
- PhD research on non‑regular Laplace distribution: derived an asymptotic test based on the sign statistic, computed test deficiency, and provided power‑function approximations to make the method practical for applied scientists.
- Investigated Laplace and Student distributions as robust alternatives to normal laws in asymptotic statistical problems (testing and inference under heavy‑tailed regimes).
- Applied the Student distribution in insurance analytics; analyzed Bonus–Malus risk‑rating frameworks and modeled claim frequency with Poisson–Gamma (Negative Binomial) heterogeneity, linking to the structural function of the collective; see the mixture form after this list.
- Published monograph and peer‑reviewed works on non‑regular tests and probabilistic/actuarial models (see Publications).
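For reference, the standard Poisson–Gamma mixture underlying the claim-frequency modelling above, stated with β as the Gamma rate parameter:

```latex
% Poisson–Gamma (Negative Binomial) claim-frequency mixture, \beta = Gamma rate parameter
N \mid \Lambda = \lambda \sim \mathrm{Poisson}(\lambda), \qquad \Lambda \sim \mathrm{Gamma}(\alpha, \beta)
\;\Longrightarrow\;
\Pr(N = k) = \binom{k + \alpha - 1}{k}
  \left(\frac{\beta}{\beta + 1}\right)^{\alpha}
  \left(\frac{1}{\beta + 1}\right)^{k},
\qquad k = 0, 1, 2, \dots
```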
FOREX Reuters Dealer (Treasury) | Sep 2001 – Aug 2007
Managed open currency positions and executed hedge strategies within the treasury of a multi‑billion gas company.
Tech: Reuters Dealing (FX), FX spot/forwards/swaps, vanilla options, money market instruments, short‑term lending/borrowing; Excel/VBA, Python/R for quant analysis; technical analysis (MA crossovers, RSI, MACD, support/resistance); binomial lattice models (CRR/JR), stochastic processes (GBM/mean‑reverting), SDE numerics (Euler–Maruyama), Monte Carlo simulation (pricing & strategy evaluation), VaR/ES risk metrics, mean‑variance/CVaR portfolio optimization.
Highlights:
- Built binomial lattice models to value and hedge FX options and to scenario test market indicators (forward points, term‑structure).
- Modeled FX dynamics with stochastic processes (e.g., GBM and mean‑reverting variants); implemented Euler–Maruyama discretization for path generation; see the sketch after this list.
- Applied Monte Carlo to evaluate trading/hedging strategies (PnL distributions, VaR/ES) and stress scenarios; informed hedge ratios and tenor selection.
- Used technical analysis signals (trend/momentum/overbought-oversold) for timing and execution overlay alongside fundamental and quant views.
- Executed FX trades and money market operations; optimized daily liquidity and funding costs under policy constraints.
- Performed optimal currency portfolio allocation (mean‑variance/CVaR) and managed limits aligned to risk appetite.
- Monitored and controlled open currency positions; ensured policy and regulatory compliance.
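For illustration, a minimal sketch of Euler–Maruyama path generation for GBM-style FX dynamics with a Monte Carlo VaR/ES readout, as referenced above; all parameters are toy assumptions.

```python
# Minimal sketch: Euler–Maruyama discretization of GBM FX paths (illustrative parameters).
import numpy as np

rng = np.random.default_rng(7)
s0, mu, sigma = 0.6500, 0.01, 0.10        # spot, drift, volatility (annualised, toy values)
T, n_steps, n_paths = 1.0, 252, 10_000
dt = T / n_steps

paths = np.full((n_paths, n_steps + 1), s0)
for t in range(1, n_steps + 1):
    dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    # Euler–Maruyama update for dS = mu*S*dt + sigma*S*dW
    paths[:, t] = paths[:, t - 1] * (1.0 + mu * dt + sigma * dW)

pnl = paths[:, -1] - s0                   # terminal P&L of an unhedged long-spot position
var_99 = -np.percentile(pnl, 1)           # 99% VaR expressed as a positive loss
es_99 = -pnl[pnl <= np.percentile(pnl, 1)].mean()
print(f"VaR99={var_99:.4f}  ES99={es_99:.4f}")
```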
Education & Continuous Learning
Education
- Lomonosov Moscow State University (MSU)
  - PhD, Probability Theory and Mathematical Statistics (2007–2010)
  - Bachelor’s, Applied Mathematics and Computer Science, with distinction (2004–2007)
- Finance Academy (FA)
  - Bachelor’s, Banking and Finance, with distinction (1998–2003)
Certifications
- Databricks Certified Machine Learning Professional — ID 158990695 (Issued Aug 2025, Expires Aug 2027)
- Academy Accreditation – Azure Databricks Platform Architect — ID 154319049 (Issued Jul 2025, Expires Jul 2027)
- Skillsoft: Data Engineering on Microsoft Azure – Databricks Processing — ID 14141045 (Issued Apr 2025)
- Databricks Certified Associate Developer for Apache Spark 3.0 — ID 125542547 (Issued Dec 2024)
Selected Courses & Credentials
- Coursera: Parallel Programming — ID YV5HYZJ2H3RH (Mar 2020)
- Coursera: Functional Program Design in Scala — ID 2ZFM283M2H6E (Feb 2020)
- Coursera: Functional Programming Principles in Scala — ID DZR725FMEF5K (Oct 2019)
- Coursera: Building Resilient Streaming Systems on GCP — ID P9F377QHVKU8 (Sep 2019)
- Coursera: Data Engineering, Big Data, and Machine Learning on GCP (Specialization) — ID FZRJ9ATU2V43 (Sep 2019)
- DataCamp: Introduction to Spark in R using sparklyr — ID 10407998 (Aug 2019)
- Coursera: Serverless Machine Learning with TensorFlow on GCP — ID QGVNVE976L4X (Aug 2019)
- Coursera: Serverless Data Analysis with BigQuery & Dataflow — ID BXCPX63879LD (May 2019)
- Coursera: GCP Big Data and Machine Learning Fundamentals — ID JBQZ562J7XTU (Apr 2019)
- Coursera: Leveraging Unstructured Data with Cloud Dataproc — ID QYS4S6YY5V9S (Apr 2019)
- Coursera: Deep Learning Specialization — ID DL9KPH7DCNMU (Feb 2019)
- Coursera: Convolutional Neural Networks — ID NHTEEQ8YLLJ5 (Feb 2019)
- Coursera: Improving Deep Neural Networks — ID ZYKM2B3JCJSV (Feb 2019)
- Coursera: Structuring Machine Learning Projects — ID JNYSVZ2DG7D3 (Feb 2019)
- Coursera: Sequence Models — ID 7YPU9V9996HV (Jan 2019)
- International Monetary Fund: MFx – Macroeconometric Forecasting — ID 660e7e0027494f5685885aea974399a8 (Nov 2018)
- Coursera: Neural Networks and Deep Learning — ID XUAPV4985LMP (Nov 2018)
- Microsoft: DAT207x – Analyzing and Visualizing Data with Power BI — ID 31ab907165c942ce9c038d50d117a04f (Aug 2018)
- Coursera: Interest Rate Models — ID 9JSS584JS3QW (Jul 2018)
- KyotoUx: 009x – Stochastic Processes: Data Analysis and Computer Simulation — ID ec0aba27accc4dcf8735f8bb3f16ed2d (Jun 2018)
- Coursera: Financial Engineering and Risk Management Part I — ID QBPSSUWRR23B (Jun 2018)
- Coursera: Financial Engineering and Risk Management Part II — ID TYV3GRL5RUAY (Jun 2018)
- Coursera: Probabilistic Graphical Models 1 – Representation (with Honors) — ID ZFB3HPFCJ2P6 (Jun 2018)
- Coursera: Probabilistic Graphical Models 2 – Inference — ID Z5PL7RSY2MU8 (Jun 2018)
- Coursera: Probabilistic Graphical Models 3 – Learning — ID B6X46BQ2E9VR (Jun 2018)
- Coursera: Probabilistic Graphical Models (Specialization) — ID CZ3MPZX8LRL9 (Jun 2018)
- edX: Credit Risk Management — ID ae771af8404a4d88904cdf18c1e8bdb8 (Jun 2014)
- MITx: Introduction to Probability – The Science of Uncertainty — ID 82bd8cb5fe6747199d29eaa78268225e (May 2014)
- Coursera: Principles of Economics for Scientists (Sep 2013)
- Coursera: Machine Learning (Aug 2013)
- MITx: Introduction to Computer Science and Programming — ID d8046e9001184eba980304689ae824f4 (Jun 2013)
- Coursera: Financial Engineering and Risk Management (May 2013)
- Coursera: Linear and Discrete Optimization (May 2013)
- Coursera: Calculus – Single Variable (Apr 2013)
- Coursera: Computer Networks (Apr 2013)
- Coursera: Computing for Data Analysis (Apr 2013)
- Coursera: Algorithms – Design and Analysis, Part 1 (Mar 2013; also Dec 2012)
- Coursera: Probabilistic Graphical Models (Dec 2012)
- Coursera: Natural Language Processing (May 2012)
Ongoing: ML experimentation, privacy‑preserving data engineering techniques, LLM augmentation patterns.
Leadership & Governance Themes
- People leadership and mentoring (analysts/rotators), code reviews, and roadmap ownership.
- Data masking & retention frameworks embedded early in lifecycle.
- Cost & latency optimization (build vs buy tradeoffs demonstrated by RedactifyAI pipeline savings).
- Culture of documentation & observability (proactive defect detection via anomaly signals & KPI drift checks).
Publications / Speaking
- Korolev, R. A.; Bening, V. E. (2010). “On the power of an asymptotically optimal test for the case of Laplace distribution.” Banach Center Publications, 90(1), 27–38.
- Korolev, R. (2011). “On the power of the sign test for the case of Laplace distribution.” LAP Lambert Academic Publishing. ISBN 978-3845473857.
Contact
- GitLab: https://gitlab.com/rokorolev
- LinkedIn: https://www.linkedin.com/in/roman-k-data-lead
- Projects: /projects/
- About: /about/