Curriculum Vitae
Strategic data/ML leader and hands-on engineer with end-to-end impact across telecommunications, logistics & supply chain, energy, marketing, and finance. I build scalable data platforms (Spark/Delta), production ML (MLflow/feature curation), and reliable KPI ecosystems—while leading analysts/engineers through clear contracts, governance, and automation.
Career Snapshot
- 2025–Present: Analytics Lead (Optus) — CNPS improvement, Responsible Sales analytics, ML/LLM in production; data governance, PII redaction (Spark + Presidio), and BI scale (~700 views, ~300 workbooks).
- 2024–2025: Senior Data Engineer (Optus) — Databricks patterns (Spark/Delta, dbt, MLflow), retention/masking, quality gates, and CI/CD foundations.
- 2018–2024: WTG — Data Analyst → Senior Data Scientist (de‑facto lead); built ETL for Market Intelligence and Analytics; streaming (Kafka + Delta), deep‑learning NLP (PyTorch/Keras, Transformers), .NET backend/orchestrator (ASP.NET + EF + SQL Server), S3/Ceph.
- 2019–2020: Data Engineer (Dentsu) — MTA (Markov/MCMC), funnel & churn analytics; Databricks (Spark) + BigQuery; segmentation; presented technical architecture to Energy client.
- 2015–2018: Head of Analytics (Oil & Gas) — probabilistic modeling, portfolio risk, analytics organization leadership.
- 2001–2015: Finance & Risk — FX dealing, quant modeling, SDE numerics, Monte Carlo; academic (full / part‑time) in probability/statistics.
Core Competencies
Domain | Skills |
---|---|
Data Engineering | Spark (Scala/PySpark), Databricks, Delta Lake; streaming with Kafka; Parquet/partitioning, incremental/batch orchestration |
Analytics & BI | KPI governance, semantic modeling, usage telemetry, MTA/funnel/churn analytics, dashboard performance |
Architecture | Event/API ingestion, schema evolution, modular parsers, secure data zones, lineage/metadata awareness |
Governance & Privacy | Retention, classification, masking, PII redaction (Spark + Presidio), responsible analytics & compliance |
Leadership | People development, delivery coaching, cross-functional alignment, enablement/workshops |
ML / Advanced Analytics | Deep learning (PyTorch, TensorFlow/Keras); Transformers & LLMs (BERT family, RoBERTa, sentence-transformers, Llama 3.x, Claude, GPT‑OSS 120B, Gemini); RAG pipelines (vector search/embeddings), PEFT (LoRA/QLoRA), anomaly detection, predictive models |
Tooling & Enablement | dbt, MLflow, GitLab / GitHub CI/CD, Databricks, Spark ML, .NET (ASP.NET + EF), SQL Server, S3/Ceph, cost/perf optimization |
Technical Stack Snapshot
Languages: Scala, Python, SQL, C#, Go (tooling), R
Platforms: Databricks, Delta Lake, Spark; BigQuery, SQL Server; Azure/AWS/GCP (data services)
Frameworks / Libraries: PyTorch, TensorFlow/Keras, Hugging Face Transformers, Spark NLP, MLflow, dbt, Presidio, Catalyst expressions (custom), .NET data services
Tooling: GitLab / GitHub CI/CD, Apache Zeppelin, Streamlit, Hugo (docs), YAML config frameworks, API / REST ingestion, containerized runtimes, Kafka tooling, S3/Ceph integrations
Data Concerns: Schema governance & lineage/metadata, performance tuning (partitioning, caching), cost & latency optimization, quality gates & observability/SLAs, retention/masking, PII redaction pipelines.
Professional Experience
Optus – Melbourne, Australia (Telecommunications)
Analytics Lead (People Lead) | Jul 2025 – Present
Provide analytics engineering, governance, and people leadership within Customer & Data Analytics, driving CNPS, compliance, and operational efficiency.
Focus Areas: analytics leadership; governance; data engineering; customer analytics
Tech: Spark (Scala/PySpark), Databricks, Delta Lake, Python, SQL, dbt, MLflow, Spark NLP, PyTorch, Streamlit, Presidio (PII redaction), Azure data services, GitLab CI/CD, PaperMod/Hugo (internal docs)
Key Achievements:
- Integrate advanced analytics and ML solutions into customer operations, improving CNPS and reducing complaints and churn.
- Own Customer First initiatives (Responsible Sales, anomaly detection), aligning OPEX/CAPEX budgets and delivery to strategic KPIs.
- Built an unstructured‑text analytics pipeline for agent–customer call/chat transcripts to detect patterns evidencing Responsible Sales at stores: PySpark + Spark NLP (preprocessing/NER), weak‑supervision + PyTorch models (pattern classification), tracked with MLflow; dbt curations for features and a Streamlit review UI for compliance workflows.
- Led LLM‑driven applications on unstructured (agent–customer) transcripts delivering actionable insights and scores for contact centres and retail: complaint risk, churn propensity, summary highlights, root‑cause classification, and agent‑behaviour anomaly alerts.
- Deployed an in‑house Spark + Presidio RedactifyAI pipeline replacing the external Google DLP API for transcript PII redaction, cutting run‑rate cost ~70–80% (≈AUD 22K annual savings) and reducing end‑to‑end latency ~30%; see the redaction sketch after this list.
- Built statistical models for customer‑journey analytics (survival/longitudinal analysis), sales pattern prediction, promotion uplift, bundles take‑up, and store relocation impact.
- Scaled BI and analytics operations: supported ~700 Tableau views and ~300 workbooks across hundreds of data sources; automated governance, testing, and CI for dashboards; ensured KPI reliability (AOP, quality control, channels, contact centres).
- Drove architecture design of the Analytics Infrastructure; served as primary liaison with Databricks partner teams; led best practices for Customer Analytics administration, governance, and cost/performance optimization.
- Govern data retention, classification, and masking aligned with Data Office guidance.
- Instituted Jira Service Management workflows for automated issue creation, escalations, and runbooks; wired Git‑based automation for analytics pipelines and report deployments.
- Optimized cost/performance via transformer/embedding stacks (e.g., GTE Large EN v1.5) for retrieval, similarity, and scoring pipelines; deployed hybrid inference to balance latency and accuracy.
- Worked with frontier & open models (GPT‑OSS 120B, Llama 3.x 70B, Gemini 2.5 Pro, Claude), applying supervised fine‑tuning, RL alignment, and PEFT (LoRA/QLoRA) on domain transcripts; implemented RAG pipelines (vector search/embeddings) with guardrails (PII filtering, policy prompts) and evaluation harnesses (latency, factuality, safety).
- Lead and mentor analysts and cross-functional delivery teams; build an engaged high‑performing culture (learning sessions, recognition, development plans).
- Designed and shipped end‑to‑end ML/GenAI solutions on Databricks (data prep → feature curation → training → MLflow registry → deployment) with monitoring and governance.
- Led discovery and value mapping with business/technical stakeholders; translated pain points into Databricks‑aligned architectures and scoped short‑to‑medium engagements.
- Built and delivered hands‑on demos/POCs in Python/Scala with Spark/Delta/MLflow; authored reference architectures and how‑tos; ran workshops enabling adoption.
- Implemented MLOps frameworks (CI/CD, model versioning, lineage, drift checks) and productionized NLP/LLM workloads (Spark NLP + PyTorch; SFT/RL alignment on domain transcripts).
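For illustration, a minimal sketch of the Spark + Presidio redaction pattern referenced in the redaction bullet above; the table, column, and entity names are assumptions made for this sketch, not the production RedactifyAI code.

```python
# Minimal sketch: PII redaction over transcript text with Spark + Presidio.
# Table/column names (raw.call_transcripts, text) and the entity list are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

spark = SparkSession.builder.appName("pii-redaction-sketch").getOrCreate()

def redact(text: str) -> str:
    # In a real deployment the engines are created once per executor (e.g. via mapPartitions)
    # to avoid re-loading NLP models per row; kept inline here for brevity.
    analyzer = AnalyzerEngine()
    anonymizer = AnonymizerEngine()
    results = analyzer.analyze(text=text, language="en",
                               entities=["PERSON", "PHONE_NUMBER", "EMAIL_ADDRESS"])
    return anonymizer.anonymize(text=text, analyzer_results=results).text

redact_udf = F.udf(redact, StringType())

transcripts = spark.read.table("raw.call_transcripts")  # assumed source table
redacted = transcripts.withColumn("text_redacted", redact_udf("text")).drop("text")
redacted.write.format("delta").mode("overwrite").saveAsTable("curated.call_transcripts_redacted")
```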
Senior Data Engineer | Oct 2024 – Jun 2025
Scope overlapped with the later leadership role; focused on engineering and governance foundations.
Tech: Spark (Scala/PySpark), Databricks, Delta Lake, Python, SQL, dbt, MLflow, Spark NLP, Azure and Google data services, GitLab CI/CD, Presidio
Highlights: Delivered the same initiative set as above while establishing scalable patterns for retention, quality gating, and ML integration.
WTG – Melbourne & Sydney, Australia (Software Solutions / Logistics / Supply Chain)
Career progression: Data Analyst → Data Scientist → Senior Data Scientist (de‑facto team lead / lead developer).
Senior Data Scientist | Nov 2023 – Oct 2024
Led the data science and analytics function, delivering real-time supply chain insights and scaling team capability.
Tech: Python, SQL, Spark, Delta Lake, Ceph S3, Kafka, Databricks; PyTorch, TensorFlow/Keras; Hugging Face Transformers (BERT/DistilBERT/RoBERTa, sentence-transformers); Spark NLP; dashboarding/BI; Apache Zeppelin; GitHub CI; .NET (ASP.NET + Entity Framework), SQL Server
Highlights:
- Created and led, from scratch, the Carrier Performance Reports ETL development as part of Market Intelligence and Analytics (Spark/Scala, Kafka, modular sub‑systems); mentored 15+ rotators and owned the roadmap over ~4 years, acting as de‑facto team lead and lead software developer.
- Led 3+ cross-functional teams delivering predictive and anomaly detection capabilities, reducing processing errors by >25%.
- Built a customer analytics platform (ETA, carrier performance, anomaly detection), improving transit prediction accuracy by 15%.
- Advanced NLP / LLM prototypes for service efficiency via embeddings & iterative refinement.
- Established distributed pipelines & feature engineering with Spark + Delta Lake.
- Built and evaluated deep learning models for NLP and sequence tasks using PyTorch and TensorFlow/Keras: transformer encoders (BERT/DistilBERT/RoBERTa, sentence-transformers), CNN/LSTM/GRU architectures for sequence classification and time-series signals; leveraged HF hub and Spark NLP for scalable preprocessing/inference.
- Supported cross-team .NET solutions: a data pipeline orchestrator and the analytics website backend (ASP.NET + Entity Framework + SQL Server); integrated data landing/exports with AWS S3; contributed performance fixes and data contracts.
- Used Apache Zeppelin alongside notebooks for rapid exploration, operational runbooks, and production support playbooks.
- Delivered customer‑facing workshops and enablement; authored reference architectures and runbooks to scale onboarding and support.
- Built streaming ingestion with Kafka and Delta; implemented SLAs, observability, and cost/performance tuning (partitioning, caching, autoscale strategies); see the streaming sketch after this list.
- Ran pre‑sales style POCs and demos for prospects, showcasing Delta Lakehouse patterns and measurable time‑to‑value.
- Established KPI governance and data quality checks feeding BI products; improved dashboard performance and reliability at scale.
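For illustration, a minimal sketch of the Kafka-to-Delta streaming ingestion pattern referenced above; the topic name, schema, checkpoint path, and target table are assumptions, not the production pipeline.

```python
# Minimal sketch: Kafka -> Delta ingestion with Spark Structured Streaming.
# Topic, schema, checkpoint location, and table name are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-delta-sketch").getOrCreate()

event_schema = StructType([
    StructField("shipment_id", StringType()),
    StructField("carrier", StringType()),
    StructField("event_time", TimestampType()),
    StructField("status", StringType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "carrier-events")
       .option("startingOffsets", "latest")
       .load())

events = (raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
          .select("e.*")
          .withColumn("ingest_date", F.to_date("event_time")))

(events.writeStream
 .format("delta")
 .option("checkpointLocation", "s3a://lake/checkpoints/carrier_events")
 .partitionBy("ingest_date")
 .trigger(availableNow=True)   # incremental batch-style run; use processingTime for continuous
 .toTable("bronze.carrier_events"))
```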
Data Scientist | Mar 2020 – Nov 2023
Extended platform breadth; hardened anomaly detection and predictive layers. (Highlights consistent with the senior role above; earlier-phase delivery.)
Tech: Python, SQL, Spark, Delta Lake, Databricks; PyTorch, TensorFlow/Keras; Hugging Face Transformers (BERT/DistilBERT/RoBERTa, sentence-transformers); Spark NLP; Apache Zeppelin; SQL Server; Ceph S3; time-series & anomaly detection methods
Data Analyst | Oct 2018 – Apr 2019
Developed early ML prototypes and analytics foundations, proving the value of advanced analytics.
Tech: Python, SQL, notebooks (Jupyter), Deep Learning & visualization tooling
Highlights:
- Produced sentiment, classification, and time-estimation prototypes; seeded the data science practice.
- Designed scalable architectures enabling team growth from 2 → 30+.
Dentsu – Melbourne, Australia (Digital Marketing)
Data Engineer | May 2019 – Mar 2020
Built multi-cloud marketing analytics pipelines and predictive segmentation models.
Tech: BigQuery (GCP), Databricks (Spark), Python, R (R Shiny), SQL; multi-cloud data services (AWS/Azure/GCP)
Highlights:
- Delivered segmentation & churn models (BigQuery, R Shiny, Python) improving targeting efficiency.
- Implemented multi-touch attribution (MTA) using Markov chains and MCMC path simulation for customer journeys (impressions → clicks → conversions); produced channel removal effects and ROI insights; see the removal-effect sketch after this list.
- Built funnel analytics (reach → engagement → conversion) and journey diagnostics; complemented with churn analysis and customer segmentation models.
- Built data processing pipelines across AWS / Azure / GCP; leveraged Databricks (Spark) and BigQuery for scalable data preparation and modeling.
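For illustration, a minimal sketch of the Markov-chain removal-effect calculation behind the MTA work above; the channels and transition probabilities are toy assumptions.

```python
# Minimal sketch: first-order Markov attribution removal effects (toy data, numpy only).
import numpy as np

states = ["start", "display", "search", "email", "conversion", "null"]
# Row-stochastic transition matrix over journeys; the numbers are illustrative assumptions.
P = np.array([
    # start  display search email  conv  null
    [0.0,    0.4,    0.4,   0.2,   0.0,  0.0],   # start
    [0.0,    0.1,    0.3,   0.2,   0.2,  0.2],   # display
    [0.0,    0.1,    0.1,   0.2,   0.3,  0.3],   # search
    [0.0,    0.1,    0.2,   0.1,   0.3,  0.3],   # email
    [0.0,    0.0,    0.0,   0.0,   1.0,  0.0],   # conversion (absorbing)
    [0.0,    0.0,    0.0,   0.0,   0.0,  1.0],   # null (absorbing)
])

def conversion_prob(P, start=0, conv=4):
    """Absorption probability into 'conversion' when starting from 'start'."""
    transient = [i for i in range(len(P)) if P[i, i] < 1.0]   # non-absorbing states
    Q = P[np.ix_(transient, transient)]
    R = P[np.ix_(transient, [conv])]
    B = np.linalg.solve(np.eye(len(Q)) - Q, R)                # fundamental-matrix solve
    return B[transient.index(start), 0]

base = conversion_prob(P)
for ch in ("display", "search", "email"):
    i = states.index(ch)
    P_removed = P.copy()
    P_removed[i, :] = 0.0
    P_removed[i, states.index("null")] = 1.0                  # removed channel routes to null
    effect = 1 - conversion_prob(P_removed) / base
    print(f"{ch}: removal effect = {effect:.2f}")
```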
Dutch Oil & Gas Company – The Netherlands (Energy)
Head of Analytics Team (People Lead) | Sep 2015 – Jan 2018
Directed analytics strategy & probabilistic modeling for high-value petroleum projects.
Tech: .NET/C#, SQL Server, Python, R, Power BI, SAP, Stata
Highlights:
- Implemented integration pipelines & probabilistic valuation models rescuing underperforming assets.
- Established automated project evaluation, risk assessment & real-time reporting frameworks.
- Architected analytics solutions (.NET, SQL Server, Python, Power BI, R, SAP, Stata).
Earlier Roles & Appointments
Risk Modeller & Mathematician — Probabilistic Risk Modeling Platform | Sep 2013 – Sep 2015
Architected and led the design of a probabilistic risk modeling platform for operational/project risk and portfolio decisions.
Tech: .NET/C#, SQL Server, Python, R; Monte Carlo engines, Bayesian inference workflows; reporting/BI; probabilistic modeling and risk analytics toolkits; backtesting & time-series methods
Highlights:
- Defined modular model architecture (scenario library, risk taxonomy) supporting Monte Carlo, Bayesian updates, event/fault-tree analysis.
- Built parameterization and data contracts for expert elicitations, historical losses and external datasets with versioning and lineage.
- Implemented calibration/validation and explainability: backtesting, sensitivity/tornado, contribution analysis, P50/P90 and tail (VaR/ES) metrics; see the simulation sketch after this list.
- Automated simulation workflows and reproducible runs with audit trails, approvals and model lifecycle governance.
- Delivered decision support dashboards and reporting for scenario comparisons and risk-adjusted portfolio optimization.
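For illustration, a minimal Monte Carlo sketch of the P50/P90 and VaR/ES reporting referenced above; the frequency-severity model and its parameters are toy assumptions, not the platform's calibrated models.

```python
# Minimal sketch: Monte Carlo aggregate-loss simulation with P50/P90 and VaR/ES (toy assumptions).
import numpy as np

rng = np.random.default_rng(42)
n_sims = 100_000

# Frequency-severity model: Poisson event counts, lognormal severities (illustrative parameters).
counts = rng.poisson(lam=3.0, size=n_sims)
annual_loss = np.array([rng.lognormal(mean=12.0, sigma=1.2, size=n).sum() for n in counts])

# Note: some domains (e.g. reserves reporting) use the exceedance convention, where P90 is the low case.
p50, p90 = np.percentile(annual_loss, [50, 90])
var_95 = np.percentile(annual_loss, 95)                 # 95% Value-at-Risk
es_95 = annual_loss[annual_loss >= var_95].mean()       # Expected Shortfall beyond VaR

print(f"P50={p50:,.0f}  P90={p90:,.0f}  VaR95={var_95:,.0f}  ES95={es_95:,.0f}")
```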
PhD / Senior Lecturer / Associate Professor (Full / Part-time) | Sep 2007 – Jan 2018
Taught advanced probability, stochastic analysis & multivariate statistics; supervised theses & published research.
Tech: R, Python, Matlab, Octave, SQL, statistical computing environments; academic writing with LaTeX & reproducible research workflows
Highlights:
- Supervised Masters & PhD candidates (ML, big data, statistical optimization).
- PhD research on non‑regular Laplace distribution: derived an asymptotic test based on the sign statistic, computed test deficiency, and provided power‑function approximations to make the method practical for applied scientists.
- Investigated Laplace and Student distributions as robust alternatives to normal laws in asymptotic statistical problems (testing and inference under heavy‑tailed regimes).
- Applied the Student distribution in insurance analytics; analyzed Bonus–Malus risk‑rating frameworks and modeled claim frequency with Poisson–Gamma (Negative Binomial) heterogeneity, linking to the structural function of the collective; see the mixture form after this list.
- Published monograph and peer‑reviewed works on non‑regular tests and probabilistic/actuarial models (see Publications).
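For reference, the standard Poisson–Gamma mixture underlying the claim-frequency modelling above, stated with β as the Gamma rate parameter:

```latex
% Poisson–Gamma (Negative Binomial) claim-frequency mixture, \beta = Gamma rate parameter
N \mid \Lambda = \lambda \sim \mathrm{Poisson}(\lambda), \qquad \Lambda \sim \mathrm{Gamma}(\alpha, \beta)
\;\Longrightarrow\;
\Pr(N = k) = \binom{k + \alpha - 1}{k}
  \left(\frac{\beta}{\beta + 1}\right)^{\alpha}
  \left(\frac{1}{\beta + 1}\right)^{k},
\qquad k = 0, 1, 2, \dots
```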
FOREX Reuters Dealer (Treasury) | Sep 2001 – Aug 2007
Managed open currency positions and executed hedge strategies within the treasury of a multi‑billion gas company.
Tech: Reuters Dealing (FX), FX spot/forwards/swaps, vanilla options, money market instruments, short‑term lending/borrowing; Excel/VBA, Python/R for quant analysis; technical analysis (MA crossovers, RSI, MACD, support/resistance); binomial lattice models (CRR/JR), stochastic processes (GBM/mean‑reverting), SDE numerics (Euler–Maruyama), Monte Carlo simulation (pricing & strategy evaluation), VaR/ES risk metrics, mean‑variance/CVaR portfolio optimization.
Highlights:
- Built binomial lattice models to value and hedge FX options and to scenario test market indicators (forward points, term‑structure).
- Modeled FX dynamics with stochastic processes (e.g., GBM and mean‑reverting variants); implemented Euler–Maruyama discretization for path generation; see the sketch after this list.
- Applied Monte Carlo to evaluate trading/hedging strategies (PnL distributions, VaR/ES) and stress scenarios; informed hedge ratios and tenor selection.
- Used technical analysis signals (trend/momentum/overbought-oversold) for timing and execution overlay alongside fundamental and quant views.
- Executed FX trades and money market operations; optimized daily liquidity and funding costs under policy constraints.
- Performed optimal currency portfolio allocation (mean‑variance/CVaR) and managed limits aligned to risk appetite.
- Monitored and controlled open currency positions; ensured policy and regulatory compliance.
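For illustration, a minimal sketch of Euler–Maruyama path generation for GBM-style FX dynamics with a Monte Carlo VaR/ES readout, as referenced above; all parameters are toy assumptions.

```python
# Minimal sketch: Euler–Maruyama discretization of GBM FX paths (illustrative parameters).
import numpy as np

rng = np.random.default_rng(7)
s0, mu, sigma = 0.6500, 0.01, 0.10        # spot, drift, volatility (annualised, toy values)
T, n_steps, n_paths = 1.0, 252, 10_000
dt = T / n_steps

paths = np.full((n_paths, n_steps + 1), s0)
for t in range(1, n_steps + 1):
    dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    # Euler–Maruyama update for dS = mu*S*dt + sigma*S*dW
    paths[:, t] = paths[:, t - 1] * (1.0 + mu * dt + sigma * dW)

pnl = paths[:, -1] - s0                   # terminal P&L of an unhedged long-spot position
var_99 = -np.percentile(pnl, 1)           # 99% VaR expressed as a positive loss
es_99 = -pnl[pnl <= np.percentile(pnl, 1)].mean()
print(f"VaR99={var_99:.4f}  ES99={es_99:.4f}")
```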
Education & Continuous Learning
Education
- Lomonosov Moscow State University (MSU)
  - PhD, Probability Theory and Mathematical Statistics (2007–2010)
  - Bachelor’s, Applied Mathematics and Computer Science, with distinction (2004–2007)
- Finance Academy (FA)
  - Bachelor’s, Banking and Finance, with distinction (1998–2003)
Certifications
- Databricks Certified Machine Learning Professional — ID 158990695 (Issued Aug 2025, Expires Aug 2027)
- Academy Accreditation – Azure Databricks Platform Architect — ID 154319049 (Issued Jul 2025, Expires Jul 2027)
- Skillsoft: Data Engineering on Microsoft Azure – Databricks Processing — ID 14141045 (Issued Apr 2025)
- Databricks Certified Associate Developer for Apache Spark 3.0 — ID 125542547 (Issued Dec 2024)
Selected Courses & Credentials
- Coursera: Parallel Programming — ID YV5HYZJ2H3RH (Mar 2020)
- Coursera: Functional Program Design in Scala — ID 2ZFM283M2H6E (Feb 2020)
- Coursera: Functional Programming Principles in Scala — ID DZR725FMEF5K (Oct 2019)
- Coursera: Building Resilient Streaming Systems on GCP — ID P9F377QHVKU8 (Sep 2019)
- Coursera: Data Engineering, Big Data, and Machine Learning on GCP (Specialization) — ID FZRJ9ATU2V43 (Sep 2019)
- DataCamp: Introduction to Spark in R using sparklyr — ID 10407998 (Aug 2019)
- Coursera: Serverless Machine Learning with TensorFlow on GCP — ID QGVNVE976L4X (Aug 2019)
- Coursera: Serverless Data Analysis with BigQuery & Dataflow — ID BXCPX63879LD (May 2019)
- Coursera: GCP Big Data and Machine Learning Fundamentals — ID JBQZ562J7XTU (Apr 2019)
- Coursera: Leveraging Unstructured Data with Cloud Dataproc — ID QYS4S6YY5V9S (Apr 2019)
- Coursera: Deep Learning Specialization — ID DL9KPH7DCNMU (Feb 2019)
- Coursera: Convolutional Neural Networks — ID NHTEEQ8YLLJ5 (Feb 2019)
- Coursera: Improving Deep Neural Networks — ID ZYKM2B3JCJSV (Feb 2019)
- Coursera: Structuring Machine Learning Projects — ID JNYSVZ2DG7D3 (Feb 2019)
- Coursera: Sequence Models — ID 7YPU9V9996HV (Jan 2019)
- International Monetary Fund: MFx – Macroeconometric Forecasting — ID 660e7e0027494f5685885aea974399a8 (Nov 2018)
- Coursera: Neural Networks and Deep Learning — ID XUAPV4985LMP (Nov 2018)
- Microsoft: DAT207x – Analyzing and Visualizing Data with Power BI — ID 31ab907165c942ce9c038d50d117a04f (Aug 2018)
- Coursera: Interest Rate Models — ID 9JSS584JS3QW (Jul 2018)
- KyotoUx: 009x – Stochastic Processes: Data Analysis and Computer Simulation — ID ec0aba27accc4dcf8735f8bb3f16ed2d (Jun 2018)
- Coursera: Financial Engineering and Risk Management Part I — ID QBPSSUWRR23B (Jun 2018)
- Coursera: Financial Engineering and Risk Management Part II — ID TYV3GRL5RUAY (Jun 2018)
- Coursera: Probabilistic Graphical Models 1 – Representation (with Honors) — ID ZFB3HPFCJ2P6 (Jun 2018)
- Coursera: Probabilistic Graphical Models 2 – Inference — ID Z5PL7RSY2MU8 (Jun 2018)
- Coursera: Probabilistic Graphical Models 3 – Learning — ID B6X46BQ2E9VR (Jun 2018)
- Coursera: Probabilistic Graphical Models (Specialization) — ID CZ3MPZX8LRL9 (Jun 2018)
- edX: Credit Risk Management — ID ae771af8404a4d88904cdf18c1e8bdb8 (Jun 2014)
- MITx: Introduction to Probability – The Science of Uncertainty — ID 82bd8cb5fe6747199d29eaa78268225e (May 2014)
- Coursera: Principles of Economics for Scientists (Sep 2013)
- Coursera: Machine Learning (Aug 2013)
- MITx: Introduction to Computer Science and Programming — ID d8046e9001184eba980304689ae824f4 (Jun 2013)
- Coursera: Financial Engineering and Risk Management (May 2013)
- Coursera: Linear and Discrete Optimization (May 2013)
- Coursera: Calculus – Single Variable (Apr 2013)
- Coursera: Computer Networks (Apr 2013)
- Coursera: Computing for Data Analysis (Apr 2013)
- Coursera: Algorithms – Design and Analysis, Part 1 (Mar 2013; also Dec 2012)
- Coursera: Probabilistic Graphical Models (Dec 2012)
- Coursera: Natural Language Processing (May 2012)
Ongoing: ML experimentation, privacy‑preserving data engineering techniques, LLM augmentation patterns.
Leadership & Governance Themes
- People leadership and mentoring (analysts/rotators), code reviews, and roadmap ownership.
- Data masking & retention frameworks embedded early in lifecycle.
- Cost & latency optimization (build vs buy tradeoffs demonstrated by RedactifyAI pipeline savings).
- Culture of documentation & observability (proactive defect detection via anomaly signals & KPI drift checks).
Publications / Speaking
- Korolev, R. A.; Bening, V. E. (2010). “On the power of an asymptotically optimal test for the case of Laplace distribution.” Banach Center Publications, 90(1), 27–38.
- Korolev, R. (2011). “On the power of the sign test for the case of Laplace distribution.” LAP Lambert Academic Publishing. ISBN 978-3845473857.
Contact
- GitLab: https://gitlab.com/rokorolev
- LinkedIn: https://www.linkedin.com/in/roman-k-data-lead
- Projects: /projects/
- About: /about/