A production-style growth engine that turns raw GA4 events into actionable customer segmentation. Runs a full lifecycle: ingestion → SQL feature engineering → local Postgres storage → K-Means clustering → cluster definitions & recommended actions → API endpoints consumed by a streaming web UI. Deployed on a single server with Docker Compose, running the API and a PostgreSQL container side by side in production.
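The clustering step of that lifecycle might look like the minimal sketch below, assuming GA4 events have already been aggregated into per-user features; the column names are hypothetical, not the project's actual schema.

```python
# Sketch of the K-Means segmentation step on engineered per-user features.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

def segment_users(features: pd.DataFrame, k: int = 4) -> pd.DataFrame:
    """Assign each user to a K-Means cluster on scaled behavioural features."""
    cols = ["sessions_30d", "events_per_session", "days_since_last_visit"]  # hypothetical columns
    X = StandardScaler().fit_transform(features[cols])  # scale so no feature dominates distance
    features["cluster"] = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    return features
```

Each resulting cluster would then get a human-readable definition and a recommended action before being served through the API.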
Trained and evaluated a gradient boosting model to forecast customer reordering behaviour on event-level sales data. Engineered features and analysed their impact; achieved 83% ROC-AUC and a 66% relative improvement over the baseline.
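A minimal sketch of that train-and-evaluate loop, not the exact production code; the file name and label column are assumptions.

```python
# Train a gradient boosting classifier and score it with ROC-AUC.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("orders.csv")  # hypothetical table of engineered per-order features
X, y = df.drop(columns=["reordered"]), df["reordered"]  # "reordered" is an assumed binary label
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

model = GradientBoostingClassifier().fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])  # rank-based metric, robust to class imbalance
print(f"ROC-AUC: {auc:.2f}")
```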
Built and automated dataset-preparation tools over large lexical corpora for evaluating LLM translations. Given an Adj+Noun pair in any source and target language, the algorithm assesses translation quality based on natural fluency, perplexity under other language models, and lexical corpus matching, with word sense disambiguation applied via BERT-like models.
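One of those signals, perplexity-based fluency, can be sketched as below; the model choice (GPT-2) and example phrases are assumptions for illustration, not the project's actual setup.

```python
# Score a candidate Adj+Noun translation by its perplexity under a causal LM:
# lower perplexity suggests a more natural, fluent collocation.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(phrase: str) -> float:
    ids = tok(phrase, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss  # mean token-level cross-entropy
    return torch.exp(loss).item()

# e.g. rank two candidate renderings of the same source collocation
print(perplexity("strong tea"), perplexity("powerful tea"))
```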
Engineered a data pipeline that transformed raw event-level sales records into ingredient-level demand forecasts, using optical character recognition (OCR) and regex parsing to standardise unstructured sales sources. Implemented a custom linear forecasting engine to model 10+ time series, enabling product merit analysis and improving production planning and operational efficiency.
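The core of a linear forecasting engine like this can be as simple as one least-squares trend per series; the ingredient name, values, and horizon below are hypothetical.

```python
# Fit y = a*t + b to each demand series and extrapolate a few steps ahead.
import numpy as np

def forecast_linear(history: np.ndarray, horizon: int = 7) -> np.ndarray:
    """Fit an OLS trend line to observed demand and project it `horizon` steps forward."""
    t = np.arange(len(history))
    a, b = np.polyfit(t, history, deg=1)  # slope and intercept
    future = np.arange(len(history), len(history) + horizon)
    return a * future + b

demand = {"flour_kg": np.array([12.0, 14.5, 13.8, 15.2, 16.1])}  # hypothetical series
plans = {name: forecast_linear(series) for name, series in demand.items()}
```

A deliberately simple model like this keeps the forecasts interpretable and the pipeline easy to rerun, which matches the planning use case better than a heavier method would.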
I define the decision the output should enable and who will use it. I set a clear “good enough” threshold early to keep the work focused on impact. If the answer won’t change anything, I simplify the question instead of overbuilding.
I treat analytical systems as products with lifetime value, not one-off reports. Presentation is part of the sale: if it’s not understood quickly, it won’t be adopted. I design for reuse, iteration, and a workflow that can run repeatedly.
I aim for clear outputs, not just correct numbers — interpretation matters more than a dense pivot. I turn raw data into signals: what’s growing, what’s declining, and what deserves attention. If a result can’t be explained in plain language, it won’t drive decisions.
I check whether the data is truthful, whether transformations are necessary, and whether we’re answering the real question. I sanity-check assumptions and stop when the output is useful, not when it is maximally complex. This typically leads to simpler pipelines and more confident conclusions.
The core decision wasn’t just forecast accuracy: it was reducing wasted operational effort and justifying menu complexity. That framing let me simplify the modelling and optimise for usability and repeatability, not fancy metrics.
The workflow had to cover the user experience from A to Z; otherwise it would be seen as too complex to adopt. I scoped an MVP that is easy to run and produces planning outputs that directly support cost reduction.
The system was intended for non-technical use in daily operations. Outputs were designed to be plug-and-play: interpretable signals and planning quantities, without relying on opaque scoring.
While forecasting, I questioned whether I used all available information and discovered an additional asset: a product merit report. This shifted the work from “predict next week” to “improve what we choose to sell” — enabling strategic simplification decisions.
Open to data science roles, collaboration, and thoughtful conversations.