Experience

Summary #

I’ve spent the last 12 years or so working across data science, ML engineering, and data/software engineering. My career has been somewhat unconventional: I started in materials physics and ended up building production ML systems via computational neuroscience. More recently my focus has shifted to generative AI and agentic systems, designing multi-agent workflows that need to be reliable enough for production, not just impressive in a demo. The thread connecting all of it is an interest in making quantitative methods actually work in practice, not just in notebooks. Day to day I write Python and SQL, build ML models (classical and deep), design data pipelines, and deploy things to cloud services. I’ve also built a fair number of APIs and event-driven services. I care about software engineering in a way that’s apparently unusual for data scientists, which has turned out to be useful.

Key Technologies #

  • Python and SQL
  • ML (e.g. scikit-learn and TensorFlow) and agentic (e.g. PydanticAI and AutoGen) frameworks
  • Apache Spark
  • Relational databases (e.g. PostgreSQL)
  • Non-relational databases (e.g. Redis and Elasticsearch)
  • Data workflow management tools (e.g. Dagster and Apache Airflow)
  • AWS/GCP and Terraform
  • FastAPI, Flask, and Django

Technical Advisor #

(February 2018 - Present)

I advise small companies and early-stage startups on data strategy, system architecture, and applied ML. Industries have included healthcare, patent law, and sport science. I get involved in development when needed rather than just producing slide decks.

Some examples:

  • Building a custom word2vec model (Gensim) for patent document matching, used for search, infringement detection, and classification.
  • Using GPT-4 to automate initial interview screening.
  • Automatically adjusting triathlon training plans based on athlete data.

SentinelOne #

Staff Machine Learning Engineer (July 2024 - Present)

I work on Purple AI, SentinelOne’s agentic AI assistant for security analysts. It turns plain-language questions into actionable insights from live security data. The interesting problems here are at the intersection of LLM research and production reliability: multi-agent workflow design, making stochastic outputs behave deterministically enough for incident response, and building evaluation pipelines that measure both factual accuracy and security relevance. The goal is an assistant analysts actually trust during incidents, not one that only works in demos.

DeepL #

Staff Data Scientist (February 2023 - July 2024)

Built Python and SQL data pipelines processing commercial and platform usage data into structured time-series, running on ClickHouse. The main project I led was a GPT-4-powered Slack bot that interpreted analytics queries from business teams, generated SQL against our warehouse schema, and returned results in natural language — designed to cut the volume of ad-hoc data requests. I also overhauled the data science interview process and pushed adoption of data engineering best practices (medallion architecture, dbt, Airflow) across the team.

Infogrid #

Senior Data Scientist (February 2021 - February 2023)

Built and deployed LSTM models on temperature sensor time-series data to predict state changes (desk occupancy and water flow through pipes) for building management systems. These models were Infogrid’s primary revenue driver, enabling automated real-time decisions. Beyond modelling, I led the development of several core Python services using FastAPI, Redis, TimescaleDB, SQS, SNS, and DynamoDB. These were event-driven, scalable, and built with MLOps practices that the organisation hadn’t previously adopted.

Opensignal #

Senior Data Scientist (January 2018 - January 2021)

Developed statistically rigorous metrics for measuring mobile network performance from terabytes of crowd-sourced sensor and location data. Built the scalable pipelines to compute them using Python, PySpark, and Airflow on AWS.

CognitionX #

“Data Scientist” (December 2016 - January 2018)

An unusual role in which I led the development team rather than sitting within one. Responsibilities covered data science, software engineering, product management, and cloud architecture. Built a web portal for discovering AI-related resources, companies, and people, plus a nominations and voting system for the CogX 2017 conference. Stack: AWS, Django, Elasticsearch, Flask, Neo4j, PostgreSQL.

Big Data Partnership #

Data Scientist (October 2014 - December 2016)

My first industry role. Worked across many industries, including ad-tech, healthcare, and aviation, using the full range of classical ML (logistic regression, SVMs, random forests, k-means, DBSCAN, PCA) plus survival analysis, Markov models, and deep learning for computer vision. Also provided pre-sales SME support and developed a training course, “Introduction to Data Science in a Big Data World”, that generated over £250k in its first six months. This is where I learned Python and SQL properly.

Atomic Weapons Establishment #

Research Scientist (September 2008 - September 2010)

Finite element modelling and numerical analysis of high-pressure shock waves, generated by high-velocity impacts, propagating through piezoelectric materials, with the aim of optimising electrical output. Also completed AWE’s graduate programme covering project management and communication skills, which I’m told are important.

Education #

PhD: Experimentally Verified Reduced Models of Neocortical Pyramidal Cells #

University of Warwick (October 2011 - September 2014)

My PhD addressed two problems in computational neuroscience using simplified neuron models. Firstly, brain network models typically include different neuron classes but ignore within-class variability. I characterised the responses and electrical properties of a specific neuron type, then developed an algorithm to generate populations of simplified models that preserved the observed variability. Secondly, after a neuron fires, its spike threshold jumps and then slowly recovers. I showed that a simple model captured this accurately, found conditions under which a two-variable model could be reduced further without losing accuracy, and demonstrated that a previously proposed mechanism couldn’t fully explain the observed dynamics.

MSc: Mathematical Biology and Biophysical Chemistry #

University of Warwick (Distinction, October 2010 - September 2011)

MMath: Mathematics #

University of Oxford (Upper second class, October 2004 - July 2008)