Data Scientist. Mathematics Master's (MMath) graduate from the University of Oxford.

Experience

(June 2022 - Present) Data Scientist - BIOStress

First full-time data science hire at an early-stage startup, responsible for all models and the data pipeline.

  • Pioneered the development of a stress detection algorithm using time-series signals from wearable devices, leading to improved accuracy and robustness
  • Used statistical machine learning classification models, cross-validated against the scientific literature, to design a state-of-the-art model that surpassed all others in both scientific and commercial domains
  • Mentored and supervised a master's student over a 3-month internship, providing guidance and setting projects to help them develop their data science skills

(July 2021 - September 2021) Research Intern - Oxehealth

10-week internship at a vision-based medical device company.

  • Evaluated viability of a new product by analysing medical literature, writing a module to wrangle physiological data, and training machine learning models to give insight into a key focus area for the company
  • Developed tools with PySpark to audit large amounts of data, leading to a scalable way to evaluate algorithm performance across the estate
  • Worked with sensitive medical data from care homes and NHS mental health trusts

Education

(2018 - 2022) University of Oxford, Integrated Master's in Mathematics (MMath).

Master's thesis on the statistical behaviour of protein folding using data from computational models.

Areas of focus:

  • Statistics and data science
  • Computational biology
  • Machine learning and deep learning
  • Numerical methods
  • Information theory

(2016 - 2018) King's College London Mathematics School.

A-Levels: Mathematics (A*), Further Mathematics (A*), Physics (A)

AS-Levels: Computer Science - Python (A), Further Additional Mathematics (A)

Expertise

Experienced in working on large codebases, version control (Git) and CI/CD, Linux (Debian-based/Ubuntu, desktop and server), and SQL. Particular aptitude for quantitative analysis of physiological and time-series data, and for building robust and explainable machine learning models.

Python: Multiple years of experience across professional, personal, and academic settings. Follows best-practice standards such as PEP 8, type hinting, and environment management.

  • Analysed large amounts of data from medical devices and protein folding models using PySpark, Pandas, and NumPy
  • Visualised data for preliminary data exploration and for testing heuristic models with Plotly and Matplotlib
  • Built time-series machine learning classifier models and conducted novel experiments in deep learning robustness with Scikit-Learn, Keras, PyTorch, and fastai

MATLAB: Created a novel numerical algorithm to solve large, over-determined systems of linear equations in a third-year university project; the algorithm outperformed existing methods on some metrics. Writeup available here.