Data Scientist. Mathematics Master's (MMath) graduate from the University of Oxford.

Experience

Remunerated

(August 2024 - Present) Data Scientist - Ocado Technology

Working on the Automated Storage & Retrieval System team using data from robotics sensors.

(June 2022 - August 2024) Data Scientist - BIOStress

First full-time data science hire at an early-stage startup, responsible for all models and the data pipeline.

  • Developed a novel stress detection algorithm with better accuracy, robustness, and generalisability than the state-of-the-art by using careful feature/model selection and signal pre-processing, leading to a patent submission
  • Designed scientific trial methodology for a study to be run in collaboration with the University of Bath to evaluate performance of my model
  • Deployed the model to the cloud, working in accordance with ISO 27001 principles
  • Experience with multimodal data: time series, free text, and standardised tests
  • Managed an intern over 3 months, providing guidance and setting projects to help develop his data science skills.

(July 2021 - September 2021) Research Intern - Oxehealth

10-week internship at a vision-based medical device company.

  • Provided insight into a new product area for the company (detecting Obstructive Sleep Apnea) by analysing medical datasets, creating a module to handle polysomnography data, and training time-series classifier models
  • Improved algorithm performance by building PySpark tools to audit large amounts of data, leading to a scalable way of identifying misclassifications
  • Experience working with sensitive medical data from care homes and NHS mental health trusts.

Volunteering

(January 2024 - Present) Statistician - Pulmonary Vascular Research Institute

Volunteering in association with the Royal Statistical Society as the primary statistician analysing data from a large-scale patient survey.

  • Working with academics from the University of Cambridge and third-sector organisations to publish papers in high-impact journals (manuscripts in progress).

(August 2024 - September 2024) AI Safety Engineer - Arcadia Impact, AISI

  • Implementing a multimodal, zero- and multi-shot benchmark (MathVista) in AISI's Inspect framework and evaluating the performance of existing and novel models (OpenAI's GPT-4 Turbo, GPT-4o, and GPT-4o mini) on the benchmark. Link to implementation and PR.

Education

(2018 - 2022) University of Oxford, Integrated Master's in Mathematics (MMath).

Master's thesis on the statistical behaviour of protein folding using data from computational models.

Areas of focus:

  • Statistics
  • Machine learning and deep learning
  • Numerical methods
  • Computational biology

(2016 - 2018) King's College London Mathematics School.

A-Levels: Mathematics (A*), Further Mathematics (A*), Physics (A)

AS-Levels: Computer Science - Python (A), Further Additional Mathematics (A)

Expertise

Python: 5+ years of experience across professional, personal, and academic settings. Familiar with standards and practices such as PEP 8, type hinting, and environment management. Selected libraries:

  • Data analysis and processing: Pandas, NumPy, SciPy, Polars, PySpark
  • Machine Learning: Scikit-Learn, TensorFlow, LangChain, PyTorch, Keras
  • Visualisations: Plotly, Matplotlib

Cloud: Set up end-to-end data pipelines from scratch and managed a migration from AWS to Azure

  • AWS: Batch, Lambda, S3, EC2, ECR, IAM
  • Azure: Batch, Blob storage, Container Registry, IAM
  • GCP: BigQuery, Looker

General software engineering: Working on large code bases, unit and end-to-end testing, version control & CI/CD (git), semantic versioning, Linux (Debian-based/Ubuntu - desktop & server), and Docker.

General data science: SQL, data processing (online, batch, and unified with Apache Beam), particular aptitude for quantitative analysis of time-series and sensor data, and building robust and explainable machine learning models.

Publications

  • J. Newman, S. Munagala, M. Fay, G. Fischer, M. Granato, L. Howard, M. Kurzyna, L. Macdonald, G. Meszaros, E. Otter, M. Stone, K. Bunclark, M. Toshner, M. Tschida, PVRI IDDI Patient Engagement & Empowerment Workstream, PH GPS Consortium, J. Pepke-Zaba. 2024. Pulmonary Hypertension Global Patient Survey: a preliminary overview. [Poster]. European Respiratory Society Congress 2024, 7 September - 11 September. Vienna, Austria.
    • Winner of European Respiratory Society & European Lung Foundation Travel Grant for Best Abstract in Patient-Centered Research
  • J. Newman, S. Munagala, M. Granato, M. Kurzyna, L. MacDonald, G. Meszaros, E. Otter, M. Stone, M. Toshner, M. Tschida, J. Pepke-Zaba. 2024. Pulmonary Hypertension Global Patient Survey: a preliminary overview. [Poster]. 7th World Symposium On Pulmonary Hypertension, 29 June - 1 July. Barcelona, Spain.
  • S. Munagala (inventor), T. Routledge (inventor), BIOStress Lab Ltd. (applicant) Measurement of Physical Stress Response. [Pending Patent] (Patent Application Number GB2402167.7) Patents Journal Number 7037, UK Intellectual Property Office, Lodged: 16 February 2024.