Seb Pretzer
Profile
Machine Learning Engineer turned Data Engineer. Used NLP techniques to extract clinical data from medical records, and CV techniques to guide diagnoses from H&E whole slide images. Currently building data pipelines to support modeling efforts of confocal microscopy images and other modalities.
Experience
Xilis
Durham, NC
Sr. Data Engineer
Aug 2022 - Present
- Operating pipelines in Argo to support modeling efforts of confocal microscopy images. This includes indexing raw data, extracting features from images, and running models on the extracted features.
- Managing a data catalog to enable data scientists to discover and understand the data available to them, while maintaining version control.
- Wrapping frameworks to improve DevX, including an s5cmd Python interface, a Poetry plugin for mirrored CodeArtifact repositories, and a package to lock locally-dependent Poetry projects.
Tempus Labs
Chicago, IL
Sr. Machine Learning Scientist, Imaging
Oct 2021 - Aug 2022
- Developed a pipeline to extract MSI status from H&E whole slide images, to target patients who would benefit from immunotherapy (link).
- Supported the imaging team's infrastructure and tooling needs, standing up GCP environments, interacting with internal data APIs, etc.
Machine Learning Scientist, NLP
Jun 2019 - Oct 2021
- Leveraged Elasticsearch and the GCP Healthcare API to quickly scale medication extraction pipelines.
- Used neural networks to extract clinical labels from patient-level and document-level data.
- Built packages to support faster pagination of data from S3 and calling SageMaker compute.
Data Science Intern, NLP
May 2018 - Jun 2019
- Built end-to-end deep learning models for patient-level labeling.
- Established the team's initial cloud-based training environment in AWS.
Northwestern University
Evanston, IL
WebSAIL Research Assistant
Jan 2018 - Apr 2018
- Explored alternative word embeddings and additional syntactic cues to improve embedding quality (link).
Teaching Assistant
Mar 2017 - Jun 2017
Education
BA, Computer Science
Northwestern University
Sep 2015 - Jun 2018
Specialized in machine learning and natural language processing.
Skills
- Fluent Languages
  - Python
  - SQL
- Conversational Languages
  - Terraform
  - TypeScript
  - Rust
- Frameworks
  - Poetry
  - Pydantic
  - Jinja
  - Pytest
  - Docker
  - Snakemake
- AWS Technologies
  - Athena
  - Glue
  - CodeBuild
  - CodeArtifact
  - SageMaker
  - FSx for Lustre
  - S3
  - ECR
- Other Technologies
  - Argo
  - GitHub Actions
  - GCP BigQuery
Interests
- Basketball
- Cooking
- Skiing
- Dogs
- House Music
- Web3