Seb Pretzer

Profile

Machine Learning Engineer turned Data Engineer. Used NLP techniques to extract clinical data from medical records, and CV techniques to guide diagnoses from H&E whole slide images. Currently building data pipelines to support modeling efforts of confocal microscopy images and other modalities.

Experience

Xilis

Durham, NC

Sr. Data Engineer

Aug 2022 - Present

  • Operating pipelines in Argo to support modeling efforts of confocal microscopy images. This includes indexing raw data, extracting features from images, and running models on the extracted features.
  • Managing a data catalog to enable data scientists to discover and understand the data available to them, while maintaining version control.
  • Wrapping frameworks to enable improved DevX, including an s5cmd Python interface, a Poetry plugin for mirrored CodeArtifact repositories, and a package to lock locally dependent Poetry projects.

Tempus Labs

Chicago, IL

Sr. Machine Learning Scientist, Imaging

Oct 2021 - Aug 2022

  • Developed a pipeline to extract MSI status from H&E whole slide images, to target patients who would benefit from immunotherapy (link).
  • Supported the imaging team's infrastructure and tooling needs, standing up GCP environments, interacting with internal data APIs, etc.

Machine Learning Scientist, NLP

Jun 2019 - Oct 2021

  • Leveraged Elasticsearch and the GCP Healthcare API to quickly scale medication extraction pipelines.
  • Utilized NNs to extract clinical labels from patient-level and document-level data.
  • Built packages to support faster pagination of data from S3 and calling SageMaker compute.

Data Science Intern, NLP

May 2018 - Jun 2019

  • Built end-to-end deep learning models for patient-level labeling.
  • Established the team's initial cloud-based training environment in AWS.

Northwestern University

Evanston, IL

WebSAIL Research Assistant

Jan 2018 - Apr 2018

  • Explored alternative word embeddings and additional syntactic cues to improve embedding quality (link).

Teaching Assistant

Mar 2017 - Jun 2017

Education

BA, Computer Science

Northwestern University

Sep 2015 - Jun 2018

Specialised in machine learning and natural language processing.

Skills

Fluent Languages

  • Python
  • SQL

Conversational Languages

  • Terraform
  • TypeScript
  • Rust

Frameworks

  • Poetry
  • Pydantic
  • Jinja
  • Pytest
  • Docker
  • Snakemake

AWS Technologies

  • Athena
  • Glue
  • CodeBuild
  • CodeArtifact
  • SageMaker
  • FSx for Lustre
  • S3
  • ECR

Other Technologies

  • Argo
  • GitHub Actions
  • GCP BigQuery

Interests

  • Basketball
  • Cooking
  • Skiing
  • Dogs
  • House Music
  • Web3