Field Cady

Data Scientist, Researcher, Author

Edmonds, WA

I am a data scientist, researcher, and author based in Edmonds, WA. I have worked on a diverse set of problems and try to solve them in the simplest way possible, though I have been specializing in stochastic modeling and machine learning (including deep learning).

Highlights

Technical lead for multiple DS / research teams. I like turning cool ideas into POCs and products 😊
Author 2x, with books translated into multiple languages
Personally built the analytics stack for Semantic Scholar
Stanford-educated physicist by training: research in stochastic analysis and mathematical biology

DeepFakes and Consulting

Building on my work at True Media, I have been creating and licensing datasets of DeepFakes to help train and evaluate DeepFake detection models. I monitor social media to identify new ways that people are making DeepFakes, so that models can catch them before they are widely used by scammers. I also have a patent-pending method for generating training datasets tailored to specific verticals.

Previously I have consulted on a range of other projects, such as:

Building data science teams
Behavior classification in oil wells
Analysis of biomedical images

Feel free to reach out if you'd like to discuss a project!

Books

The Data Science Handbook

A self-contained overview of data science, this book covers the math, the programming, and the business side. Currently in its 2nd Edition.

Also available in Chinese and Korean

Published by Wiley & Sons

Data Science: The Executive Summary

This is for people who don't personally want to do data science, but need to leverage it in their organization. It gives a broad overview of the tools and techniques of data science, including the technical depth needed to critique models, interpret analytical results yourself, and see through bullshit.

Also available in Chinese

Published by Wiley & Sons

What is Math?

This contains pretty much everything I have to say about math, cognition and language, as well as awesome historical context and personal anecdotes. If you are interested in the human side of math, then I encourage you to check it out.

Having spent most of my life working with math in one form or another, I am convinced that curious people of all backgrounds could benefit from a novel take on the subject. There are a lot of misconceptions out there, in everybody from math-phobes to professional researchers. Even if you don't end up agreeing with my thesis, the book covers a fascinating range of topics, and I think there will be something new and exciting for everyone.

Self-published

Selected Scholarly Articles

Open-system thermodynamic analysis of DNA polymerase fidelity

Blast from the past! This was written back when I was at UW. I show the critical and under-appreciated role that thermodynamics plays in the low mutation rate of DNA when cells divide.

A Stochastic Analysis of Hard Disks

I wrote this with people at CMU, and it calculates the average wait time for hard disks under certain assumptions. It turns out to be a very subtle problem; many previously published papers botched the math.

An Elementary Derivation of Mean Wait Time in Polling Systems

This paper, which I only posted on arXiv, extends the previous one to general polling systems.

Talks

A lot of the public stuff I do is at conferences where it doesn't get recorded. However, some of my work has found its way online.

Python for Data Science

Python is, IMHO, the best general-purpose programming language for data science. This talk gives some tips for how to get the most out of it.

Relational Algebra and the Pig Language

This talk gives an overview of relational algebra, which is the theoretical underpinning for most modern databases and most Hadoop wrapper languages. It's cool stuff, and worth being familiar with if you want a deeper understanding of these tools. Wow, I can't believe that I used to work with Pig - I feel like a dinosaur!

The Accidental Data Scientist

A talk I gave at the Metis bootcamp, with advice for people just about to start their careers as data scientists.

Other

CtHMM

A Python library I developed that supports continuous-time Hidden Markov Models. Basically it's HMMs but with irregularly spaced observations, which is super useful in situations like medicine or customer interactions where observations arrive at irregular intervals rather than on a fixed schedule.
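To make the continuous-time idea concrete, here is a minimal sketch of the forward algorithm when the gap between observations varies. It is not CtHMM's API: the function names, the two-state setup, and all the numbers are assumptions chosen for illustration. The key point is that the transition matrix depends on the elapsed time, which for a two-state chain has a simple closed form.

```python
import math

def transition_matrix(rate_01, rate_10, dt):
    """Transition probabilities over an interval dt for a two-state chain.

    This is the closed form of the matrix exponential of the rate matrix
    [[-rate_01, rate_01], [rate_10, -rate_10]] times dt.
    """
    total = rate_01 + rate_10
    decay = math.exp(-total * dt)
    p01 = rate_01 / total * (1.0 - decay)
    p10 = rate_10 / total * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]

def forward_likelihood(times, observations, rate_01, rate_10, emission, initial):
    """Likelihood of observations seen at irregular times (forward algorithm)."""
    alpha = [initial[s] * emission[s][observations[0]] for s in range(2)]
    for k in range(1, len(times)):
        # The transition matrix is rebuilt for each gap, since gaps differ.
        P = transition_matrix(rate_01, rate_10, times[k] - times[k - 1])
        alpha = [
            sum(alpha[i] * P[i][j] for i in range(2)) * emission[j][observations[k]]
            for j in range(2)
        ]
    return sum(alpha)

# Hypothetical example: four observations at uneven times.
emission = [[0.9, 0.1], [0.2, 0.8]]   # emission[state][symbol], made-up values
initial = [0.5, 0.5]                  # made-up initial state distribution
times = [0.0, 0.5, 3.0, 3.2]
observations = [0, 0, 1, 1]
print(forward_likelihood(times, observations, 1.0, 3.0, emission, initial))
```

With more than two states the closed form no longer applies, and the per-gap transition matrix would typically come from a general matrix exponential instead.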

Patent US10162881B2

For machine-assisted discovery of join keys between different datasets. I led the team at Maana that developed this patent and integrated it into our production code.

Fun Projects

ESDM Therapist Finder

ESDM is a play-based therapy for kids on the autism spectrum. Unfortunately, their website is very hard to navigate, so I vibe-coded a map that lets you easily find therapists all over the world.

Mandarin Anki Cards

I am learning Mandarin, and I made this app to help me make flashcards. You just put in a big blob of Chinese text (for me it tends to be song lyrics) and it will extract the words and turn them into an Anki-friendly CSV that you can just paste in to make flashcards. It uses Llama on Hugging Face for the actual translation.
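The last step, turning word/translation pairs into CSV, can be sketched roughly as follows. This is not the app's code: the function name and the sample cards are hypothetical, and the sketch assumes the words have already been extracted and translated.

```python
import csv
import io

def cards_to_csv(cards):
    """Render (front, back) pairs as CSV text, skipping repeated fronts."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    seen = set()
    for front, back in cards:
        if front in seen:
            continue  # song lyrics repeat words a lot; keep one card per word
        seen.add(front)
        writer.writerow([front, back])
    return buffer.getvalue()

# Hypothetical sample cards; real translations would come from the model.
example_cards = [
    ("你好", "hello"),
    ("月亮", "moon"),
    ("你好", "hello"),
    ("心", "heart, mind"),
]
csv_text = cards_to_csv(example_cards)
print(csv_text)
```

Using the `csv` module rather than joining strings by hand means translations that contain commas get quoted, so each row still has exactly two fields.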