Hello World!

Welcome to the homepage of Jonathan Waring

I'm currently a Master of Science candidate at Harvard University's T.H. Chan School of Public Health studying Health Data Science. I recently graduated from the University of Georgia with a B.S. in Computer Science, along with a minor in Public Health and a certificate in Applied Data Science.

My inquisitive nature and desire to find the reasons and causes behind the way things are drew me to data science, which combines statistics and programming to find underlying patterns in data. What excites me most is applying data science to healthcare: helping prevent the spread of infectious disease, improving health outcomes, and analyzing trends in the healthcare industry that affect everyone's well-being.

Things I Can Do

Want to know what I am capable of doing? Check out some of my skills below!

  • Code (Python, R, Java, C++/C)
  • Databases (SQL)
  • Data Wrangling (pandas, tidyverse)
  • Data Visualization (matplotlib, ggplot2)
  • Machine Learning (sklearn, caret, H2O)
  • AutoML (auto-sklearn, Auto-Keras)
  • Deep Learning (TensorFlow, Keras)
  • Natural Language Processing
  • Computer Vision
  • Network Graph Analysis (NetworkX)
  • Statistical Analysis (R, JMP)
  • Web Dev (HTML 5, CSS 3, JavaScript)
  • Google Cloud Platform (GCP)
  • Git/GitHub
  • Linux, Unix, Windows, macOS
  • JupyterLab
  • Science Communication
  • Full Resume

Some Stuff I've Done

Here you'll find a sample of the data science-y projects I have done. Some were class projects, some were research, and some were just for fun!

Interpretability in Machine Learning for Epidemiological Forecasting

In this project, we attempt to improve the interpretability of machine learning methods for epidemiological forecasting by evaluating whether or not machine learning models pick up on known spatiotemporal patterns of influenza spread.

This project was done in collaboration with Emily Aiken as a final project for APCOMP 221 (Critical Thinking in Data Science) at Harvard University.

Predicting Hospital Readmission

In this project, we attempt to develop a machine learning strategy that predicts 30-day hospital readmission of diabetes patients using electronic health record data.

This project was done in collaboration with Selena Huang, Erica Moreira, and Jacob Rosenthal as a final project for BST 260 (Data Science I) at Harvard's T.H. Chan School of Public Health.

Automated Sarcasm Detection of Reddit Comments

In this project, we attempt to develop a machine learning strategy using natural language processing (NLP) techniques to identify sarcasm in Reddit comments. Experimental results on the SARC dataset show that by combining lexical and word embedding features, our best-performing model achieves a testing accuracy of 61.24% and an F1 score of 0.605.
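As a rough illustration of what combining lexical and word embedding features can look like, here is a minimal Python sketch. The specific lexical features, the toy embedding table, and every function name are assumptions made for illustration, not the project's actual pipeline. The F1 computed here is the binary F1 for the sarcastic class, which may differ from how the reported score was calculated.

```python
import string

def lexical_features(comment):
    """Hypothetical surface cues: exclamation marks, all-caps words, length."""
    words = comment.split()
    return [
        float(comment.count("!")),
        float(sum(w.isupper() and len(w) > 1 for w in words)),
        float(len(words)),
    ]

def mean_embedding(comment, embeddings, dim):
    """Average the vectors of known words; zeros if no word is known."""
    tokens = [w.strip(string.punctuation) for w in comment.lower().split()]
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def combined_features(comment, embeddings, dim):
    """Concatenate lexical features with the averaged embedding."""
    return lexical_features(comment) + mean_embedding(comment, embeddings, dim)

def accuracy_and_f1(y_true, y_pred):
    """Accuracy and binary F1, treating label 1 (sarcastic) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, f1

# Toy two-dimensional embeddings, invented for this example only.
toy_embeddings = {"great": [1.0, 0.0], "idea": [0.0, 1.0]}
print(combined_features("GREAT idea!", toy_embeddings, dim=2))
print(accuracy_and_f1([1, 0, 1, 1], [1, 0, 0, 1]))
```

Any classifier that accepts a fixed-length numeric vector could be trained on the output of `combined_features`.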

This project was done in collaboration with Jonathan Hayne as a final project for CSCI 4360 (Data Science II) at the University of Georgia.

Identification of Vaccine Misinformation Online

The Internet plays a large role in disseminating vaccine misinformation to a large number of people, which contributes to the vaccine hesitancy problem. I attempt to develop a machine learning strategy using natural language processing (NLP) that allows one to identify misinformation in vaccine-related webpages. This was accomplished through the use of the low-dimensional document embedding algorithm, Doc2Vec, and the use of semi-supervised learning techniques.
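The description above does not say which semi-supervised technique was used, so the sketch below swaps in one common option, self-training with a nearest-centroid rule, purely as an illustration. The short lists of numbers stand in for Doc2Vec document vectors, and the function names, the `margin` threshold, and the 0/1 labels are all hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def self_train(labeled, unlabeled, margin=0.5, max_rounds=10):
    """Grow the labeled set by pseudo-labeling confidently classified documents.

    labeled:   list of (vector, label) pairs, labels 0 or 1; this sketch
               assumes at least one example of each label is present.
    unlabeled: list of vectors with no label.
    A document is pseudo-labeled only when its distances to the two class
    centroids differ by at least `margin`.
    """
    labeled = list(labeled)
    remaining = list(unlabeled)
    for _ in range(max_rounds):
        centroids = {
            label: centroid([vec for vec, lab in labeled if lab == label])
            for label in (0, 1)
        }
        confident, uncertain = [], []
        for vec in remaining:
            d0 = math.dist(vec, centroids[0])
            d1 = math.dist(vec, centroids[1])
            if abs(d0 - d1) >= margin:
                confident.append((vec, 0 if d0 < d1 else 1))
            else:
                uncertain.append(vec)
        if not confident:
            break  # nothing new was learned this round
        labeled.extend(confident)
        remaining = uncertain
    return labeled, remaining

# Toy vectors standing in for document embeddings of webpages.
seed = [([0.0, 0.0], 0), ([4.0, 4.0], 1)]
pool = [[0.5, 0.0], [3.5, 4.0], [2.0, 2.0]]
grown, still_unlabeled = self_train(seed, pool)
print(len(grown), still_unlabeled)
```

Documents that stay ambiguous are returned unlabeled rather than forced into a class, which keeps low-confidence guesses out of the training set.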

This project was conducted under the direction of Dr. Shannon Quinn as a CURO research project at the University of Georgia.

Epidemiological Data: Parameter Estimation and Pitfalls

During an outbreak, epidemiologically important parameters need to be quantified and estimated to better understand the outbreak and put timely response strategies in place. These include quantities such as the mean infectious period and the transmission potential of the pathogen. Fitting transmission models to incidence reports has become a standard way of attaining quick real-time estimates of these parameters. Cumulative incidence data (total number of infections to date) is often used rather than raw incidence (number of new cases in a defined reporting period). Evidence suggests this choice can critically affect our perception of the variability in parameters and hence the uncertainty in predictions. This project focuses on further elaborating on this problem using simulated epidemic data.
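One way to see why the choice of data matters is to compare how reporting noise behaves in the two series. The sketch below is a toy simulation under stated assumptions, not the project's actual code: the discrete-time SIR model, its parameter values, the Gaussian noise, and the function names are all chosen here for illustration. Independent errors in raw incidence accumulate when the counts are summed, so the errors in cumulative incidence become strongly correlated from one period to the next, which fitting methods that assume independent errors do not account for.

```python
import random
from itertools import accumulate

def sir_incidence(beta=0.5, gamma=0.25, population=10_000,
                  initial_infected=10, steps=60):
    """Expected new cases per reporting period from a simple discrete-time SIR model."""
    susceptible = population - initial_infected
    infected = initial_infected
    incidence = []
    for _ in range(steps):
        new_cases = beta * susceptible * infected / population
        recoveries = gamma * infected
        susceptible -= new_cases
        infected += new_cases - recoveries
        incidence.append(new_cases)
    return incidence

def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation of a sequence."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    denominator = sum(c * c for c in centered)
    numerator = sum(a * b for a, b in zip(centered, centered[1:]))
    return numerator / denominator

rng = random.Random(0)  # fixed seed so the run is repeatable
expected = sir_incidence()

# Independent noise on each period's count. Gaussian noise with variance equal
# to the expected count is a simplification; real counts are non-negative integers.
observed = [e + rng.gauss(0, max(e, 1.0) ** 0.5) for e in expected]

raw_residuals = [o - e for o, e in zip(observed, expected)]
cum_residuals = [co - ce for co, ce in
                 zip(accumulate(observed), accumulate(expected))]

print("lag-1 autocorrelation, raw incidence residuals:       ",
      round(lag1_autocorrelation(raw_residuals), 3))
print("lag-1 autocorrelation, cumulative incidence residuals:",
      round(lag1_autocorrelation(cum_residuals), 3))
```

Treating correlated cumulative residuals as if they were independent tends to make parameter estimates look more certain than the data justify.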

This project was conducted under the direction of Dr. Pejman Rohani and Dr. Ana Bento as a National Science Foundation REU project at UGA's Population Biology of Infectious Diseases REU site.

Blog

Want to hear some of my thoughts? Want to check out my science communication skills in action? Then you have come to the right place. Some of the work published here was originally published during my time as an undergraduate student at the University of Georgia on The Athens Science Observer. To read the full blog piece, just click on the cover image!

Also, keep in mind while reading these blogs that I am just one person: not statistically significant nor representative (a little statistics humor for ya).

Effectiveness of Vaccines

Despite the effectiveness of vaccines being established for years, there is still some controversy about their safety and effectiveness. Using data, we can see that vaccines do appear to be effective and that their link to autism is unfounded.

Fake It 'Til You Make It

I explore my experiences with impostor syndrome and talk about how feelings of inadequacy are all too common among high achievers.

Computers Detecting Sarcasm? Great!

If you think that humans have a hard time detecting sarcasm, imagine how difficult it must be for computers

Mo' Data Mo' Problems

Is the revolution of Big Data causing more problems than solutions?

Catholicism and Science

Discover whether religion, specifically Catholicism, and science can coexist (spoiler: they can!)

Can't Stop (Catching) the Feeling(s)!

In this Valentine's Day piece, I explore how a "no strings attached" relationship often ends with someone "catching the feelings"

Don't Believe Everything You Read

With numerous media outlets reporting on studies that are full of statistical mistakes, I explore why statistics education is so important

The Science of Blacking Out

Ever wonder why after a night of heavy drinking people can't "remember" what happened? What if they never formed a memory in the first place?

Why Computers Can't Do Everything

Despite all of the promising work in fields such as machine learning and AI, there are still some simple problems that computers can't solve. Why is that?

Internet Addiction: Fact or Fantasy

As the Internet continues to provide more and more entertainment in the form of social media, video streaming, and online gaming, can someone become addicted to it?