Is December getting warmer? Modeling weather data in NJ

A few years back I mentored data scientists in a bootcamp. One of the requirements was that the students complete a capstone project to demonstrate what they'd learned. Many students struggled to come up with ideas for this, often trying to pick ambitious projects that they didn't know how to start. My advice was always the same:

Start simple. Take anything you are interested in, list out the basic facts you believe about it, then start using data to provide evidence for these facts.

Despite living in the "information age", my experience is that we very rarely actually have any data to support some common assumptions we make. Almost always, when looking at the data, you'll find something that surprises you, and in the end you'll either have changed your beliefs slightly or have a much stronger foundation for them (sometimes both!).

In this post we'll take a look at the weather in December in NJ. To me it certainly feels warmer than when I grew up, and I clearly see the evidence for climate change, but is it really getting warmer in NJ?

Winters in New Jersey

After many years living in other states I've recently returned to living in the state I grew up in: New Jersey. During this late Fall (2021) I noticed what seemed to me to be extremely warm weather, especially in December.

As someone who spends a lot of time exploring information related to climate change, my immediate reaction is "of course it is getting warmer!" However, I find these types of quick causal assumptions to be the most dangerous. It is, of course, possible for climate change to be happening and also for the winters in New Jersey to currently be roughly the same as they were when I was a kid 30 years ago. Especially being in the middle of a pandemic, and worrying a lot about the state of the world, it's easy for biases to creep into our daily observations and let our imaginations take over. Precisely because it makes sense to me that winters are getting warmer, and it's something I expect to see, I should be extra cautious about confirming this is the case.

Getting the data

Thankfully it is very easy to get data to understand this problem better (though we'll soon see we still need models to understand what’s really going on). The National Oceanic and Atmospheric Administration (NOAA) provides extensive daily temperature readings for the nearby Newark Liberty International Airport.

Here we can see what the data looks like. The columns we are interested in are the date (which I have starting at 1940) and the daily temperature max, TMAX.

Example of the data we’ll be using to model NJ weather patterns.
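As a rough sketch of how a frame like this might be prepared: the snippet below assumes a NOAA daily export with DATE and TMAX columns. The column casing, the helper name prepare_weather, and the file name in the comment are my assumptions, not details taken from the post. It produces the date, year, and TMAX columns that the modeling code later relies on.

```python
import pandas as pd


def prepare_weather(raw_df, start_year=1940):
    """Return a frame with date, TMAX and year columns, starting at start_year.

    Assumes raw_df has a DATE column (parseable dates) and a TMAX column.
    """
    df = pd.DataFrame({
        "date": pd.to_datetime(raw_df["DATE"]),
        # unparseable or missing readings become NaN and are filtered later
        "TMAX": pd.to_numeric(raw_df["TMAX"], errors="coerce"),
    })
    df["year"] = df["date"].dt.year
    return df[df["year"] >= start_year].reset_index(drop=True)


# Hypothetical usage with a downloaded file (the file name is a placeholder):
# weather_df = prepare_weather(pd.read_csv("newark_daily.csv"))
```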

Visualizing the data

The first thing to do is just look at what these values look like over time. Since I feel that this December is warmer, I expect to see some notable drift over the years. Let's see what this data tells us visually:

It’s not obvious to me that, visually, anything is happening here.

When I'm exploring data and I already know what I'm looking for, I tend to be extra skeptical of what I see. To some people it may look like the values slope upward over time, but personally I don't see an obvious trend.

Next we can take another look at this data, this time overlapping each year to see how this year compares to past years:

Looking at the end of the year it does look like December is a bit warmer.

Looking specifically at the end of the year, which is the period in question, it does look a bit warmer this year than past years. But again, I want to be extra skeptical about what I'm seeing because I expect to see it warmer, and I really want the data to convince me. Honestly, looking at this plot doesn't do it.
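One way to build this kind of year-over-year overlay, sketched with pandas: pivot the daily readings so that each row is a day of the year and each column is a year. The post does not show its own plotting code, so the helper name overlay_by_year is hypothetical.

```python
import pandas as pd


def overlay_by_year(df):
    """Pivot daily TMAX into rows = day of year, columns = year."""
    tmp = df.assign(day_of_year=df["date"].dt.dayofyear)
    return tmp.pivot_table(index="day_of_year", columns="year", values="TMAX")


# Each column can then be drawn as its own line, for example with
# overlay_by_year(weather_df).plot(legend=False) if matplotlib is installed.
```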

In order to really see if something is happening over time we need to use a model. Modeling this data will tell us a lot more about things that might not be visible, and we can get statistical information about our claims so we aren't just going with a "feeling".

Modeling our New Jersey weather

We're going to start with a simple linear model. The basic hypothesis I have is that the temperature in New Jersey is increasing each year (specifically in December, but we'll get to that in a bit).

Modeling temperature change each year

Let's start by building a model that assumes that for each year from 1940 the average temperature has gone up by a fixed amount each year. Here’s the code for my basic model:

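import numpy as np
import pandas as pd
import statsmodels.api as sm

# get rid of the occasional NaN values in the daily max temperature
w_df = weather_df[weather_df['TMAX'].notna()]
# predictor: years elapsed since the first year in the data
X_df = pd.DataFrame(w_df['year'] - np.min(w_df['year']))
# constant column so the model learns an intercept
X_df['const'] = 1.0
y = w_df['TMAX']
year_model = sm.GLM(y, X_df).fit()
year_model.summary()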

Results from our simple linear model assuming a constant increase in temperature each year.

Looking at this model we see that there is a year and a const. The const represents the average temperature at the start of our timeline, and year represents the amount, in degrees Fahrenheit, that the temperature has risen each year (modeled as years since 1940).

This model makes the claim that each year the average temperature has risen by 0.0338 degrees Fahrenheit. This means on average we would have expected annual temperature averages in New Jersey to go up by about 2.7F over the 81 years since 1940.
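The 2.7F figure is just the yearly slope multiplied by the span of the data. As a quick check, assuming the span runs from 1940 to 2021:

```python
slope_per_year = 0.0338   # year coefficient from the model summary, in F per year
years_elapsed = 2021 - 1940

total_change = slope_per_year * years_elapsed
print(f"{total_change:.1f}F over {years_elapsed} years")
```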

The p-values and standard error here tell us that our model is pretty confident in these claims, but I always like to compare real data with the model. Here we can see what our simple model predicts for each year with the actual data:

With our simple regression we can see that it is getting warmer each year.

As we can see the model does seem to follow the data reasonably well.

Still, a straight line doesn't quite seem to answer the question I have, which is "are Decembers warmer?" This model does say that throughout the year, on average, we see an increase in temperature, but maybe that all happens in July?

To answer the question I have we need to make two changes to our model.

Modeling individual months

The issue with our current model is that it has only one intercept, the const, shared by the whole year. But each month clearly has a different average temperature, so we should really have an intercept for each month.

By one-hot encoding (or creating "dummy" variables) we can create a distinct intercept for each month. When we add monthly values we can get a better look at what's happening:

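# one-hot encode the month so each month gets its own intercept
# (dtype=float keeps the design matrix numeric on newer pandas versions)
X_df = pd.get_dummies(w_df['date'].dt.month, prefix="month", dtype=float)
month_vars = list(X_df.columns)
X_df['year'] = w_df['year'] - np.min(w_df['year'])
# no separate const: the twelve month columns already act as intercepts
monthly_model = sm.GLM(y, X_df[month_vars+['year']]).fit()
monthly_model.summary()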

Now we have an average temperature for each month and the average yearly change across all months.

Notice we're still getting the same value for the increase each year, but now we can see what the average temperature each month is. This is pretty useful information. We can now see that January is typically the coldest month in this part of New Jersey, while December is the 3rd coldest month. Maybe I feel that December is getting warmer because I remember how cold February is and when December rolls around again it feels warmer?

We can really see the improvements in our modeling process we get when we look at this model against the observed data:

We can see that this model is a much better fit of the data.

While this is a much better model, capturing more of the monthly variation than the previous one, we still can't answer the question "Is it getting warmer in December?". To do that we need to model the interaction between month and year.

Modeling Month/Year Interactions

The hypothesis I want to model now is that each month of the year is changing each year in a different way. So far we still have only modeled annual changes but can't be sure if Decembers are getting warmer or if maybe just the Summer months are doing most of the heavy lifting here.

For our next model we will model the interactions of each month with each year. An interaction is just literally the month_var * year. This will allow us to model how each month changes each year. We'll also still be learning a year coefficient since this will let us know if there is a general trend across all months.
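The post does not include the code for this model, so here is a minimal sketch of how the design matrix could be built, following the same pattern as the earlier models. The helper name build_interaction_design and the _x_year column suffix are my own placeholders, and the tiny synthetic frame exists only to show the resulting columns.

```python
import pandas as pd


def build_interaction_design(df):
    """Month intercepts, a shared year slope, and month*year interactions.

    Assumes df has a datetime 'date' column and an integer 'year' column.
    """
    X = pd.get_dummies(df["date"].dt.month, prefix="month", dtype=float)
    month_vars = list(X.columns)
    X["year"] = (df["year"] - df["year"].min()).astype(float)
    for m in month_vars:
        # interaction column: years since start, nonzero only in rows of month m
        X[f"{m}_x_year"] = X[m] * X["year"]
    return X


# Synthetic frame with one reading per month for two years
dates = pd.to_datetime([f"{y}-{m:02d}-15" for y in (1940, 1941) for m in range(1, 13)])
demo = pd.DataFrame({"date": dates})
demo["year"] = demo["date"].dt.year
X = build_interaction_design(demo)
print(X.shape)  # 12 month columns + year + 12 interactions
```

A matrix like this can be passed to sm.GLM(y, X).fit() in the same way as the earlier models. One thing worth knowing: the twelve interaction columns add up to the year column, so the data only pin down the sum of year and each month's interaction. How that sum gets split between the two coefficients depends on how the fitting routine handles the redundancy.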

The output of this model is pretty big but we get some interesting results!

Month, year and month*year interactions are able to capture how much each month is changing over time.

We can read these new interaction coefficients as:

"the average rate each of these months changes each year since 1940"

First we have to notice something important about our p-values column: we are no longer confident in the way each month changes! With this in mind let's take a look at these observations.

To start with we have the year coefficient, which says that, across all months, we are seeing an overall increase of 0.0312F each year. The p-value here is 0 and the standard error is small, so this is fairly certain.

July Trends

If you look at July we can see a very high p-value. This means that in general we aren't really sure if July is changing any faster or slower than the general trend of 0.0312. We can gain more insight by looking at the standard error, which is quite low at 0.007. With a mean of -0.0012, this means that our beliefs about what the real difference for July could be all hover very close to zero. Even though we don’t have “significance” we can still see that it’s very likely July isn’t much different than our average annual increase. So each year each day in July should be roughly 0.03F warmer than the previous year.
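To put rough numbers on "hover very close to zero": an approximate 95% interval is the estimate plus or minus about 1.96 standard errors. The normal approximation is my assumption here, not something the post states.

```python
estimate = -0.0012   # July interaction coefficient, F per year
std_err = 0.007      # its standard error

z = 1.96             # approximate 95% normal quantile
low, high = estimate - z * std_err, estimate + z * std_err
print(f"July differs from the overall trend by {low:.4f} to {high:.4f} F per year")
```

The interval straddles zero and is narrow compared with the overall trend of 0.0312, which is the sense in which July looks like an average month.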

October stays the same

Next let's look at October. October's coefficient is -0.0272, the p-value is nearly 0, and the standard error is low. To understand what this means we can add the year effect to the October effect to get 0.004. This means that in the 81 years of data we have, Octobers on average have only changed about 0.3F. This means that Octobers feel, more-or-less, exactly the same as they did in 1940.

Here is a great example of where data can change our view. The comedian Bo Burnham wrote a doomer anthem That Funny Feeling (the Phoebe Bridgers version is, imho, even better). A line from that song is:

That unapparent summer air in early fall

Which is talking about the impacts of climate change on early fall. But here we see that if you do "feel" that October is warmer, at least in New Jersey, you are just feeling the impact of media bias or general climate anxiety. The summer, at least right now, is not creeping into early Fall (…well a bit in September).

December is a lot warmer.

Finally we get to our conclusion: that feeling that December is getting warmer is absolutely correct. The December interaction coefficient is 0.0327, the p-value is nearly 0, and the standard error is very low. Adding the year effect of 0.0312 gives a total of about 0.064F per year, which means that December is warming at roughly twice the rate of the average annual increase.

I moved to NJ as a kid in 1990, which means that Decembers are now on average about 2F warmer than they were when I first moved to this state.
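The October and December figures both come from adding the shared year effect to the month's interaction and multiplying by the number of years. A small sketch of that arithmetic, using the coefficients quoted above and assuming 2021 as the end year:

```python
year_effect = 0.0312   # shared yearly trend, F per year
interactions = {"October": -0.0272, "December": 0.0327}

# total yearly change for a month = shared trend + that month's interaction
slopes = {m: year_effect + b for m, b in interactions.items()}

october_since_1940 = slopes["October"] * (2021 - 1940)
december_since_1990 = slopes["December"] * (2021 - 1990)
print(f"October: {slopes['October']:.4f} F/yr, {october_since_1940:.1f}F since 1940")
print(f"December: {slopes['December']:.4f} F/yr, {december_since_1990:.1f}F since 1990")
```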

Visualizing average change in temperature per month

By combining the average yearly change plus the estimate for interaction we can visualize (with uncertainty from standard error) how much each month is changing:

As we can see, December is warming much faster than the other months in Newark, NJ.
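The post does not say exactly how the error bars were computed. One standard approach, sketched here, is to treat the monthly change as the sum of two coefficients and use the variance of a sum, which needs the covariance between them. The function name, the month_12_x_year label, and the covariance values below are all made up for illustration. With a fitted statsmodels result, the inputs would come from its params attribute and cov_params() method.

```python
import numpy as np
import pandas as pd


def combined_slope(params, cov, base="year", interaction="month_12_x_year"):
    """Estimate and standard error of params[base] + params[interaction].

    Uses Var(a + b) = Var(a) + Var(b) + 2 * Cov(a, b).
    """
    estimate = params[base] + params[interaction]
    variance = (cov.loc[base, base] + cov.loc[interaction, interaction]
                + 2 * cov.loc[base, interaction])
    return estimate, float(np.sqrt(variance))


# Made-up covariance values, purely to exercise the function
names = ["year", "month_12_x_year"]
params = pd.Series([0.0312, 0.0327], index=names)
cov = pd.DataFrame([[4e-6, -1e-6], [-1e-6, 9e-6]], index=names, columns=names)
est, se = combined_slope(params, cov)
```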

After this journey we can clearly see that the "funny feeling" that December is getting strangely warm is backed up by the data. We can even put error bars on it and know with strong statistical confidence that winter in New Jersey in 2021 is much warmer than I remember it being in 1990.

Conclusion

One of the reasons that I like to teach statistics is because I deeply believe that statistics are essential in understanding our everyday world. This becomes especially true as that world begins to change rapidly.

It's easy to imagine someone panicking about warm Octobers because of climate change and someone arguing that Decembers are always the warmest part of the winter so there's nothing happening. Depending on your worldview you might have very different naive opinions of both of these hypotheses. It turns out they're both wrong when we look at the data.

It is also very important to recognize why we need models rather than just data visualization. Visualizations can be incredibly helpful and are an essential part of the modeling process, but without a model we are forced to go on feelings and our own way of reading a plot.

After doing this exercise I was surprised to see that October is more or less the same, and despite having a prior belief in warmer Decembers, surprised by the degree of this warming. Based on our model the average December day today in 2021 should be about 46F while the average day in March in 1940 was 48F. In 30 more years, assuming the same constant rate of change (which is a big assumption in a changing world) Decembers will feel like March did to my grandparents.
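The "30 more years" estimate follows from the 2F gap between the two averages and December's rate of change. A quick sketch, using the figures quoted above and the same constant-rate assumption:

```python
december_2021 = 46.0   # model's average December day in 2021, F
march_1940 = 48.0      # model's average March day in 1940, F
december_slope = 0.0312 + 0.0327   # year effect + December interaction, F per year

years_to_match = (march_1940 - december_2021) / december_slope
print(f"about {years_to_match:.0f} years")
```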

I've found that detailed analysis of data like this can make many people uncomfortable, since they'd often rather not know that things are, in fact, changing in a way that might be scary. For me, it's exploring the data that gives me some grounding in a quickly changing and increasingly uncertain world. It may be disconcerting to realize how quickly December is warming, but it is also comforting to have a break from the cognitive dissonance that is part of mainstream discussions on climate change.
