Exploring And Visualizing Climates Of Various Countries

In this Python tutorial we’ll use SQL to query a database we create along with making graphic visuals of the data using Plotly!

Creating our Database

Lets first make sure we have loaded in the necessary python packages for this.

#packages needed
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
import sqlite3
from plotly.io import write_html

Now lets create our database!

#creating the database
conn = sqlite3.connect

We have just created an empty database. In order for that database to actually be useful to us we need to load in the appropriate data.

#reading in the csv files
#reading in our temperature data in chunks
temps = pd.read_csv("temps_stacked.csv", chunksize = 100000)
#writing to the temperatures table in our database. first we will iterate through to create a FIPS code column which
#will help us later on when querying
for temp in temps:
    temp["FIPS"] = temp["ID"].str[0:2]
    temp.to_sql("temperatures", conn, if_exists = "append", index = False)

#reading in our stations data
stations_url = "https://raw.githubusercontent.com/PhilChodrow/PIC16B/master/datasets/noaa-ghcn/station-metadata.csv"
stations = pd.read_csv(stations_url)
stations.to_sql("stations", conn, if_exists = "replace", index = False)

#reading in our countries data and renaming some columns which will come in handy later
countries_url = "https://raw.githubusercontent.com/mysociety/gaze/master/data/fips-10-4-to-iso-country-codes.csv"
countries = pd.read_csv(countries_url)
countries = countries.rename(columns = {"FIPS 10-4" : "FIPS", "Name": "Country"})
countries.to_sql("countries", conn, if_exists = "replace", index = False)

Congrats! We have just created out database. Now lets check to see if we did it correctly.

#checking to see if the tables are correct
cursor = conn.cursor()
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
print(cursor.fetchall())

#checking to see the columns of all our tables
cursor.execute("SELECT sql FROM sqlite_master WHERE type='table';")
for result in cursor.fetchall():
    print(result[0])
conn.close()

Making a climate database querying function

Now that we have made our database how can we access certain data that we want? Well with this querying function we are able to do just that. We’ll be making a querying function that returns a panda dataframe telling us the temperature of the selected country and much more.

def query_climate_database(country, year_begin, year_end, month):
    """
    This function takes in a country, a starting year, ending year, and month.
    It then returns a pandas dataframe containg information such as station names and temperatures for the selected country.
    """
    conn = sqlite3.connect("climate.db")
    cmd = \
    """
    SELECT S.name, S.latitude, S.longitude, C.country, T.id, T.month, T.year, T.temp
    FROM temperatures T
    LEFT JOIN stations S ON T.id = S.id
    LEFT JOIN countries C ON T.fips = C.fips
    WHERE C.country == '{0}' AND T.year >= {1} AND T.year <= {2} AND T.month == {3}
    """.format(country, year_begin, year_end, month)
    climate = pd.read_sql_query(cmd, conn)
    climate.reset_index()
    conn.close()
    return climate

Lets see if we did this correctly with a test.

india = query_climate_database(country = "India",
                       year_begin = 1980,
                       year_end = 2020,
                       month = 1)
india.head()

alt text

Making a geographic scatter function

Now lets try making a geographic scatterplot function to better visualize our data on a map using plotly. For this we’ll need to import two more packages. Using the linear regression package from the sci-kit learn library we can also get the coefficients for the average change in temperature. Lets get started.

#the necessary modules
from sklearn.linear_model import LinearRegression
import plotly.express as px
import calendar

#our linear regression function
def coef(data_group):
    #X is a df
    x = data_group[["Year"]]
    #y is a series 
    y = data_group["Temp"]   
    LR = LinearRegression()
    LR.fit(x, y)
    return LR.coef_[0]

def temperature_coefficient_plot(country, year_begin, year_end, month, min_obs, **kwargs):
    """
    this function takes in a given country, starting year, ending year, month, minimum required years, and additional arguments for plot customization.
    it returns an interactive plotly data visualization with labels, titles, and more interactive information.
    """
    #get the specified dataframe
    df = query_climate_database(country, year_begin, year_end, month)
    #making sure stations that dont have the minimum amount of years aren't included
    df['num_obs'] = df.groupby('NAME')['Year'].transform(len)
    df = df[df['num_obs'] >= min_obs]
    #creating our new yearly increase column in our df
    coefs = df.groupby(["NAME", "Month","LATITUDE","LONGITUDE"]).apply(coef)
    coefs = coefs.reset_index()
    coefs.rename(columns={0: 'Estimated Yearly Increase'}, inplace = True)
    coefs = coefs.round(4)
    #making the figure
    fig = px.scatter_mapbox(coefs,
                        lat = "LATITUDE", 
                        lon = "LONGITUDE",
                        hover_name = "NAME",
                        color = "Estimated Yearly Increase",
                        color_continuous_midpoint = 0,
                        title = "Estimates of yearly temperature change in "+ calendar.month_name[month] \
                            +" for stations in "+ country +", years "+str(year_begin) + " - "+ str(year_end),
                        **kwargs)
    return fig

Now that we’ve done that lets try it out by getting the yearly temperature changes in January from India between 1980 and 2020.

#choosing the colormap
color_map = px.colors.diverging.RdGy_r

fig = temperature_coefficient_plot("India", 1980, 2020, 1,
                                   min_obs = 10,
                                   zoom = 2,
                                   mapbox_style="carto-positron",
                                   color_continuous_scale=color_map)

fig.show()
write_html(fig, "india_plotly.html")

Since this is a function we created we can see this interactive geographic scatterplot with any country and time frame of our choosing.

Additional Visualizations

How does the average yearly temperature vary between two countries in a specific month?

def compare_two_countries(countries, year_begin, year_end, month, **kwargs):
    """
    Given the desired countries (tuple or list of length 2), starting/ending year for the 
    data (int), a specific month (int), and additional keyword 
    arguments to be passed in to px.line(), we will return 
    a faceted line plot, divided by country, over time for the average
    temperature in the specified month each year. 
    """

    #query two times, one for each country
    df1 = query_climate_database(countries[0], year_begin, year_end, month)
    df2 = query_climate_database(countries[1], year_begin, year_end, month)
    
    #concatenate dataframes on top of each other
    df = pd.concat([df1, df2], ignore_index = True)

    #group by year and country and compute their average temperature
    df = (df.groupby(["Year", "Country"])["Temp"].apply(np.mean)).reset_index()
    
    #rename temp column
    df = df.rename(columns = {"Temp" : "Temperature"})

    #create adaptive title
    title = f"{countries[0]} vs. {countries[1]} in 
    {calendar.month_name[month]}, Years {year_begin} - {year_end}
        <br> <sup> Average Temperature in °C</sup>" #adds newline and subtitle
    
    #lineplot
    fig = px.line(df, x = "Year", y = "Temperature",
                  facet_col = "Country", #what to facet on
                  facet_col_spacing = .09, #increase horizontal spacing between facets
                  title = title,
                  **kwargs)

    #change facet title to just be the country name
    fig.for_each_annotation(lambda a: 
        a.update(text=a.text.replace("Country=","")))      

    fig.show()

Example of the Line Plot:

conn = sqlite3.connect("temps.db")
compare_two_countries(("United States", "Canada"), 1990, 2010, 2,
                      template = 'plotly_dark')
conn.close()

What is the distribution of temperatures like for a country over time in a specific month?

def boxplot_over_time(country, year_begin, year_end, month, **kwargs):
    """
    Given the desired country (string), starting/ending year for the 
    data (int), a specific month (int), and additional keyword 
    arguments to be passed in to px.box(), we will return multiple box
    plots of temperature for each year in the specified month, with added
    hover data.
    """

    #query the data base
    df = query_climate_database(country, year_begin, year_end, month)

    #rename the temp column
    df = df.rename(columns = {"Temp" : "Temperature"})
    
    #make adaptive title 
    title = f"Temperature Distribution in {country} in {calendar.month_name[month]}, Year {year_begin} - {year_end}"

    #make the box plot
    fig = px.box(df, x = "Year", y = "Temperature",
                 notched = True, #makes indents around the mean
                 title = title,
                 hover_data = ["Country", "Month"], #added hover data on the outliers
                 **kwargs)

    fig.show()

Example of the Box Plot:

conn = sqlite3.connect("temps.db")
boxplot_over_time("Brazil", 2000, 2005, 3, template = 'plotly_dark')
conn.close()

Thank you for reading and following along!

Written on April 16, 2022