In this blog post, we will analyze meteorological data and test a hypothesis based on visualizations.
The null hypothesis H0 is: "Do the apparent temperature and humidity, compared monthly across 10 years of data, indicate an increase due to global warming?"



H0 means we need to find out whether the average Apparent Temperature for a given month, say April, from 2006 to 2016, and the average humidity for the same period have increased or not. This monthly analysis has to be done for all 12 months over the 10-year period.
So, let's start:
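One way to make "increased or not" concrete for a single month is to average that month's readings per year and fit a least-squares straight line through the yearly averages: a positive slope points to an upward trend. This is a suggestion of mine, not the method used in this post, and it is not a significance test. The helper name `monthly_trend` is hypothetical, and the series below is made-up toy data, not the real weather data.

```python
import numpy as np
import pandas as pd


def monthly_trend(series: pd.Series, month: int) -> float:
    """Hypothetical helper: least-squares slope (units per year) of the
    yearly averages of `series` for one calendar month.

    `series` must have a DatetimeIndex.
    """
    # Keep only the readings that fall in the requested month.
    in_month = series[series.index.month == month]
    # One average per year for that month.
    yearly_mean = in_month.groupby(in_month.index.year).mean()
    # Fit y = slope * year + intercept; polyfit returns [slope, intercept].
    slope, _intercept = np.polyfit(
        yearly_mean.index.to_numpy(dtype=float), yearly_mean.to_numpy(), deg=1
    )
    return float(slope)


# Toy data (NOT the real dataset): daily April readings for 2006-2008
# that rise by exactly 1 degree per year.
idx = pd.date_range("2006-04-01", "2008-04-30", freq="D")
idx = idx[idx.month == 4]
toy = pd.Series(idx.year - 2006 + 10.0, index=idx)

print(round(monthly_trend(toy, month=4), 6))
```

Running the same helper on each of the 12 months would give one slope per month to compare against the plots.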
## Importing libraries
import numpy as np ## for linear algebra
import pandas as pd ## for data manipulation and visualization
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
data = pd.read_csv('weatherHistory.csv')
data.head()
data.info()
data.isnull().sum()
As the output above shows, the features we need (Apparent Temperature, Humidity and Formatted Date) have no null values, so there is no need to perform interpolation here.
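If you only care about the columns used later, the null check can be restricted to them. A minimal sketch, assuming the column names used in this post; the tiny DataFrame is made up purely to show what the check returns, including a missing value in a column we do not need.

```python
import numpy as np
import pandas as pd

# Columns this analysis relies on (names as used in this post).
needed = ["Formatted Date", "Apparent Temperature (C)", "Humidity"]

# Made-up stand-in for the real CSV; the only NaN is in a column we do not use.
data = pd.DataFrame({
    "Formatted Date": ["2006-04-01 00:00:00.000 +0200", "2006-04-01 01:00:00.000 +0200"],
    "Apparent Temperature (C)": [10.0, 12.0],
    "Humidity": [0.8, 0.7],
    "Precip Type": ["rain", np.nan],
})

# isnull() is a method: call it, then count missing values per column.
null_counts = data[needed].isnull().sum()
print(null_counts)

# True only if none of the needed columns has a missing value.
no_missing = not null_counts.any()
print(no_missing)
```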



Here our Formatted Date column is of object type, so we first convert it into datetime format.
Then we extract the year, month, and day from the date attribute.
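The code for this step is not shown, so here is one possible way to do it. It assumes the timestamps look like "2006-04-01 00:00:00.000 +0200", with the UTC offset changing between winter and summer; mixed offsets cannot share one timezone-aware dtype, hence `utc=True`. Converting back to local time with `Europe/Budapest` is an assumption about where the data was recorded; it keeps a reading taken at local midnight on 1 April from being counted as 31 March. The two rows are made up.

```python
import pandas as pd

# Two made-up rows in the same string format as the 'Formatted Date' column.
data = pd.DataFrame({
    "Formatted Date": ["2006-04-01 00:00:00.000 +0200", "2006-12-31 23:00:00.000 +0100"],
    "Apparent Temperature (C)": [7.0, -3.0],
    "Humidity": [0.9, 0.8],
})

# Parse everything as UTC first, because the offsets differ between rows...
data["Formatted Date"] = pd.to_datetime(data["Formatted Date"], utc=True)
# ...then convert back to local time (assumed time zone: Europe/Budapest).
data["Formatted Date"] = data["Formatted Date"].dt.tz_convert("Europe/Budapest")

# Extract the calendar parts used in the rest of the analysis.
data["Year"] = data["Formatted Date"].dt.year
data["Month"] = data["Formatted Date"].dt.month
data["Day"] = data["Formatted Date"].dt.day

print(data[["Formatted Date", "Year", "Month", "Day"]])
```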
Now that the data is clean, we can move on to testing the null hypothesis, that is, checking whether Apparent Temperature and Humidity have increased over the last 10 years due to global warming.



To check the hypothesis, we will visualize how the attributes vary from year to year for each month. First, I will use the plotly and cufflinks libraries to plot bar graphs.
from plotly.offline import iplot
import plotly as py
import plotly.tools as tls
import cufflinks as cf
py.offline.init_notebook_mode(connected=True)
cf.go_offline()
Now that we have imported Plotly and connected it to our notebook, let's visualize the attributes to get some insights.
jan = data.loc[data['Month']==1]
jan.iplot(x="Year", y=["Humidity", "Apparent Temperature (C)"], kind="bar")
From this plot we can see that January humidity is constant from year to year, whereas Apparent Temperature varies from year to year.
However, because the plot draws every individual January reading rather than one average per year, it does not show exactly how much the Apparent Temperature varies.
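To put numbers on that variation, one option is to average the January rows per year with `groupby` instead of plotting every reading. A sketch on made-up rows, assuming the Year and Month columns extracted earlier:

```python
import pandas as pd

# Made-up rows standing in for the real data (two January readings per year).
data = pd.DataFrame({
    "Year": [2006, 2006, 2007, 2007],
    "Month": [1, 1, 1, 1],
    "Humidity": [0.90, 0.80, 0.85, 0.85],
    "Apparent Temperature (C)": [-4.0, -2.0, 1.0, 3.0],
})

jan = data.loc[data["Month"] == 1]
# One row per year: the mean of every January reading in that year.
jan_yearly = jan.groupby("Year")[["Humidity", "Apparent Temperature (C)"]].mean()
print(jan_yearly)

# Year-over-year change in the January averages.
print(jan_yearly.diff())
```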



So we will first resample the desired attributes to their monthly mean (average) and then visualize them to check the hypothesis.
data.set_index('Formatted Date', inplace = True)
data = data[["Humidity", "Apparent Temperature (C)"]].resample('MS').mean()
data.head()
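After resampling, the frame keeps only the two attribute columns with one row per month, so January of every year can be picked out through the index. A sketch on a made-up daily series (not the real data) where January 2007 is exactly 1 degree warmer than January 2006:

```python
import numpy as np
import pandas as pd

# Made-up daily readings for 2006-2007, standing in for the real data.
idx = pd.date_range("2006-01-01", "2007-12-31", freq="D")
data = pd.DataFrame(
    {
        "Humidity": np.full(len(idx), 0.8),
        # 0 degrees in every 2006 reading, 1 degree in every 2007 reading.
        "Apparent Temperature (C)": np.where(idx.year == 2007, 1.0, 0.0),
    },
    index=idx,
)
data.index.name = "Formatted Date"

# 'MS' = month start: one row per month, labelled with the first day of that month.
monthly = data[["Humidity", "Apparent Temperature (C)"]].resample("MS").mean()

# January of every year, read from the monthly table.
jan_monthly = monthly[monthly.index.month == 1]
print(jan_monthly)
```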
As the plot shows, humidity for January does not change over the past 10 years (2006–2016), whereas Apparent Temperature rises sharply from 2006 to 2007, decreases from 2007 to 2010, rises again from 2010 to 2014, and decreases from 2014 to 2016.
This plot gives us a clear idea of the variations, so let's make the same plot for each of the other months as well.
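Rather than filtering month by month, the monthly means can be reshaped into a Year by Month table, where each column holds one month's value across the years and can be plotted in a loop. A sketch using a made-up monthly series (not real measurements):

```python
import numpy as np
import pandas as pd

# Made-up monthly means for 2006-2008, standing in for the resampled data.
idx = pd.date_range("2006-01-01", "2008-12-01", freq="MS")
monthly = pd.DataFrame(
    {"Apparent Temperature (C)": np.arange(len(idx), dtype=float)},
    index=idx,
)
monthly = monthly.assign(Year=monthly.index.year, Month=monthly.index.month)

# Rows = years, columns = months 1..12.
table = monthly.pivot_table(
    index="Year", columns="Month", values="Apparent Temperature (C)", aggfunc="mean"
)
print(table)

# One series per month (one value per year), e.g. to draw one plot per month.
for month in table.columns:
    print(month, table[month].tolist())
```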
Plots: February & March, June & July