
Data Sources:

Before discussing how the study was conducted, it is important to know the origins of the data. I drew on a variety of sources in order to gather as much data as possible. Each is listed below:

  • US states shapefile --> ESRI data catalogue

  • US obesity percentages per state (2011) --> US Centers for Disease Control and Prevention (CDC)

  • Percentage of physically active adults per state (2009), state per capita expenditure on fast food (2007) and number of recreation facilities per 1,000 inhabitants per state (2009) --> USDA Food Environment Atlas (this source has much more data available that would be interesting for further study)

  • Energy usage per state, residential and transportation (2011) --> US Energy Information Administration (EIA)

    • Note: Although Roberts & Edwards used fossil fuel data in their research, I chose to use residential and transportation energy consumption, which may or may not come from fossil fuels.  Data on fossil fuels alone were not separated by sector and would thus include commercial and industrial consumption, which is not relevant to the current study.  In addition, states that rely on other forms of energy (e.g. nuclear) would be misrepresented by pure fossil fuel data.

  • Education info (percentage of adults with high school diplomas, 2009) and urban vs. rural population data (2009) --> The United States Census Bureau

Methodology:

    In order to answer the questions previously outlined, I conducted a series of analyses within ESRI’s ArcMap.  This program enabled me to add spatial components to my analysis, which furthered my understanding of the results.


OLS Regression:


    The primary tool of analysis employed within the study was Ordinary Least Squares (OLS) regression.  OLS regression is a tool for modelling the relationship between variables.  It involves a dependent (or response) variable and one or more independent (or explanatory) variables, and it determines the degree to which the explanatory variables explain the dependent variable.  In my first OLS regression, I wanted to determine the relationship between per capita energy use and the percentage of physically active adults within each state.  Here I used energy usage as the dependent variable and the percentage of physically active adults as the independent variable.  In order to answer my other questions, I switched the dependent variable to obesity and used other variables, such as the percentage of physically active adults and per capita fast food expenditure, as the explanatory variables.
Once the OLS regressions had run, I used the output report files to assess how well each model performed.  The main indicator that I relied on was the adjusted R^2 value.  This value is a measure of model performance and represents the degree to which the independent variables explain the dependent variable (ArcGIS 2013).  Thus I treated the regression analyses that produced higher adjusted R^2 values as more important for understanding the causes of obesity in the US.
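
To make the adjusted R^2 value more concrete, here is a minimal sketch of a one-variable OLS fit in plain Python. It is not the ArcMap tool itself, and the numbers are invented for illustration rather than taken from the study data; the function names ols_fit and adjusted_r2 are my own.

```python
# Illustrative sketch only: OLS with a single explanatory variable.
# All values below are hypothetical, not the study's data.

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def adjusted_r2(x, y, intercept, slope, n_explanatory=1):
    """R^2 penalised for the number of explanatory variables in the model."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_explanatory - 1)


# Hypothetical inputs: % physically active adults (explanatory)
# and per capita energy use (dependent) for six made-up states.
pct_active = [45.0, 48.0, 50.0, 52.0, 55.0, 58.0]
energy_use = [210.0, 200.0, 190.0, 188.0, 175.0, 170.0]

a, b = ols_fit(pct_active, energy_use)
adj = adjusted_r2(pct_active, energy_use, a, b)
print("intercept:", a, "slope:", b, "adjusted R^2:", adj)
```

An adjusted R^2 close to 1 means the explanatory variable accounts for most of the variation in the dependent variable; a value near 0 means it accounts for very little.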

 

Moran’s I:

 

    It is important to note that OLS is a global method of regression, as opposed to a local method like geographically weighted regression.  This means that it creates a single equation to explain the relationship between variables over the entire study area (ArcGIS 2013).  It does not account for differences across space and place and is thus a non-spatial method of analysis.  In order to address this, I used the Moran’s I tool.  This assigns a z-score to each OLS analysis, denoting the level of spatial autocorrelation present amongst the results.  Regressions that produced higher z-scores showed stronger spatial autocorrelation.  This indicated that factors beyond those tested were influencing the distribution of obesity, which proved helpful for further analysis.

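    This tool also assigned a p-value.  This value indicates the statistical significance of the result, that is, how likely the observed spatial pattern would be if the values were distributed randomly, and is thus important when determining whether your model is valid.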
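
The Moran’s I statistic, z-score and p-value described above can be sketched in plain Python as follows. This is a rough illustration under stated assumptions, not a reproduction of the ArcMap tool: the variance uses the textbook normality assumption (the tool may compute it differently), the six areas are imagined as sitting in a single row with each area neighbouring the ones beside it, and the names chain_weights and morans_i and all input values are hypothetical.

```python
import math

def chain_weights(n):
    """Hypothetical neighbourhood: n areas in a row, adjacent areas weigh 1."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1.0
        w[i + 1][i] = 1.0
    return w


def morans_i(values, weights):
    """Return (Moran's I, z-score, two-sided p-value).

    The variance is computed under the normality assumption,
    which is an assumption of this sketch.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    i_stat = (n / s0) * cross / sum(d * d for d in dev)

    expected = -1.0 / (n - 1)  # expected I when there is no spatial pattern
    s1 = 0.5 * sum((weights[i][j] + weights[j][i]) ** 2
                   for i in range(n) for j in range(n))
    s2 = sum((sum(weights[i]) + sum(weights[j][i] for j in range(n))) ** 2
             for i in range(n))
    variance = ((n * n * s1 - n * s2 + 3 * s0 * s0)
                / ((n * n - 1) * s0 * s0)) - expected ** 2

    z = (i_stat - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the normal curve
    return i_stat, z, p


# Made-up values that rise steadily from one end of the row to the other,
# so similar values sit next to each other (positive autocorrelation).
clustered = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
i_stat, z, p = morans_i(clustered, chain_weights(6))
print("Moran's I:", i_stat, "z-score:", z, "p-value:", p)
```

A large positive z-score with a small p-value suggests that similar values cluster together in space more than chance would predict; a large negative z-score suggests that neighbouring values tend to differ.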


Grouping Analysis:


    The other method used within this project was the grouping analysis tool.  This tool groups areas into clusters based on their similar features (e.g. high obesity, low energy consumption, etc.).  I used it in order to understand the differing trends in various regions of the US and the characteristics that distinguish the high and low obesity regions of the country.  Choosing the settings of the grouping analyses (i.e. how many groups do I want? Which produces a more accurate illustration of the data: no restrictions or k-nearest neighbours?) involved a lot of trial and error.  After each grouping analysis I would examine the output report files, paying particular attention to the minimum and maximum values within each group for each variable.  If there was large variation within a group, I would conduct the analysis again in order to get more accurate clusters.
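
The check on minimum and maximum values within each group can be illustrated with a small sketch. I use a basic k-means routine here as a stand-in for the grouping step with no spatial restrictions; this is an assumption for illustration, not a description of how the ArcMap tool works internally. The data, the function names and the choice to seed the groups with the first k rows are all hypothetical.

```python
# Illustrative sketch only: group made-up "states" on two variables,
# then report the min and max of each variable within each group.

def standardize(rows):
    """Convert each column to z-scores so no variable dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def kmeans(points, k, n_iter=50):
    """Basic k-means; seeded with the first k points to stay deterministic."""
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k),
                key=lambda g: sum((a - b) ** 2 for a, b in zip(p, centers[g])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for g in range(k):
            members = [p for p, lab in zip(points, labels) if lab == g]
            if members:
                centers[g] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def group_ranges(rows, labels):
    """Return {group: [(min, max) per variable]} using the original units."""
    ranges = {}
    for g in sorted(set(labels)):
        members = [r for r, lab in zip(rows, labels) if lab == g]
        ranges[g] = [(min(c), max(c)) for c in zip(*members)]
    return ranges


# Hypothetical rows: [obesity %, per capita energy use]
rows = [[34.0, 120.0], [24.0, 80.0], [33.0, 125.0],
        [25.0, 85.0], [35.0, 118.0], [23.0, 78.0]]

labels = kmeans(standardize(rows), k=2)
ranges = group_ranges(rows, labels)
for group, spans in ranges.items():
    print("group", group, "min/max per variable:", spans)
```

If the gap between the minimum and maximum within a group is wide, the group is mixing dissimilar areas, which is the signal I used to rerun the analysis with different settings.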

 

Below is a group of flow charts that better depict the process used to determine the results.

 
