Quantifying Public Health: The American Fitness Index

Cover

Introduction

Obesity and heart disease have long plagued the American people, with the Centers for Disease Control and Prevention estimating that heart disease causes one in four deaths in the United States each year, around 610,000 deaths annually. However, no single metric is enough to draw meaningful conclusions. With National Institutes of Health figures estimating that two-thirds of Americans are classifiable as overweight, a more comprehensive understanding of public health is clearly needed if the nation is going to make meaningful progress against the ever-growing American waistline.

Methodology

Data originally released by the American College of Sports Medicine (ACSM) and partially published by the City of New Orleans attempts to quantify the overall health of the fifty largest metropolitan areas in the United States every year. The 100-point ascending scale takes into account a number of factors that the ACSM believes contribute to the overall health of a city and its surrounding area, with the goal of informing policymakers about the reality of public health in their areas.

A detailed explanation of how the ACSM assigns scores can be found in the Appendix.

The dataset compares ten Metropolitan Statistical Areas (MSAs) located primarily in the Southeast of the country, a region notorious for its public health problems. Location notwithstanding, the data is otherwise quite diverse, representing MSAs from seven states and including United States average values as well. High-scoring MSAs such as Atlanta, Georgia, are shown alongside low-scoring MSAs such as Oklahoma City, Oklahoma. With a five-year observation period (2011 to 2015), enough public data exists to effectively observe and record recent trends through similarity matching.

Additionally, statistical analysis can be used to test hypotheses that arise during observation. Because the data spans a reasonable period and covers several states, a number of relationships can be observed and investigated to determine their validity, and data-based clustering can be performed for a range of purposes.

Is there a way to predict performance on the ACSM AFI, and by extension, overall public health using this data?

Publicly available data allows anyone with access to the right analytics tools to pursue answers to their own questions and convey that information to any audience. The main tool for this project is the Axibase Time Series Database (ATSD), which is developed to work with the Socrata framework that government agencies use to publish data. Calculations are done using the computational knowledge engine WolframAlpha.

Data

AllCity

Looking at an entire dataset at once is often overwhelming, but this visualization offers a wide lens through which to view a large amount of data over the full observation period. The grey Benchmark Average line simplifies the visualization by establishing a standard and providing perspective to readers who may be unfamiliar with the scoring system.

2015

This visualization looks at Year 2015 data and highlights those cities performing below the National Benchmark Average.

For more information about using the ALERT setting, see the Appendix.

This visualization can track city performance throughout the observed period and establishes binary clusters: cities that performed above the Benchmark Average and cities that performed below it.

The Benchmark Average, and by extension the alert threshold, changes each year:

Year 2011 (Benchmark Average: 42)

2011

Year 2013 (Benchmark Average: 43)

2013

Even-year (2012 and 2014) data can be found in the Appendix.

To observe trends in individual MSAs, an effective method of sorting the data is needed. When the data is organized by city, chronologically, trends appear that are not as obvious in the first visualization:
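The above/below split drawn by the ALERT setting amounts to a per-year threshold test. The sketch below is a minimal Python illustration of that logic, not ChartLab's implementation. The Benchmark Average values are the ones given in this article (2011 and 2013 above, 2012 and 2014 in the Appendix); the MSA names and scores are hypothetical, and a score equal to the benchmark is assumed to count as "above".

```python
# Yearly Benchmark Average values given in this article (2012 and 2014 from the Appendix)
BENCHMARK = {2011: 42, 2012: 38, 2013: 43, 2014: 42}


def classify(scores_by_msa, year):
    """Split MSAs into above- and below-benchmark clusters for one year.

    Assumption: a score equal to the benchmark counts as 'above'.
    """
    threshold = BENCHMARK[year]
    above = sorted(msa for msa, score in scores_by_msa.items() if score >= threshold)
    below = sorted(msa for msa, score in scores_by_msa.items() if score < threshold)
    return above, below


# Hypothetical scores, for illustration only
scores = {"MSA A": 45.0, "MSA B": 40.5, "MSA C": 41.9}
for year in (2012, 2013):
    above, below = classify(scores, year)
    print(year, "above:", above, "below:", below)
```

Because the threshold is looked up per year, the same hypothetical score (40.5 for MSA B) falls above the benchmark in 2012 and below it in 2013.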

ByCityByYear

See the Appendix for an alternative display of the above data.

To focus the layout instead on the trend across the observed MSAs for a given year, a third configuration is needed:

ByYearByCity

Although the two charts contain the same data, the key difference is how it is presented. Here the graph is organized by year, and even though the same amount of data is present, tracing patterns from year to year is much easier than in the previous display. Because data for Baton Rouge, Louisiana, is available only for 2015, its empty columns for the remaining years are still rendered to preserve chronology.

Using the display setting, unneeded data can be masked to compare the best- and worst-performing MSAs over the observed period. Based on averaged performance, Oklahoma City, Oklahoma, is the lowest-performing MSA and Raleigh, North Carolina, is the highest-performing MSA. Displayed next to one another, their absolute and relative differences stand out:
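Both layouts come from the same records; only the grouping key changes. A minimal Python sketch, assuming hypothetical (MSA, year, score) records in which "MSA C", like Baton Rouge, has a value only for 2015. Missing years are kept as None so the empty slots survive, much as the chart keeps empty columns.

```python
YEARS = list(range(2011, 2016))  # observed period, 2011 to 2015

# Hypothetical (MSA, year, score) records; "MSA C" has data for 2015 only
records = [
    ("MSA A", 2011, 44.1), ("MSA A", 2012, 45.0), ("MSA A", 2013, 46.2),
    ("MSA A", 2014, 47.3), ("MSA A", 2015, 48.0),
    ("MSA B", 2011, 39.5), ("MSA B", 2012, 38.7), ("MSA B", 2013, 40.1),
    ("MSA B", 2014, 41.0), ("MSA B", 2015, 40.2),
    ("MSA C", 2015, 36.4),
]


def layouts(rows):
    """Return the by-city-by-year and by-year-by-city views of the same data."""
    msas = sorted({msa for msa, _, _ in rows})
    lookup = {(msa, year): score for msa, year, score in rows}
    # City first: one list of yearly scores per MSA (None where no data exists)
    by_city = {msa: [lookup.get((msa, y)) for y in YEARS] for msa in msas}
    # Year first: one list of MSA scores per year, in alphabetical MSA order
    by_year = {y: [lookup.get((msa, y)) for msa in msas] for y in YEARS}
    return by_city, by_year


by_city, by_year = layouts(records)
print(by_city["MSA C"])
print(by_year[2015])
```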
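Selecting the best and worst performers by averaged score, and expressing their gap in absolute and relative terms, takes only a few lines. This is a sketch with hypothetical scores; years without data are skipped when averaging.

```python
from statistics import mean

# Hypothetical per-MSA scores over the observed years (None = no data)
series = {
    "MSA A": [44.1, 45.0, 46.2, 47.3, 48.0],
    "MSA B": [39.5, 38.7, 40.1, 41.0, 40.2],
    "MSA C": [None, None, None, None, 36.4],
}


def best_and_worst(data):
    """Return (best, worst, absolute gap, relative gap) based on averaged scores."""
    averages = {
        msa: mean([s for s in scores if s is not None]) for msa, scores in data.items()
    }
    best = max(averages, key=averages.get)
    worst = min(averages, key=averages.get)
    gap = averages[best] - averages[worst]
    # Relative gap is expressed as a fraction of the worst performer's average
    return best, worst, gap, gap / averages[worst]


best, worst, gap, rel = best_and_worst(series)
print(f"best={best} worst={worst} gap={gap:.1f} points ({rel:.0%})")
```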

BestAndWorst

To observe the performance of a single MSA over the observed period, a similar strategy can be used with a different type of visualization:

Miami

Or, to compare MSAs within one state, a third entity can easily be added:

Miami

The same can be done using the two Tennessee MSAs, Memphis and Nashville/Davidson, displayed here alongside the Benchmark Average value:

Additionally, cities that serve as state capitals can be used as a microcosm for the trends of the state itself:

capitals

The population of these metropolitan areas often accounts for a significant share of the total population of the state. The Atlanta metropolitan area, for example, accounts for as much as 57% of Georgia's roughly 10 million residents. On average, the observed metropolitan areas make up around 30% of their states' total populations. The Raleigh MSA (population 1.3 million) is a low exception, representing only 12% of the state population; in fact, it is not even the largest metropolitan area in North Carolina. The Charlotte, North Carolina, MSA, with roughly 2.3 million residents, ranked 43rd in the 2015 AFI with an overall score of 37.4, below the National Benchmark Average and significantly below its less populous neighbor, Raleigh.

This trend is further explored in the Analysis section.

The two highest-performing MSAs, Raleigh, North Carolina, and Atlanta, Georgia, can also be displayed next to one another along with the Benchmark Average:

Best

Analysis

The American College of Sports Medicine has taken on the bold task of attempting to quantify the public health of the fifty most populous metropolitan areas in the country. Using a combination of measurable data and self-reported figures, the goal of the annual American Fitness Index report is to see cities take positive steps toward better public health.

Because the index is a ranking, a city may see a measurable increase in its raw score and still lose position in the national ranking. Although competition is important, this facet of the index oversimplifies the root factors that contribute to public health and can cause the true goal of the AFI to be lost if other factors are ignored, which makes a thorough examination of the data even more important.

To analyze the ranking of each city and its individual performance simultaneously, the following Box Chart can be used:
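A small illustration with hypothetical scores for three MSAs in two consecutive years: MSA A improves its raw score yet falls in rank because the other MSAs improve more. Ranks here are assigned with 1 = highest score and ties broken by input order; that is a simplifying assumption, not the ACSM's rule.

```python
def rank(scores):
    """Rank MSAs by score, 1 = highest (ties broken by input order)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {msa: position for position, msa in enumerate(ordered, start=1)}


# Hypothetical raw scores for two consecutive years
year_1 = {"MSA A": 50.0, "MSA B": 48.0, "MSA C": 45.0}
year_2 = {"MSA A": 51.0, "MSA B": 53.5, "MSA C": 52.0}

r1, r2 = rank(year_1), rank(year_2)
for msa in year_1:
    print(f"{msa}: score {year_1[msa]} -> {year_2[msa]}, rank {r1[msa]} -> {r2[msa]}")
```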

Box

Here the individual performance of each city can be analyzed at the same time as its performance relative to other cities. The average score of each city over the entire period is displayed along with its maximum and minimum values. Outlier data is loosely connected to the central box to indicate that it is atypical. In ChartLab, hovering the cursor over a box or its features displays a detailed breakdown of the data.

Average performance of the observed cities can also be displayed with a less ambiguous visualization: the Gauge Chart, which shows performance against subjective standards. Here the threshold is set at 42 to represent the National Benchmark Average. This tool can also handle active data sets and send subscribers alerts when a certain threshold value is crossed.
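The statistics behind a box of this kind can be sketched in Python with the standard `statistics` module. The exact whisker and outlier rule ChartLab applies is not stated here, so this sketch assumes the common 1.5 × IQR convention and inclusive quartiles; the five yearly scores are hypothetical.

```python
from statistics import mean, quantiles


def box_summary(scores):
    """Five-number summary plus mean, flagging outliers with the 1.5 x IQR rule."""
    q1, median, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": mean(scores),
        "min": min(scores),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(scores),
        "outliers": [s for s in scores if s < low or s > high],
    }


# Hypothetical five-year scores for one MSA; 55.0 is an unusually strong year
summary = box_summary([42.0, 40.0, 55.0, 43.0, 41.0])
print(summary)
```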

AtlantaAvg

Learn more about Gauge controls and explore the results of other MSAs in the Appendix.

The capital-city comparison above highlighted the case of the Raleigh and Charlotte MSAs: two cities in one state with vastly different population sizes and vastly different AFI scores. Is there a correlation between population size and AFI score?

The following table lists cities included in the earlier data alongside cities that were not, with their metropolitan populations and their scores and ranks in the 2017 American Fitness Index report:

State	Metropolitan Statistical Area	Population	AFI Score	Rank
California	San Francisco	7.7 million	73.3	3
California	San Jose	2.0 million	71.6	5
California	San Diego	3.1 million	65.6	10
California	Sacramento	2.2 million	63.3	11
California	Los Angeles	12.8 million	55.7	16
California	Riverside	3.4 million	44.5	37
Florida	Tampa Bay	3.0 million	54.1	19
Florida	Miami	5.0 million	52.6	23
Florida	Orlando	2.2 million	52.3	25
Florida	Jacksonville	1.5 million	46.0	35
New York	New York City	23.7 million	54.5	18
New York	Buffalo	1.1 million	52.5	24
Tennessee	Nashville	1.8 million	36.8	42
Tennessee	Memphis	1.3 million	33.2	45
Texas	Austin	2.0 million	61.2	12
Texas	Dallas	6.4 million	43.2	38
Texas	Houston	6.4 million	39.0	40
Texas	San Antonio	2.4 million	34.7	44

Using data from the above table, the average population, rank, and score of the observed MSAs in each state, along with the standard deviation of each figure, can be calculated:

State	Average Population	Average Rank	Average Score	Standard Deviation of Population	Standard Deviation of Rank	Standard Deviation of Score
California	5.2 million	14	62.3	4.3 million	12	10.8
Florida	2.9 million	26	51.3	1.5 million	7	3.6
New York	12.4 million	21	53.5	16.0 million	4	1.4
Tennessee	1.6 million	44	35.0	0.4 million	2	2.5
Texas	4.3 million	34	44.5	2.4 million	15	11.6

Average population values represent the average population of the observed MSAs, not of the entire state. Standard deviations are sample standard deviations (n - 1 in the denominator), and average ranks are rounded to the nearest whole number.

Averaging the standard deviations across the five states establishes a baseline for location-controlled data:

Avg SD of Population	Avg SD of Rank	Avg SD of Score
4.9 million	8	6.0
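The per-state figures can be recomputed directly from the 2017 table. This Python sketch uses `statistics.mean` and `statistics.stdev`; `stdev` is the sample standard deviation, which is the convention the table's standard deviations follow. The data layout and function names are illustrative choices.

```python
from statistics import mean, stdev

# 2017 AFI figures from the table above: (state, MSA, population in millions, score, rank)
AFI_2017 = [
    ("California", "San Francisco", 7.7, 73.3, 3),
    ("California", "San Jose", 2.0, 71.6, 5),
    ("California", "San Diego", 3.1, 65.6, 10),
    ("California", "Sacramento", 2.2, 63.3, 11),
    ("California", "Los Angeles", 12.8, 55.7, 16),
    ("California", "Riverside", 3.4, 44.5, 37),
    ("Florida", "Tampa Bay", 3.0, 54.1, 19),
    ("Florida", "Miami", 5.0, 52.6, 23),
    ("Florida", "Orlando", 2.2, 52.3, 25),
    ("Florida", "Jacksonville", 1.5, 46.0, 35),
    ("New York", "New York City", 23.7, 54.5, 18),
    ("New York", "Buffalo", 1.1, 52.5, 24),
    ("Tennessee", "Nashville", 1.8, 36.8, 42),
    ("Tennessee", "Memphis", 1.3, 33.2, 45),
    ("Texas", "Austin", 2.0, 61.2, 12),
    ("Texas", "Dallas", 6.4, 43.2, 38),
    ("Texas", "Houston", 6.4, 39.0, 40),
    ("Texas", "San Antonio", 2.4, 34.7, 44),
]


def summarize(rows):
    """Mean and sample standard deviation of population, rank, and score."""
    pops = [r[2] for r in rows]
    scores = [r[3] for r in rows]
    ranks = [r[4] for r in rows]
    return {
        "avg_pop": mean(pops), "avg_rank": mean(ranks), "avg_score": mean(scores),
        "sd_pop": stdev(pops), "sd_rank": stdev(ranks), "sd_score": stdev(scores),
    }


# Group rows by state, preserving the table's order
groups = {}
for row in AFI_2017:
    groups.setdefault(row[0], []).append(row)
per_state = {state: summarize(rows) for state, rows in groups.items()}

for state, s in per_state.items():
    print(f"{state:<11} pop {s['avg_pop']:.1f}M  rank {s['avg_rank']:.1f}  "
          f"score {s['avg_score']:.1f}  SD score {s['sd_score']:.1f}")

# Location-controlled baseline: mean of the per-state standard deviations
baseline_sd_rank = mean(s["sd_rank"] for s in per_state.values())
baseline_sd_score = mean(s["sd_score"] for s in per_state.values())
print(f"baseline SD rank {baseline_sd_rank:.0f}, SD score {baseline_sd_score:.1f}")
```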

The data shows that there is, in fact, very little correlation between the size of a given city and its AFI score. The Raleigh/Charlotte example suggested that two cities of vastly different sizes in one state score very differently on the AFI, with the smaller Raleigh MSA earning a much higher rank and score than the larger Charlotte MSA. The data shows this relationship to be the exception rather than the norm: cities within one state, regardless of their population size, tend to score more closely to one another than cities of similar population size in different states. For example, compare the Nashville MSA (population 1.8 million) to the San Jose MSA (population 2.0 million), the Orlando MSA (population 2.2 million), and the Austin MSA (population 2.0 million):

State	Metropolitan Statistical Area	Population	AFI Score	Rank
California	San Jose	2.0 million	71.6	5
Florida	Orlando	2.2 million	52.3	25
Tennessee	Nashville	1.8 million	36.8	42
Texas	Austin	2.0 million	61.2	12

Now perform the same calculations as shown above:

Average Population	Average Rank	Average Score	Standard Deviation of Population	Standard Deviation of Rank	Standard Deviation of Score
2.0 million	21	55.5	0.2 million	16	14.7

Or use a larger population as the control amount:

State	Metropolitan Statistical Area	Population	AFI Score	Rank
California	San Diego	3.1 million	65.6	10
California	Riverside	3.4 million	44.5	37
Florida	Tampa Bay	3.0 million	54.1	19
Texas	San Antonio	2.4 million	34.7	44

Following the above procedure once again:

Average Population	Average Rank	Average Score	Standard Deviation of Population	Standard Deviation of Rank	Standard Deviation of Score
3.0 million	28	49.7	0.4 million	16	13.2
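The size-controlled figures can be checked the same way. This Python sketch applies `statistics.stdev` (sample standard deviation) to the two groups above and compares the results with the location-controlled baselines of 8 (rank) and 6.0 (score).

```python
from statistics import stdev

# Size-controlled groups from the two tables above: (MSA, population in millions, score, rank)
GROUPS = {
    "about 2.0 million": [("San Jose", 2.0, 71.6, 5), ("Orlando", 2.2, 52.3, 25),
                          ("Nashville", 1.8, 36.8, 42), ("Austin", 2.0, 61.2, 12)],
    "about 3.0 million": [("San Diego", 3.1, 65.6, 10), ("Riverside", 3.4, 44.5, 37),
                          ("Tampa Bay", 3.0, 54.1, 19), ("San Antonio", 2.4, 34.7, 44)],
}

# Location-controlled baselines: averages of the per-state standard deviations
BASELINE_SD_RANK = 8
BASELINE_SD_SCORE = 6.0

results = {}
for label, group in GROUPS.items():
    sd_rank = stdev([row[3] for row in group])
    sd_score = stdev([row[2] for row in group])
    results[label] = (sd_rank, sd_score)
    print(f"{label}: SD rank {sd_rank:.1f} ({sd_rank / BASELINE_SD_RANK:.1f}x baseline), "
          f"SD score {sd_score:.1f} ({sd_score / BASELINE_SD_SCORE:.1f}x baseline)")
```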

In both of these examples, where cities of similar population size were deliberately selected to test the Raleigh/Charlotte hypothesis, the standard deviation of rank is 16, twice the average standard deviation of 8 for location-controlled data. The contrast is even sharper for raw scores: the size-controlled groups show standard deviations of 14.7 and 13.2, both more than double the location-controlled average of 6.0. To test whether the similarity of scores within a given state is significant, an X^2 uncertainty value is calculated below for each MSA, using the average AFI score of the MSA's state as the expected value:

State	Metropolitan Statistical Area	AFI Score (O1)	State Average (E1)	X^2
California	San Francisco	73.3	62.3	0.0312
California	San Jose	71.6	62.3	0.0223
California	San Diego	65.6	62.3	0.0028
California	Sacramento	63.3	62.3	0.0003
California	Los Angeles	55.7	62.3	0.0109
California	Riverside	44.5	62.3	0.0816
Florida	Tampa Bay	54.1	51.3	0.0197
Florida	Miami	52.6	51.3	0.0006
Florida	Orlando	52.3	51.3	0.0004
Florida	Jacksonville	46.0	51.3	0.0107
New York	New York City	54.5	53.5	0.0006
New York	Buffalo	52.5	53.5	0.0003
Tennessee	Nashville	36.8	35.0	0.0926
Tennessee	Memphis	33.2	35.0	0.0926
Texas	Austin	61.2	44.5	0.1408
Texas	Dallas	43.2	44.5	0.0009
Texas	Houston	39.0	44.5	0.0153
Texas	San Antonio	34.7	44.5	0.0485

Because the rank of an MSA is relative to the performance of other MSAs, rank is excluded from uncertainty testing; only the raw AFI score of each city is considered.

The uncertainty value for each state is considered individually:

State	X^2 Total	DoF	P value
California	0.1491	5	< 0.001
Florida	0.0314	3	< 0.005
New York	0.0009	1	0.025
Tennessee	0.1851	1	> 0.10
Texas	0.2055	3	0.025

Tennessee's failure to conform to the scoring model is also noticeable in the Data section, where its MSAs are compared to the Benchmark Average. Given the high degree of uncertainty for that state, and the Charlotte/Raleigh case, only about a third of the states tested here conform to the standard. However, a binary classification model such as the one demonstrated here only needs to answer one of two ways: do the MSAs in a particular state conform to the general public health standards of the other MSAs in that state?

Because this type of profiling seeks to determine how likely other cities in a given state are to follow the public health trends observed and recorded in public databases, highly uncertain or irregular data simply indicates that the AFI score of an MSA in that state cannot be predicted from the AFI scores of other MSAs in the same state, and that a different model, or different data, is needed for that state.

Conclusions

The predictive model explored here indicates that in a number of states enough public data exists to effectively predict community health figures for MSAs where no such data is present. This matters because the ACSM observes only the fifty largest Metropolitan Statistical Areas in the country.

Using several standard methods of comparison, the data published by the City of New Orleans shows that most metropolitan statistical areas within one state follow a similar trend with respect to public health. The graphing tools in the Data section portray this pattern in a number of states, and the statistical calculations in the Analysis section highlight the same observation. Together they allow certain hypotheses to be rejected: the population size of an MSA has no meaningful ability to predict the public health of the city, while some states can be modeled with known data to predict unknown data. Analysis of the political processes and community engagement that may contribute to the presence or absence of such consistencies is outside the scope of this experiment but well within the scope of the American College of Sports Medicine, which goes a step further and provides custom action plans for each MSA based on its performance in a range of areas over a given period of time.

The dataset examined here is large, and effective management and presentation of such data is crucial to drawing meaningful conclusions from it. This is why ATSD is well suited to delivering comprehensive and comprehensible solutions to a wide range of data science problems.

Contact Axibase with any support issues.

Action Items

Install Docker.

Download the docker-compose.yml file to launch the ATSD container bundle.

Launch containers by specifying the built-in collector account credentials used by Axibase Collector to insert data into ATSD.

export C_USER=myuser; export C_PASSWORD=mypassword; docker-compose pull && docker-compose up -d

Open the ATSD web interface and begin exploring your data.

Appendix

How the AFI is Calculated

The American Fitness Index is calculated using the following formula:

x = \left( \frac{\sum_{i=1}^{n} r_i w_i}{\mathrm{Max}} \right) \times 100

where:

x = Total Score
n = Number of indicators: up to 15 for Personal Health or 16 for Community and Environment; each indicator is either present and counted, or not.
r = Metropolitan Statistical Area (MSA) rank out of 50 on each indicator.
w = Weighted value of each indicator, determined internally by the ACSM.
Max = A hypothetical maximum score for an MSA ranked best on all indicators.

Source: http://www.americanfitnessindex.org/methodology/
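The formula can be made concrete with a short Python sketch. It rests on assumptions that the summary above does not state: ranks are expressed so that a higher value is better (the best MSA on an indicator receives 50), and Max is the weighted total of a hypothetical MSA that earns 50 on every indicator that is present. The weights below are hypothetical; the ACSM's actual weights are determined internally.

```python
def afi_score(indicators, max_rank=50):
    """Compute x = (sum(r * w) / Max) * 100 for one MSA.

    indicators: list of (r, w) pairs for the indicators that are present.
    Assumption: r is expressed so that higher is better (the best MSA gets
    max_rank), and Max is the weighted total of an MSA that earns max_rank
    on every present indicator.
    """
    weighted_total = sum(r * w for r, w in indicators)
    max_total = sum(max_rank * w for _, w in indicators)
    return weighted_total / max_total * 100


# Hypothetical (rank points, weight) pairs for three indicators
example = [(50, 2.0), (25, 1.0), (40, 1.0)]
print(round(afi_score(example), 1))
```

Under these assumptions, an MSA ranked best on every present indicator scores exactly 100, which matches the description of Max as the score of a hypothetical top-ranked MSA.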

Using the `ALERT` Setting

Alert expressions use a two-part syntax:

alert-expression = YOUR_CONDITION_HERE
alert-style = fill: COLOR; stroke: COLOR

This syntax is shown in the ChartLab example below:

Alert

Even Year (2012 and 2014) By-City Data With the `ALERT` Setting

Year 2012 (Benchmark Average: 38)

2012

Year 2014 (Benchmark Average: 42)

2014

Alternative Display of City By Year Data

For a cleaner visualization, each body of data can be displayed separately, as shown below:

AltView

The visualization shown above uses the group setting under the [widget] heading, as shown below:

Syntax1

The highlighted setting separates the data by location tag. In the visualization shown in the Data section, however, the group = location setting is ignored because of the sort setting shown below:

Syntax2

Contact Axibase with any questions.

Modifying the Gauge to Display Other Cities

The tags.location setting at the bottom of the Editor specifies the observed MSA to display in the Gauge. Include the two-letter state abbreviation and use the \, escape notation for the comma. For a better presentation, change the title setting as well.

Gauge
