The Uses and Abuses of Economic Statistics
A pet peeve of mine is the use of slipshod social science and statistics as a
mantle to conceal a weakly supported claim. I sometimes see this in the output of
ideological think tanks, organizations whose publicity model usually
involves getting mainstream publishers to credulously repeat their “reports,”
or press releases. Sometimes I feel
compelled to debunk such reports (e.g., Pittelli, 2016).
This
week I noticed an article in the Washington Post Wonkblog (Badger, 2016). This Wonkblog
article reported on an economic analysis from the Brookings Institution. It claims that a look at the changes in some
economic statistics for America’s 100 largest cities over the past five years shows
that economic growth does not do much to help the poor and working classes.
To
test the validity of this claim, I downloaded the statistics used by Brookings
and did some work on them in Excel and Tableau, an excellent visualization
program I am currently learning for a course in the Data Analytics program at
Southern New Hampshire University (SNHU.edu).
The
Brookings article pointed out that “On average, the faster a metro economy
grew, the more likely it was to experience improvements in inclusion
[Brookings’ term for how well the poor are doing]” but then went on to refute a
ridiculous strawman: “Yet growth in metro economies did not reliably improve
all residents’ economic fortunes.” Perhaps
more important, the article’s headline was “In metro areas, growth isn’t
reliably trickling down.” (Berube, 2016)
The
Washington Post Wonkblog picked up
the story with an even drearier headline (“All the people being left behind in
America’s booming cities”). The article disapprovingly
quoted various business and Republican sources claiming that economic growth is
the best way to help the poor and working class, told us that the Brookings
report shows they are all mistaken, and ended by quoting an author of the
Brookings report telling us that the report shows that the key to improving
“inclusion” is increased government spending on the poor.
The Data
So
what was all this based on? Brookings took
nine economic statistics, grouped them together in three groups of three, and gave
the three statistics groups names which sound meaningful and important, namely:
“growth,” “prosperity” and “inclusion.”
As these coinages are idiosyncratic, I will continue to put quotes
around them. In addition, Brookings’ and Wonkblog’s pessimistic reading of the
Brookings statistics (primarily, that “growth” is not reliably leading to
“inclusion”) is overblown, both for statistical reasons and because
Brookings’ coinages are not meaningful or well-constructed. I have three major issues with their
conclusions:
First Issue
The
Brookings “growth” measure covers the size of each city’s economy, whereas the
“prosperity” and “inclusion” measures cover the per capita economy. Naturally, growing cities attract workers
from other cities with slower growth rates, and these workers – failed by their
previous cities of residence – also benefit from a successful city’s growing
economy. But the positives of a city
attracting new workers are overlooked by most of these statistics. Indeed, to the extent a city is attracting
new workers, its “prosperity” and “inclusion” measures will lag its “growth”
measure. These discrepancies are a measure not of urban failure but
of urban attractiveness.
Second Issue
Brookings
claims of its “inclusion” statistic that:
Inclusion indicators
measure how the benefits of growth and prosperity in a metropolitan economy are
distributed among people. Inclusive growth enables more people to invest in
their skills and to purchase more goods and services.
But
these claims are not reasonably supported by the three statistics in question.
Two of the parts of “inclusion” are Median wage and the Employment-to-population
ratio (the share of all individuals aged 18 to 65 who are employed). Neither of these measures tells us much about
the bottom tier, the working class, or non-college graduates. Median wage is a useful statistic, showing
how the middle is doing. Employment-to-population
ratio is also meaningful, and it is perhaps troubling that today the level
nationally is close to a 30-year low.
But people also can be not employed due to prosperity, as in the case of
couples who can afford to have a stay-at-home parent, or people retiring before
age 65.
The
last of the statistics making up “inclusion” is the “Relative income poverty
rate” (RIPV), which is “The share of people in a metropolitan economy who earn
less than half of the local median wage.”
If a city saw everyone’s wages double, with no other changes, then RIPV would
be unchanged. But the low-earning people
would certainly benefit from a doubling of real earnings, and they would be
better able “to invest in their skills and to purchase more goods and services.” Like other inequality measures, this one
shows negative numbers when better-off people see growth in their incomes, even
when the people at the bottom are seeing the same or somewhat better
incomes. But this measure of inequality
is worse than some others because the “well-off” whose income growth definitionally
becomes a bad thing are merely those at the 50th percentile, not
some category of rich which is divorced from “the people.” Also, the RIPV statistic only looks at people
with earnings, which means that someone going from no earnings (e.g.,
unemployed, on welfare, or in prison) to low earnings makes their city look
worse off. Further, a low-income person
forced to move because he is priced out of, say, San Jose, California, makes
that city look better off.
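The point about doubling wages can be checked with a few lines of Python. This is only a sketch: the ten wages are made up for illustration (they are not Brookings data), and the function name `relative_poverty_rate` is my own label for the definition quoted above, the share of earners below half of the median wage.

```python
from statistics import median

def relative_poverty_rate(wages):
    """Share of earners whose wage is below half of the median wage."""
    threshold = median(wages) / 2
    return sum(1 for w in wages if w < threshold) / len(wages)

# Hypothetical annual wages for ten earners (illustration only).
wages = [8_000, 12_000, 20_000, 30_000, 40_000,
         50_000, 60_000, 80_000, 120_000, 200_000]

# Everyone's wage doubles, with no other changes.
doubled = [2 * w for w in wages]

before = relative_poverty_rate(wages)
after = relative_poverty_rate(doubled)
print(before, after)
```

Both calls return the same share, because doubling every wage also doubles the median and therefore the threshold. The measure registers no improvement even though every low earner now earns twice as much.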
A
more meaningful measure of inclusion, or how “benefits… are distributed” to the
poor or working class, would look at a group such as the bottom quintile, and
would measure whether this lowest-earning portion of the people saw increases
or decreases in income (or consumption).
In the absence of such data, the median wage tells us more about the
average person’s economic benefits and ability “to invest in their skills and
to purchase more goods and services” than does Brookings’ “Relative income
poverty rate.”
Third Issue
One
cannot say flatly that a rising tide lifts all boats, or that it doesn’t; such
a reality falls along a continuum. I
downloaded the three Brookings ranks for each of the 100 cities, used Excel to
semi-automatically put the tabular data into rows, and made scatter graphs in Excel
and then Tableau. I found that there is
indeed a positive correlation between Brookings’ ranks of 5-year “growth” and
“inclusion” measures, with an r-squared of 11%, meaning that 11% of the
variation in the cities’ change of rank in “inclusion” may be explained by the
variation in the cities’ change of rank in Brookings’ “growth” measure (P <
0.001).
Below
is a scatter graph I constructed in Tableau using the Brookings rank data. It shows the same dots as the scatter graph shown
in the Wonkblog and Brookings articles,
but with the addition of city names, where Tableau found room for them (note
that the scales are reversed, as 100 is the worst score, and 1 is the best):
A
quick glance at the scatter graph does not show any obvious pattern of correlation,
and Wonkblog describes it as a “weak
relationship.” The article goes on to
say that “This non-pattern is notable precisely because the rising-tide theory
remains so alluring, particularly among Republicans.” Those foolish Republicans! It may be reasonable to describe an r-squared of 11% as a
weak correlation, but it is certainly not a “non-pattern,” not evidence with
which to refute people who discern the pattern, and in particular not evidence
that some other policies would work better than policies aimed at improving
economic growth.
Note on Nonparametric Statistics
The
cities are listed and graphed above by rank, not by the actual underlying
statistics. A list of ranks by
definition has rank or ordinal scale, but not interval scale (i.e., adjacent
cities always have a rank difference of one, but are not equally far apart from
each other in terms of the underlying statistics.) For normal statistical measures, such as
those underlying the ranks, one would expect something close to a normal
distribution, and that the interval between two adjacent cities which are
ranked very high or very low would generally be greater than the interval
between two adjacent cities near the middle of the distribution. (Imagine that we have 100 people chosen at
random, arranged by height; we are almost certain to see a greater height
difference between the tallest person and the second-tallest person than
between two adjacent people near the middle of the line.)
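The height analogy can be illustrated without random sampling. The sketch below rests on two simplifying assumptions of mine: heights are exactly normally distributed, and the i-th of 100 ranked people sits at the (i - 0.5)/100 quantile. It compares the gap between the top two positions with the gap between two adjacent middle positions, in units of standard deviations.

```python
from statistics import NormalDist

def expected_positions(n):
    """Approximate standardized positions of n ranked draws from a normal distribution."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

positions = expected_positions(100)

# Gap between the two most extreme positions at the top of the line.
top_gap = positions[-1] - positions[-2]
# Gap between two adjacent positions in the middle of the line.
middle_gap = positions[50] - positions[49]

print(f"gap at the top: {top_gap:.3f} standard deviations")
print(f"gap in the middle: {middle_gap:.3f} standard deviations")
```

Under these assumptions the gap at the top is more than ten times the gap in the middle, which is why a rank difference of one does not mean the same thing everywhere in the list.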
A correlation of ranks is not as rigorous a measure as a correlation of the
underlying data, and the underlying statistics could show
significantly different correlations. Given
only rank data, a mathematical purist should prefer to use nonparametric
statistics. A person with little
knowledge of statistics might also prefer the simplest or crudest of these
nonparametric methods to determine correlation. Looking again at the scatter graph, it is
visually divided into 4 quadrants, with 50 cities on either side of center, and
50 cities each above and below the center.
Each quadrant will have 25 cities if there is zero correlation by this
measure. But in fact, the quadrant
counts are:
UL = 18    UR = 32
LL = 32    LR = 18
With 64/36 times as many cities at bottom left and upper right as at upper left
and bottom right, there is clearly a positive correlation between the two
variables. The (simple and crude) quadrant
count ratio is [n(LL) + n(UR) - n(UL) - n(LR)] / N, and gives a
number similar to r (the Pearson product-moment correlation coefficient),
ranging from -1 to 1. In this case: (64
- 36) / 100 = 0.28, which is, at the least, on the stronger side of “weak
relationships.”
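For anyone who wants to repeat the arithmetic, here is a minimal Python sketch of the quadrant count ratio. The function name and argument names are my own, and the counts are the four quadrant counts I read off the scatter graph.

```python
def quadrant_count_ratio(ul, ur, ll, lr):
    """(LL + UR - UL - LR) / N, a crude correlation measure from -1 to 1."""
    concordant = ll + ur   # quadrants consistent with a positive relationship
    discordant = ul + lr   # quadrants consistent with a negative relationship
    total = ul + ur + ll + lr
    return (concordant - discordant) / total

ratio = quadrant_count_ratio(ul=18, ur=32, ll=32, lr=18)
print(round(ratio, 2))  # 0.28

# With 25 cities in every quadrant the ratio is zero.
print(quadrant_count_ratio(25, 25, 25, 25))  # 0.0
```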
Further,
one can see in the scatter graph that there are no cities very near to the
upper left and bottom right extreme corners of the whole graph, while there are
a few cities very near the bottom left and upper right corners. In other words, cities with a poor ranking
for “growth” also have a poor ranking for “inclusion,” while cities with an
excellent ranking for “growth” also have an excellent ranking for “inclusion.”
All of the preceding correlation analysis is based on the assumption that the
Brookings “growth” and “inclusion” statistics are meaningful and well-named
constructs. But as I noted in my First
and Second Issues above, this is not the case.
So how would I show the relationship between economic growth and
benefits to the people?
From
the Brookings report, I obtained the nine separate statistics for each of the
100 largest metropolitan areas in the United States. These are all rates of increase/decrease for
the last 5 years, the period emphasized in the Wonkblog article. Note that
I will use the Greek delta symbol Δ to denote change in a statistic, in this
case change over the last 5 years expressed as a percentage (e.g., if a
statistic increased by 10%, then it was multiplied by 1.10).
After cleaning up the data and putting it in row format in Excel, I noted the Pearson
coefficients and r-squared figures for the pairings of these statistics.
So how much does economic growth in a city help the poor? Just looking at the cities’ Δ Gross Metropolitan
Product (GMP) as the proper measure of economic growth, we see:
· an r-squared of 0.67 with Δ Aggregate Wages
· an r-squared of 0.56 with Δ Jobs
· an r-squared of 0.32 with Δ Average Wage
· an r-squared of 0.25 with Δ Median Wage
I
switched to Tableau at this point because it makes it easy to create a
calculated field combining statistics and then to check a correlation with the
calculated field.
Aggregate
Wages is by definition equal to Average Wage * Jobs, which means that (1+ Δ Aggregate
Wages) = (1+ Δ Average Wage) * (1+ Δ Jobs).
In other words, if a city’s Average Wage increases by 10% while its number of Jobs
increases by 5%, then its Aggregate Wages increase by 15.5%. (We combine a 10% increase in one statistic
with a 5% increase in a second statistic by multiplying 1.10 by 1.05, which gives 1.155, an
increase of 15.5%.)
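The same compounding rule can be written as a small Python function. This is just a sketch of the arithmetic above; the function name `combined_change` is my own.

```python
def combined_change(*changes):
    """Compound fractional changes: (1 + a) * (1 + b) * ... - 1."""
    total = 1.0
    for change in changes:
        total *= 1.0 + change
    return total - 1.0

# A 10% rise in Average Wage combined with a 5% rise in Jobs.
print(round(combined_change(0.10, 0.05), 4))  # 0.155
```

Simply adding the two percentages would give 15%, which is close for small changes but understates the combined effect.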
It is no surprise that, again looking at the cities’ Δ Gross Metropolitan Product
(GMP), we see:
· an r-squared of 0.67 with Δ Average Wage * Δ Jobs, exactly the same as for Δ Aggregate Wages, as we should expect
· an almost-as-high r-squared of 0.61 with Δ Median Wage * Δ Jobs
Furthermore, the beta or slope of the regression line is 0.82 for Δ Average Wage * Δ Jobs
and 0.85 for Δ Median Wage * Δ Jobs.
In
other words, if one city has a GMP increase that is 10% higher than a second
city, that first city will have Aggregate Wages growing on average 8.2% higher
than the second city, and 67% of the variation in the growth in the 100 cities’
Aggregate Wages will be explained by growth in GMP alone.
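For readers who want to see where a slope and an r-squared come from, here is a minimal least-squares sketch in plain Python. The five pairs of numbers are invented for illustration and are not the Brookings figures; the function name is my own, and Tableau or Excel would report the same two quantities for the same inputs.

```python
def slope_and_r_squared(x, y):
    """Least-squares slope of y on x, and the squared Pearson correlation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, sxy ** 2 / (sxx * syy)

# Hypothetical 5-year changes for five made-up cities (illustration only).
gmp_change = [0.02, 0.06, 0.10, 0.14, 0.20]
wage_bill_change = [0.03, 0.05, 0.10, 0.11, 0.18]

slope, r2 = slope_and_r_squared(gmp_change, wage_bill_change)

# A slope of b means a city with 10 points more GMP growth is predicted
# to have 10 * b points more growth in the wage measure.
predicted_gap = slope * 0.10
print(f"slope={slope:.2f}, r-squared={r2:.2f}, predicted gap={predicted_gap:.3f}")
```

With a slope of 0.82, the same multiplication gives the 8.2% figure quoted above.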
And
when one city has a GMP increase that is 10% higher than a second city, that
first city will have (Median Wage * Jobs) growing on average 8.5% higher than
the second city, and 61% of the variation in the growth in the 100 cities’
(Median Wage * Jobs) will be explained by growth in GMP alone. Note this scatter graph of change in (Median
Wage * Jobs) vs. change in GMP:
So it seems to me that the best way to describe how growth in a
city’s GMP brings growth in Wages and Jobs is “quite reliably.” Sometimes there is more growth in Wages and
sometimes more growth in Jobs. But as
noted above, in the latter case people in that city have also benefited, either
because the Employment ratio is higher than it would have been, or because
people migrated to the city and obtained a job they presumably could not have
gotten in the city they left behind.
Conclusion
How
did Brookings go so far afield from reality?
The Brookings authors used questionable methods to combine and create
statistics, and questionable methods to compare them.
I
think one reason why the Brookings authors prefer to coin terms for their
arbitrarily chosen groups of statistics, when there are single statistics
which are more meaningful and give more meaningful correlations, is that
controlling the terms gives them control of the narrative and makes refutation
more difficult and more complicated.
The
more complicated a statistical measure, the easier it is to fool oneself (or
others) about the meaning of the statistic.
More complicated statistics also allow for more options for comparing
the statistics, and more chances of finding what you want to find in a
correlation or other comparison. In this
case, Brookings was looking at correlations of the statistics which they termed
“growth” and “inclusion.” Each of these
statistics was formed by:
 Ranks,
 of Sums,
 of differences from the three different Means,
 divided by three different Standard Deviations,
 of Rates of change in,
 underlying statistics which were themselves, in some cases, the quotient of two statistics (i.e., one statistic divided by another).
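The chain of steps listed above can be sketched in Python as follows. This is my reading of the recipe, not Brookings’ actual code: I assume the sample standard deviation, assume that a higher summed score earns a better (lower) rank, ignore ties, and use made-up rates of change for four hypothetical cities.

```python
from statistics import mean, stdev

def z_scores(values):
    """Difference from the mean, divided by the (sample) standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

def composite_ranks(indicators):
    """Rank cities by the sum of their standardized indicators (1 = best).

    `indicators` is a list of equal-length lists, one per indicator,
    each holding one rate of change per city. Ties are not handled.
    """
    standardized = [z_scores(column) for column in indicators]
    sums = [sum(city_values) for city_values in zip(*standardized)]
    order = sorted(range(len(sums)), key=lambda i: sums[i], reverse=True)
    ranks = [0] * len(sums)
    for rank, city_index in enumerate(order, start=1):
        ranks[city_index] = rank
    return ranks

# Hypothetical rates of change for four cities on three indicators.
indicator_a = [0.10, 0.05, 0.02, 0.08]
indicator_b = [0.12, 0.04, 0.01, 0.03]
indicator_c = [0.09, 0.06, 0.00, 0.07]

print(composite_ranks([indicator_a, indicator_b, indicator_c]))  # [1, 3, 4, 2]
```

Each step discards information: standardizing removes the units, summing hides which indicator moved, and ranking removes the distances between cities.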
These manipulations of the statistics could be defensible if they were used for other
purposes, but they made Brookings’ scatter graph and related claims about
correlation untenable. Brookings
succeeded in creating a scatter graph which appeared to show little correlation
between “growth” and “inclusion.” But
even for their graph, a simple quadrant count in fact showed a fairly
significant correlation.
If we skip the Ranks and the Means and Standard Deviations steps above, and create
simplified “growth” and “inclusion” measures by a simple averaging of the three
sub-statistics for each, we will find that “inclusion” has an r-squared of 0.21
against “growth,” nearly twice as high as Brookings showed in its graph after using
Ranks/Sums/Means/Standard Deviations.
Note the “Simplified ‘growth’ and ‘inclusion’” scatter:
Finally,
after taking into account the problems with Brookings’ “inclusion” measure – it
has little to do with “how the benefits of growth and prosperity in a
metropolitan economy are distributed among people” and even less to do with how
able the cities’ people are to “invest in their skills and to purchase more
goods and services” – we were able to see that a proper correlation of more
straightforward statistics shows that a city’s growth in fact reliably drives
its Median Wage and number of Jobs.
References
Badger, E. (2016, February 2). All the people being left behind in America’s booming cities. Washington Post Wonkblog. Retrieved from www.washingtonpost.com/news/wonk/wp/2016/02/02/allthepeoplebeingleftbehindinamericasboomingcities/
Berube, A. (2016, January 29). In metro areas, growth isn't reliably trickling down. Retrieved from www.brookings.edu/blogs/theavenue/posts/2016/01/29growthisntreliablytricklingdowninmetroareasaberube
Brookings. (2016). Metro monitor. Retrieved from www.brookings.edu/research/reports2/2016/01/metromonitor#V0G37980
Pittelli, D. (2016, January-February). Cambridge 02138 – Letters to the Editor. Harvard Magazine. Retrieved from http://harvardmagazine.com/2015/12/cambridge02138
Sullins, B. (n.d.). Enterprise business intelligence with Tableau Server. Pluralsight. Retrieved from https://app.pluralsight.com/library/courses/enterprisebusinessintelligencetableauserver/tableofcontents