The Uses and Abuses of Economic Statistics
A pet peeve of mine is the use of slipshod social science and statistics as a mantle to conceal a weakly supported claim. I sometimes see this with the output of ideological think tanks, organizations whose dissemination model usually involves getting mainstream publishers to credulously disseminate their “reports,” or press releases. Sometimes I feel compelled to debunk such reports (e.g., Pittelli, 2016).
This week I noticed an article in the Washington Post Wonkblog (Badger, 2016). This Wonkblog article reported on an economic analysis from the Brookings Institution. It claims that a look at the changes in some economic statistics for America’s 100 largest cities over the past five years shows that economic growth does not do much to help the poor and working classes.
To test the validity of this claim, I downloaded the statistics used by Brookings and did some work on them in Excel and Tableau, an excellent visualization program I am currently learning for a course in the Data Analytics program at Southern New Hampshire University (SNHU.edu).
The Brookings article pointed out that “On average, the faster a metro economy grew, the more likely it was to experience improvements in inclusion [Brookings’ term for how well the poor are doing]” but then went on to refute a ridiculous strawman: “Yet growth in metro economies did not reliably improve all residents’ economic fortunes.” Perhaps more important, the article’s headline was “In metro areas, growth isn’t reliably trickling down.” (Berube, 2016)
The Washington Post Wonkblog picked up the story with an even drearier headline (“All the people being left behind in America’s booming cities”). The article disapprovingly quoted various business and Republican sources claiming that economic growth is the best way to help the poor and working class, told us that the Brookings report shows they are all mistaken, and ended by quoting an author of the Brookings report telling us that the report shows that the key to improving “inclusion” is increased government spending on the poor.
So what was all this based on? Brookings took nine economic statistics, grouped them together in three groups of three, and gave the three statistics groups names which sound meaningful and important, namely: “growth,” “prosperity” and “inclusion.” As these coinages are idiosyncratic, I will continue to put quotes around them. In addition, Brookings’ and Wonkblog’s pessimistic reading of the Brookings statistics (primarily, that “growth” is not reliably leading to “inclusion”) is overblown for both statistical reasons and because the Brookings’ coinages are not meaningful or well-constructed. I have three major issues with their conclusions:
First, the Brookings “growth” measure covers the size of each city’s economy, whereas the “prosperity” and “inclusion” measures cover the per capita economy. Naturally, growing cities attract workers from other cities with slower growth rates, and these workers – failed by their previous cities of residence – also benefit from a successful city’s growing economy. But the positives of a city attracting new workers are overlooked by most of these statistics. Indeed, to the extent a city is attracting new workers, its “prosperity” and “inclusion” measures will lag its “growth” measure; these discrepancies are a measure not of urban failure but of urban attractiveness.
Second, Brookings claims of its “inclusion” statistic that:
Inclusion indicators measure how the benefits of growth and prosperity in a metropolitan economy are distributed among people. Inclusive growth enables more people to invest in their skills and to purchase more goods and services.
But these claims are not reasonably supported by the three statistics in question.
Two of the parts of “inclusion” are Median wage and the Employment-to-population ratio (the share of all individuals aged 18 to 65 who are employed). Neither of these measures tells us much about the bottom tier, the working class, or non-college graduates. Median wage is a useful statistic, showing how the middle is doing. Employment-to-population ratio is also meaningful, and it is perhaps troubling that today the level nationally is close to a 30-year low. But people can also be not employed due to prosperity, as in the case of couples who can afford to have a stay-at-home parent, or people retiring before age 65.
The last of the statistics making up “inclusion” is the “Relative income poverty rate” (RIPV), which is “The share of people in a metropolitan economy who earn less than half of the local median wage.” If a city saw everyone’s wages double, with no other changes, then RIPV would be unchanged. But the low-earning people would certainly benefit from a doubling of real earnings, and they would be better able “to invest in their skills and to purchase more goods and services.” Like other inequality measures, this one shows negative numbers when better off people see growth in their incomes, even when the people at the bottom are seeing the same or somewhat better incomes. But this measure of inequality is worse than some others because the “well-off” whose income growth definitionally becomes a bad thing are merely those at the 50th percentile, not some category of rich which is divorced from “the people.” Also, the RIPV statistic only looks at people with earnings, which means that someone going from no earnings (e.g., unemployed, on welfare, or in prison) to low earnings makes their city look worse off. Further, a low-income person forced to move because he is priced out of, say, San Jose, California, makes that city look better off.
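To make the arithmetic behind these objections concrete, here is a minimal Python sketch. The function name and the ten wage figures are made up for illustration (they are not Brookings data), and the rate is computed as the share of earners strictly below half the median wage, following the definition quoted above.

```python
from statistics import median

def relative_income_poverty_rate(wages):
    """Share of earners whose wage is below half of the median wage."""
    half_median = median(wages) / 2
    return sum(1 for w in wages if w < half_median) / len(wages)

# Hypothetical wages for a ten-earner city (made-up numbers, not Brookings data).
wages = [9_000, 12_000, 18_000, 30_000, 40_000,
         50_000, 60_000, 70_000, 90_000, 150_000]

baseline = relative_income_poverty_rate(wages)

# Everyone's wage doubles: the median doubles too, so the rate does not move.
doubled = relative_income_poverty_rate([2 * w for w in wages])

# A resident with no earnings takes a low-wage job and enters the data:
# that person is better off, yet the measured rate rises.
with_new_earner = relative_income_poverty_rate(wages + [8_000])

print(baseline, doubled, with_new_earner)
```

In this toy city the rate is identical before and after the doubling, and it goes up when the previously non-earning resident finds work, which is the perverse behavior described above.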
A more meaningful measure of inclusion, or how “benefits… are distributed” to the poor or working class, would look at a group such as the bottom quintile, and would measure whether this lowest-earning portion of the people saw increases or decreases in income (or consumption). In the absence of such data, the median wage tells us more about the average person’s economic benefits and ability “to invest in their skills and to purchase more goods and services” than does Brookings’ “Relative income poverty rate.”
Third, one cannot say flatly that a rising tide lifts all boats, or that it doesn’t; such a reality falls along a continuum. I downloaded the three Brookings ranks for each of the 100 cities, used Excel to semi-automatically put the tabular data into rows, and made scatter graphs in Excel and then Tableau. I found that there is indeed a positive correlation between Brookings’ ranks of 5-year “growth” and “inclusion” measures, with an r-squared of 11%, meaning that 11% of the variation in the cities’ rank in “inclusion” may be explained by the variation in the cities’ rank in Brookings’ “growth” measure (P < 0.001).
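For readers who want to check a figure like this outside Excel or Tableau, the following is a rough Python sketch of the calculation: a Pearson correlation applied to two rank lists, then squared. The six pairs of ranks are invented for illustration and are not the Brookings ranks, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up ranks for six cities (1 = best); NOT the Brookings data.
growth_rank = [1, 2, 3, 4, 5, 6]
inclusion_rank = [2, 1, 4, 3, 6, 5]

r = pearson_r(growth_rank, inclusion_rank)
r_squared = r ** 2  # share of variation in one rank "explained" by the other
print(round(r, 3), round(r_squared, 3))
```

Applied to rank data, this is the same quantity usually reported as Spearman's rank correlation.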
Below is a scatter graph I constructed in Tableau using the Brookings rank data. It shows the same dots as the scatter graph shown in the Wonkblog and Brookings articles, but with the addition of city names, where Tableau found room for them (note that the scales are reversed, as 100 is the worst score, and 1 is the best):
A quick glance at the scatter graph does not show any obvious pattern of correlation, and Wonkblog describes it as a “weak relationship.” The article goes on to say that “This non-pattern is notable precisely because the rising-tide theory remains so alluring, particularly among Republicans.” Those foolish Republicans! It may be reasonable to describe 11% as a weak correlation, but it is certainly not a “non-pattern,” not evidence with which to refute people who discern the pattern, and in particular not evidence that some other policies would work better than policies aimed at improving economic growth.
Note on Nonparametric Statistics
The cities are listed and graphed above by rank, not by the actual underlying statistics. A list of ranks by definition has rank or ordinal scale, but not interval scale (i.e., adjacent cities always have a rank difference of one, but are not equally far apart from each other in terms of the underlying statistics.) For normal statistical measures, such as those underlying the ranks, one would expect something close to a normal distribution, and that the interval between two adjacent cities which are ranked very high or very low would generally be greater than the interval between two adjacent cities near the middle of the distribution. (Imagine that we have 100 people chosen at random, arranged by height; we are almost certain to see a greater height difference between the tallest person and the second-tallest person than between two adjacent people near the middle of the line.)
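The height illustration can be approximated numerically. This sketch assumes heights are normally distributed with a mean of 170 cm and a standard deviation of 10 cm (both numbers are assumptions chosen for illustration), and it uses the (k - 0.5)/n quantile as a rough stand-in for the expected height of the k-th shortest of 100 people.

```python
from statistics import NormalDist

# Assumed height distribution: mean 170 cm, standard deviation 10 cm.
heights = NormalDist(mu=170, sigma=10)
n = 100

# Rough expected height of the k-th shortest person out of n,
# approximated by the (k - 0.5) / n quantile of the distribution.
expected = [heights.inv_cdf((k - 0.5) / n) for k in range(1, n + 1)]

gap_top = expected[-1] - expected[-2]     # tallest vs. second-tallest
gap_middle = expected[50] - expected[49]  # two adjacent people mid-line

print(round(gap_top, 2), round(gap_middle, 2))
```

Under these assumptions the gap at the top of the line is many times larger than the gap in the middle, even though both pairs differ by exactly one rank.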
A correlation of ranks is not as rigorous a measure as a correlation of the underlying data, and the underlying statistics could show significantly different correlations. Given only rank data, a mathematical purist should prefer to use nonparametric statistics. A person with little knowledge of statistics might also prefer the simplest or crudest of these nonparametric methods to determine correlation. Looking again at the scatter graph, we can divide it visually into 4 quadrants, with 50 cities on either side of center, and 50 cities each above and below the center. Each quadrant would have 25 cities if there were zero correlation by this measure. But in fact, the quadrant counts are:
UL = 18
UR = 32
LL = 32
LR = 18
With 64/36 times as many cities at bottom left and upper right as at upper left and bottom right, there is clearly a positive correlation between the two variables. The (simple and crude) quadrant count ratio is n(LL) + n(UR) - n(UL) - n(LR), all divided by N, and gives a number similar to r (the Pearson product-moment correlation coefficient), ranging from -1 to 1. In this case: (64 – 36) / 100 = 0.28, which is, at the least, on the stronger side of “weak relationships.”
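The quadrant count ratio is simple enough to verify in a few lines of Python. The function name is my own; the four counts are the ones listed above.

```python
def quadrant_count_ratio(ul, ur, ll, lr):
    """(n(LL) + n(UR) - n(UL) - n(LR)) / N, ranging from -1 to 1."""
    concordant = ll + ur  # quadrants consistent with a positive relationship
    discordant = ul + lr  # quadrants consistent with a negative relationship
    return (concordant - discordant) / (ul + ur + ll + lr)

# Quadrant counts read off the rank scatter graph.
qcr = quadrant_count_ratio(ul=18, ur=32, ll=32, lr=18)
print(qcr)  # (64 - 36) / 100
```

With 25 cities in every quadrant the function returns 0, and with all cities in the lower-left and upper-right quadrants it returns 1.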
Further, one can see in the scatter graph that there are no cities very near to the upper left and bottom right extreme corners of the whole graph, while there are a few cities very near the bottom left and upper right corners. In other words, cities with a poor ranking for “growth” also have a poor ranking for “inclusion,” while cities with an excellent ranking for “growth” also have an excellent ranking for “inclusion.”
All of the preceding correlation analysis is based on the assumption that the Brookings “growth” and “inclusion” statistics are meaningful and well-named constructs. But as I noted in my First and Second points above, this is not the case. So how would I show the relationship between economic growth and benefits to the people?
From the Brookings report, I obtained the nine separate statistics for each of the 100 largest metropolitan areas in the United States. These are all rates of increase/decrease for the last 5 years, the period emphasized in the Wonkblog article. Note that I will use the Greek delta symbol Δ to denote change in a statistic, in this case change over the last 5 years expressed as a percentage (e.g., if a statistic increased by 10%, then it was multiplied by 1.10).
After cleaning up the data and putting it in row format in Excel, I noted the Pearson coefficients and r-squared figures for the pairings of these statistics.
So how much does economic growth in a city help the poor? Just looking at the cities’ Δ Gross Metropolitan Product (GMP) as the proper measure of economic growth, we see:
· an r-squared of 0.67 with Δ Aggregate Wages
· an r-squared of 0.56 with Δ Jobs
· an r-squared of 0.32 with Δ Average Wage
· an r-squared of 0.25 with Δ Median Wage
I switched to Tableau at this point because it makes it easy to create a calculated field combining statistics and then to check a correlation with the calculated field.
Aggregate Wages is by definition equal to Average Wage * Jobs, which means that (1+ Δ Aggregate Wages) = (1+ Δ Average Wage) * (1+ Δ Jobs).
In other words, if a city’s Average Wage increases by 10% while its number of Jobs increases by 5%, then its Aggregate Wages increase by 15.5%. (We multiply a 10% increase in one statistic by a 5% increase in a second statistic by multiplying 1.10 by 1.05 = 1.155 = an increase of 15.5%.)
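The same compounding can be written as a small Python helper, which is roughly what a calculated field combining two rates of change has to do. The function name is hypothetical; the 10% and 5% inputs are the ones from the example above.

```python
def combined_change(*changes):
    """Combine fractional changes multiplicatively: (1 + a) * (1 + b) * ... - 1."""
    factor = 1.0
    for change in changes:
        factor *= 1.0 + change
    return factor - 1.0

# Average Wage up 10%, Jobs up 5%  ->  Aggregate Wages up about 15.5%.
delta_aggregate_wages = combined_change(0.10, 0.05)
print(f"{delta_aggregate_wages:.1%}")
```

Simply adding the two rates would give 15%; the extra half point is the product of the two increases (0.10 * 0.05).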
It is no surprise that again looking at the cities’ Δ Gross Metropolitan Product (GMP), we see:
- an r-squared of 0.67 with Δ Average Wage * Δ Jobs – exactly the same as for Δ Aggregate Wages, as we should expect
- an almost-as-high r-squared of 0.61 with Δ Median Wage * Δ Jobs
Furthermore, the beta or slope of the regression line is 0.82 for Δ Average Wage * Δ Jobs and 0.85 for Δ Median Wage * Δ Jobs.
In other words, if one city has GMP growth that is 10 percentage points higher than a second city’s, that first city will have Aggregate Wages growing on average 8.2 percentage points faster than the second city’s, and 67% of the variation in the growth in the 100 cities’ Aggregate Wages is explained by growth in GMP alone.
And when one city has GMP growth that is 10 percentage points higher than a second city’s, that first city will have (Median Wage * Jobs) growing on average 8.5 percentage points faster than the second city’s, and 61% of the variation in the growth in the 100 cities’ (Median Wage * Jobs) is explained by growth in GMP alone. Note this scatter graph of change in (Median Wage * Jobs) vs. change in GMP:
So it seems to me that the best way to describe how growth in a city’s GMP brings growth in Wages and Jobs is “quite reliably.” Sometimes there is more growth in Wages and sometimes more growth in Jobs. But as noted above, in the latter case people in that city have also benefited, either because the Employment ratio is higher than it would have been, or because people migrated to the city and obtained a job they presumably could not have gotten in the city they left behind.
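For anyone who wants to reproduce this kind of slope reading outside Tableau, here is a minimal Python sketch of an ordinary least-squares slope. The four data points are invented so that the slope comes out to 0.8; they are not the Brookings figures, and `ols_slope` is a hypothetical helper name.

```python
def ols_slope(xs, ys):
    """Slope (beta) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov / var_x

# Made-up 5-year changes for four cities (fractions); NOT Brookings data.
delta_gmp = [0.00, 0.05, 0.10, 0.15]
delta_wages_times_jobs = [0.01, 0.05, 0.09, 0.13]

beta = ols_slope(delta_gmp, delta_wages_times_jobs)

# Expected difference in the outcome for a 10-percentage-point gap in GMP growth.
expected_gap = beta * 0.10
print(round(beta, 2), round(expected_gap, 3))
```

The slope converts a gap in GMP growth between two cities into the average gap in the outcome, which is how a beta of 0.82 or 0.85 turns a 10-point difference into roughly 8 points.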
How did Brookings go so far afield from reality? The Brookings authors used questionable methods to combine and create statistics, and questionable methods to compare them.
I think one reason why the Brookings authors prefer to coin terms for their arbitrarily chosen groups of statistics, when there are single statistics which are more meaningful and give more meaningful correlations, is that controlling the terms gives them control of the narrative and makes refutation more difficult and more complicated.
The more complicated a statistical measure, the easier it is to fool oneself (or others) about the meaning of the statistic. More complicated statistics also allow for more options for comparing the statistics, and more chances of finding what you want to find in a correlation or other comparison. In this case, Brookings was looking at correlations of the statistics which they termed “growth” and “inclusion.” Each of these statistics was formed by:
- Ranks,
- of Sums,
- of differences from the three different Means,
- divided by three different Standard Deviations,
- of Rates of change in,
- underlying statistics which were themselves, in some cases, the quotient of two statistics (i.e., one statistic divided by another).
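The layering listed above can be sketched in Python as follows. This is my reading of the steps, not Brookings’ actual code: the three indicator lists are invented, the helper names are hypothetical, and the choice of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

def z_scores(values):
    """Difference from the mean, divided by the standard deviation."""
    m, s = mean(values), stdev(values)  # sample SD is an assumption
    return [(v - m) / s for v in values]

def composite_ranks(indicators):
    """Rank cities (1 = best) by the sum of their standardized indicators."""
    standardized = [z_scores(column) for column in indicators]
    sums = [sum(city_values) for city_values in zip(*standardized)]
    order = sorted(range(len(sums)), key=lambda i: sums[i], reverse=True)
    ranks = [0] * len(sums)
    for position, city in enumerate(order, start=1):
        ranks[city] = position
    return ranks

# Hypothetical 5-year rates of change for four cities; NOT Brookings data.
indicator_a = [0.12, 0.08, 0.03, 0.15]
indicator_b = [0.06, 0.05, 0.01, 0.09]
indicator_c = [0.10, 0.07, 0.02, 0.14]

growth_rank = composite_ranks([indicator_a, indicator_b, indicator_c])
print(growth_rank)
```

Each layer (rate of change, standardization, summing, ranking) discards some information about the underlying statistics, which is why a correlation of the final ranks can differ so much from a correlation of the raw figures.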
These manipulations of the statistics could be defensible if they were used for other purposes, but they made Brookings’ scatter graph and related claims about correlation untenable. Brookings succeeded in creating a scatter graph which appeared to show little correlation between “growth” and “inclusion.” But even for their graph, a simple quadrant count in fact showed a fairly significant correlation.
If we skip the Ranks and the Means and Standard Deviations steps above, and create simplified “growth” and “inclusion” measures by a simple averaging of the three sub-statistics for each, we will find that “inclusion” has an r-squared of 0.21 against “growth,” twice as high as Brookings showed in its graph after using Ranks/Sums/Means/Standard Deviations. Note the “Simplified ‘growth’ and ‘inclusion’” scatter:
Finally, after taking into account the problems with Brookings’ “inclusion” measure – it has little to do with “how the benefits of growth and prosperity in a metropolitan economy are distributed among people” and even less to do with how able the cities’ people are to “invest in their skills and to purchase more goods and services” – we were able to see that a proper correlation of more straightforward statistics shows that a city’s growth in fact reliably drives its Median Wage and number of Jobs.
Badger, E. (2016, February 2). All the people being left behind in America’s booming cities. Washington Post Wonkblog. Retrieved from www.washingtonpost.com/news/wonk/wp/2016/02/02/all-the-people-being-left-behind-in-americas-booming-cities/
Berube, A. (2016, January 29). In metro areas, growth isn't reliably trickling down. Retrieved from www.brookings.edu/blogs/the-avenue/posts/2016/01/29-growth-isnt-reliably-trickling-down-in-metro-areas-aberube
Brookings. (2016). Metro monitor. Retrieved from www.brookings.edu/research/reports2/2016/01/metro-monitor#V0G37980
Pittelli, D. (2016, January-February). Cambridge 02138 – Letters to the Editor. Harvard Magazine. Retrieved from http://harvardmagazine.com/2015/12/cambridge-02138
Sullins, B. (n.d.). Enterprise business intelligence with Tableau Server. Pluralsight. Retrieved from https://app.pluralsight.com/library/courses/enterprise-business-intelligencetableau-server/table-of-contents