This map was created by the Toronto Community Health Profiles partnership with the help of data from Canada’s long-form census.
Okay everyone, this morning we’re going to talk about statistical survey methodology. Get excited, people! It’s not every day that a piece of technical arcana finds its way into ordinary conversation.
It happened in America with “hanging chads” in 2000, and again with “subprime mortgage lending” in 2007—both terms with which most of us were unfamiliar, but on which major events ended up hinging.
Here in Canada, we got interested in a previously obscure procedure last year, when the Conservative government announced its landmark decision to scrap the mandatory long-form census and replace it with an optional National Household Survey (NHS). All of a sudden, serious technical concerns about “data validity” and “response bias” were thrust into the national media spotlight.
This June, as questionnaires start shipping out, the issue is again rearing its jargon-filled head.
But why is this such a big deal? Why are so many statisticians and policy-makers wringing their hands in nervous anticipation? Why did former Stats Canada head Munir Sheikh go so far as to resign from his post when he heard the news?
Torontoist caught up with former Stats Can statistician James Hiu to get a handle on the challenges posed by the optional NHS.
The major problem with an optional survey, Hiu says, is that it may over- or under-represent certain segments of the population. When you give a group of people an optional questionnaire, there is always a chance that those who don’t respond will differ in meaningful ways from those who do. In the case of the NHS, researchers argue that ethnic minorities, and individuals with very low or very high incomes, will be least likely to respond.
“If the possibility of fines or jail time is available and you don’t understand the language, you may talk to a friend to help you fill out the survey,” says Hiu. “Now that it’s optional, these small inconveniences or excuses make it possible for people to just relax and not fill it out.”
“Non-response bias,” as statisticians call this phenomenon, is not a contentious political issue; it’s a mathematical fact. Even opponents of the long-form census, like the conservative-minded Fraser Institute, agree that its successor will introduce more response bias. Niels Veldhuis, VP of Canadian Policy Research at the Fraser Institute, readily admits “there are going to be underrepresented groups if you move to voluntary surveying.” The question is, can we correct for this bias in any way?
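You can see the arithmetic for yourself with a toy simulation. The sketch below invents a population in which a lower-income group is less likely to answer a voluntary survey (all numbers are made up for illustration; none come from Stats Canada):

```python
import random

random.seed(42)

# Hypothetical population: 20 per cent belong to a lower-income,
# harder-to-reach group that responds less often. All figures invented.
population = (
    [{"income": 30_000, "p_respond": 0.3} for _ in range(2_000)]
    + [{"income": 60_000, "p_respond": 0.8} for _ in range(8_000)]
)

true_mean = sum(p["income"] for p in population) / len(population)

# Voluntary survey: each person answers with their own probability.
respondents = [p for p in population if random.random() < p["p_respond"]]
survey_mean = sum(p["income"] for p in respondents) / len(respondents)

print(f"true mean income:   ${true_mean:,.0f}")
print(f"survey mean income: ${survey_mean:,.0f}")
```

Because the lower-income group opts out more often, the survey’s average income lands well above the true average—no politics involved, just arithmetic over who bothered to respond.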
One possible solution is to use data from previous censuses to fill gaps in NHS results. The so-called “imputation” of missing results through the aid of external data is a standard statistical technique. But it runs into problems if the data you are using to plug holes differ in meaningful ways from your obtained results.
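In its crudest form, imputation just replaces a missing answer with a figure drawn from the older reference data. A toy sketch (all figures invented) shows both the technique and the hazard Hiu describes:

```python
# Toy imputation: fill non-responses with the mean from a prior census.
# All figures are invented for illustration.
prior_census_mean_income = 50_000  # hypothetical 2006 benchmark

# Voluntary survey results; None marks a non-response.
responses = [48_000, None, 41_000, None, 52_000, None, 39_000]

imputed = [r if r is not None else prior_census_mean_income
           for r in responses]
estimate = sum(imputed) / len(imputed)

# If incomes fell sharply after 2006 (a recession, say), the stale
# benchmark pulls the estimate upward, masking the recent trend.
print(f"estimate with imputation: ${estimate:,.0f}")
```

The non-respondents here are all assumed to look like the 2006 average; if the people who skipped the survey were actually hit hardest by the downturn, the imputed estimate quietly papers over exactly the change a researcher would want to see.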
As Hiu notes, much has changed since the last census in 2006. “We are at a unique point in history…the recession has not been this bad in recent memory.” Hiu worries that adjusting for bias in this manner will gloss over critical social trends that have only emerged in the past few years.
If census takers can’t shore up their weaknesses with the help of old surveys, some have suggested supplementing optional questionnaires with records maintained elsewhere. As British Cabinet minister Francis Maude tells the BBC in reference to England’s own data collection strategy: “There is a load of data out there in loads of different places.”
If the voluntary NHS is insufficient, Veldhuis believes that organizations can “rely on other surveys…the same way businesses plan where they’re going to open locations.”
“Better data is there,” Hiu admits, “but it’s much more fragmented…At the end of the day someone is still responsible to go out to all these data marts and collect them.”
To obtain an important public health statistic, like the ethnic makeup of neighborhoods in the GTA, one could conceivably go to all the hospitals in Toronto and request permission to sift through anonymized records. The problem, however, is that different hospitals might collect data in different ways. Since you don’t have control over the surveying, there’s a risk that important questions may have been phrased in a biased way, or simply omitted altogether.
Image courtesy of the Toronto Community Health Profiles partnership.
“People select data for different purposes,” Hiu stresses, “and the devil’s in the details.”
Such technical issues are of great concern to Toronto city planner Tom Ostler and health policy professional Paul Fleiszer, both of whom use the long-form frequently in their work.
Fleiszer, who works for Toronto Public Health, says that his department “uses data on language, immigration, ethnicity, income, and education, all previously available from the long-form, to guide our programs and policies.”
“For example, we offer tuberculosis prevention initiatives to people that have immigrated from countries where tuberculosis is endemic. The long-form identified areas where those populations live so we knew which neighborhoods to offer classes in.”
“One critical [item] that we use in city planning in particular,” Tom Ostler says, “is the question of where people work and linking that question to where they live. [This gives us] a picture of commuting flows across the city,” which can help in planning bus routes and transit initiatives.
“Even just a basic statistic like the number of people who are working inside the city of Toronto,” Ostler explains, helps the City set job targets for the future. These targets influence how much money will be invested in employment services and infrastructure.
“At the end of the day,” says Fleiszer, “if you don’t have good data, you can’t make good decisions. That irritates me as a public health professional.”
Veldhuis’ response to these concerns is more systemic: “If you look at most cities, we know that about 95 per cent of people commute to and from work in vehicles, and what city planners want to do is take people out of vehicles and put them into rapid transit. So I’m not entirely sure that what they’re getting from the census is information they’re actually using in the correct manner.”
The Fraser Institute does acknowledge that there are many critical government institutions that rely on accurate data, but for Veldhuis, the overarching concern is an ethical one. “If you look at the major funding organizations of university research,” he says, “they would never fund research by any academic in this country if there were forced participation in their survey. Somehow all these academics think it’s different when it comes to the census, and I just don’t see how it’s any different.”
Hiu explains that “other surveys piggyback on the census data.” If, like Ostler, you want to know why Torontonians choose to move downtown, you can go out and conduct voluntary surveys. But it helps to have census data on hand so you can tell if the response rates you’re getting are representative of the actual population. With the death of the long-form, Hiu argues, researchers are losing an important tool to benchmark their results.
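The benchmarking Hiu describes is simple in principle: compare the make-up of your voluntary respondents against the census’s population shares. A minimal sketch, with every share invented for illustration:

```python
# Sketch: using census population shares as a benchmark for a
# voluntary survey sample. All shares and counts are invented.
census_shares = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}
survey_counts = {"18-34": 150, "35-54": 500, "55+": 350}

total = sum(survey_counts.values())
for group, census_share in census_shares.items():
    survey_share = survey_counts[group] / total
    gap = survey_share - census_share
    print(f"{group}: survey {survey_share:.0%} vs "
          f"census {census_share:.0%} (gap {gap:+.0%})")
```

A gap like the under-representation of 18–34-year-olds above tells a researcher their voluntary sample is skewed, and by how much to reweight it. Without a reliable census to supply the benchmark shares, that check is no longer possible.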
Hot-button political issues aside, everyone we talked to agrees that it’s impossible to tell how biased the NHS will be until we start seeing results, which will start to roll in come 2012.
“If [Stats Canada] properly targets these underrepresented groups,” Veldhuis states hopefully, “I think we’ll have a fairly accurate assessment of the key data points that they’re trying to get at.”
Assuming a target response rate of 50 per cent, Stats Canada recently used their arcane statistical powers to estimate the response bias they expect to be associated with individual census questions in specific metropolitan areas. These estimates, available for free on their website, are perhaps the best indication thus far of how the NHS will differ from its predecessor.
It’s worth a look, if your eyes haven’t entirely glazed over by this point. You’ll see that in Toronto, questions about immigration status and college/CEGEP completion are expected to generate relatively unbiased responses, while the bias of questions about ethnicity may range as high as 17 per cent.
Prefacing their data, government number crunchers leave us with these candid words to chew on: “We have never previously conducted a survey on the scale of the voluntary National Household Survey, nor are we aware of any other country that has…We are confident that the National Household Survey will produce usable and useful data that will meet the needs of many users. It will not, however, provide a level of quality that would have been achieved through a mandatory long-form census.”