Thursday, 5 June 2014

More Big Insights on Big Data


[Image: Given the data, what are her chances of getting breast cancer?]
Unable to sate its big appetite for big data insights, the Population Health Blog glommed onto the New England Journal's just-published article on "Learning from Big Data."

As noted previously, "big data" is the use of statistical associations ("predictors") in a) large and b) disparate data sets to gain insights at the individual level ("outcomes"). For example, a physician could know the likelihood - based on demographic, clinical and economic inputs - that a particular patient won't fill a prescription. As another example, the PHB spouse could know the likelihood - based on prior active-passive behaviors, incentives and maternal upbringing - that her husband, despite numerous reminders, will "forget" to take out the trash.

It's important to recall that big data is not about causality. Just because living in a certain zip code is an independent predictor of obesity (for example) doesn't mean living in [insert name of town] causes residents to be fat. Big data is "agnostic" about the cause, but that doesn't mean Big Data Architects (BDAs) can't use the information.

According to the author, the road from the promise to the reality of big data will be lined with:

1. generalizability, or being confident that the populations used in big data studies are similar to the populations where their lessons are being applied. Propensity matching or scoring is a good step in that direction;

2. automation, so that multiple questions can be answered simultaneously by many users;

3. "data refreshes," so that associations can be retested on a repeated basis as new data come on line;

4. ease-of-use, so that even an orthopedist could use the software and understand the outputs.*

Politically, we'll also need to get

5. the owners of data warehouses - including the electronic health record vendors and insurers - to agree on either a) common data formats or b) methods that allow for the interpretation of data regardless of the format. An example of the latter is the use of an order entry or insurance claim for supplemental oxygen therapy as a marker of poor health status.

6. a resolution of our absolutist privacy "impasse." "De-identification" of patients' information makes it possible, though never guaranteed, to keep personal health information secure.
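To make the oxygen example in item 5 concrete, here is a minimal Python sketch of a format-agnostic marker: it flags a patient if either an order or an insurance claim mentions supplemental oxygen, no matter how each source lays out its records. Everything in it is an assumption made for illustration, including the field names, the sample records and the crude keyword match. It is not the New England Journal author's method, and a real system would need vetted code lists rather than a text search.

```python
# Hypothetical records from two sources with different layouts.
# All field names and values are illustrative assumptions.
ehr_orders = [
    {"patient": "A", "order_text": "Home supplemental oxygen, 2 L/min"},
    {"patient": "B", "order_text": "Physical therapy referral"},
]
insurance_claims = [
    {"member_id": "C", "item_description": "OXYGEN CONCENTRATOR RENTAL"},
    {"member_id": "B", "item_description": "Knee brace"},
]


def mentions_oxygen(text):
    """Crude, case-insensitive keyword match (an assumption, not a standard)."""
    return "oxygen" in text.lower()


def oxygen_marker(orders, claims):
    """Return the set of patient IDs flagged by either source."""
    flagged = set()
    for row in orders:
        if mentions_oxygen(row["order_text"]):
            flagged.add(row["patient"])
    for row in claims:
        if mentions_oxygen(row["item_description"]):
            flagged.add(row["member_id"])
    return flagged


flagged_patients = oxygen_marker(ehr_orders, insurance_claims)
print(sorted(flagged_patients))  # prints ['A', 'C']
```

The two sources never have to agree on a common format. Each only needs a small reader that maps its own fields onto the same yes-or-no marker.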

*Okay, the New England Journal author didn't poke fun at the orthopedists by saying that, but the PHB couldn't resist. By the way, one way to do this would be to present the outputs as pictures.

Image from Wikipedia
