
Media Companies Must Become Trusted Data Hubs


This article was written by Mirko Lorenz, information architect (Cologne, Germany), Nicolas Kayser-Bril, head datajournalist at OWNI (Paris, France), and Geoff McGhee, the online journalist who created Journalism in the Age of Data (Stanford). It began with a tweet from @mirkolorenz and grew, 20 days later, into this piece, which aims to inspire media companies by pointing to new editorial and business opportunities.

Datajournalism and the Cloud

Journalists and media companies in general have had to answer a fundamental question ever since their traditional business model collapsed: What are we?

In the old days, it was easy. Media barons could see themselves as selling attention to advertisers, while journalists could see themselves as holding the powers that be to account, free of all business-related interference. And in those days, everybody was right.

Today, the two branches of the business have split, possibly forever. Advertising and journalism no longer complement each other the way they used to. As a result, companies and people will have to specialize in one business or the other.

Those who prefer to run Google AdWords next to user-generated product reviews have made their choice: they’ve left the journalism business. For others, the existential crisis continues. But there might be a way out – if media companies realize what data could mean for media business models.

The first step on that path is to understand that successful media companies of the future have to build an infrastructure that turns them into reliable data hubs, able to analyze even very large and complex datasets internally and to build stories on their insights.

Everything is data

All content produced by journalists is data. As Adrian Holovaty wrote six years ago, machines can assemble large portions of articles from structured data. Events can be described mostly with data, with free-form text reserved for the parts machines cannot understand (puns, jokes, irony, value judgements and the like).
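To make that concrete, here is a minimal Python sketch of Holovaty-style structured reporting: the event is captured as data first, and a routine news brief is derived from it. The field names and the sample fire report are illustrative, not a published standard.

```python
# A sketch of structured reporting: the event lives as data first,
# and a readable brief is assembled from it. Fields are illustrative.
from datetime import datetime

fire_report = {
    "type": "structure_fire",
    "address": "123 Main St",
    "city": "Cologne",
    "occurred_at": datetime(2011, 2, 20, 3, 15),
    "injuries": 2,
    "cause": "unknown",
}

def render_brief(event: dict) -> str:
    """Assemble a routine news brief from structured fields."""
    return (
        f"A fire broke out at {event['address']} in {event['city']} "
        f"on {event['occurred_at']:%B %d} at {event['occurred_at']:%H:%M}, "
        f"injuring {event['injuries']} people. The cause is {event['cause']}."
    )

print(render_brief(fire_report))
```

Free-form text then only has to cover what the template cannot: context, irony, judgement.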

IBM supercomputer Watson’s stunning victory in the quiz show Jeopardy! last week showed us that computers are getting ever closer to mastering the subtleties of human communication. Even today, some of Watson’s simpler cousins are toiling away, writing news articles without human input of any kind. The company StatSheet, for example, now provides uncannily readable sports articles to over 300 websites in the US.

Sports are an easy vertical because of the sheer amount of data a single game generates. The challenge for news organizations in the coming years will be to adapt the framework used for sports, finance and weather to other fields.

Any event can be described by fundamental data: latitude, longitude, date and time, and importance. If every piece of content carried at least those four pieces of metadata, we could, for instance, offer consumers a tailored package of news that happened near them since their last connection. Nevertheless, the ‘importance’ filter highlights that the whole process demands human subjectivity and cannot be left to computers alone.
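A hedged sketch of what such a tailored package could look like, assuming every item really carries the four fields named above; the haversine distance and the 25 km radius are illustrative choices:

```python
# Filter news to what happened near the reader since their last visit,
# then rank by a human-assigned importance score.
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

@dataclass
class NewsItem:
    headline: str
    lat: float
    lon: float
    published: datetime
    importance: int  # e.g. 1 (minor) .. 5 (major), set by editors

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance via the haversine formula."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def tailored_package(items, user_lat, user_lon, last_seen, radius_km=25):
    """News near the reader, published since their last connection,
    most important first."""
    nearby = [
        i for i in items
        if i.published > last_seen
        and distance_km(i.lat, i.lon, user_lat, user_lon) <= radius_km
    ]
    return sorted(nearby, key=lambda i: i.importance, reverse=True)
```

The machine handles the filtering; the `importance` score is exactly the part that stays human.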

Liquid media

Taking the concept further, articles could be described using semantic description formats, linking all parts of a story to the wider ‘web of data.’ Reporters could describe events in a structured manner, identifying entities and the links between them using well-established methods. This approach doesn’t obviate the need for investigation or fine-grained analysis, at the very least checking the reliability of the source data. Certainly, data can be cooked, forged and spun, just like any other form of communication. The playing field may change, but the fundamental role of journalism will remain the same: searching for truth, and demanding accountability from those in power.
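As one possible illustration, a story’s entities could be serialized as JSON-LD using schema.org vocabulary. The property choices below are a plausible mapping, not a newsroom standard, and the story itself is invented:

```python
# One way to link a story into the 'web of data': entities and places
# described with schema.org terms, serialized as JSON-LD.
import json

story = {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    "headline": "City council approves new transit budget",
    "datePublished": "2011-02-20",
    "about": [
        {"@type": "GovernmentOrganization", "name": "Cologne City Council"},
        {"@type": "Person", "name": "Jane Doe", "jobTitle": "Council member"},
    ],
    "contentLocation": {
        "@type": "Place",
        "geo": {"@type": "GeoCoordinates", "latitude": 50.94, "longitude": 6.96},
    },
}

print(json.dumps(story, indent=2))
```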

News databases could be opened up through APIs that let developers reuse bits of information to build ‘information-as-a-service’ interfaces. The crime pages of a newspaper could be transformed into a web app that plots all the events over time, with the possibility of filtering the data by time, type of crime, or nature and number of victims, among other things. Beyond the ad-sponsored model, media organizations could implement several tiers on a freemium basis: a visualization for the end user could be free, while tailored information crafted to serve businesses, public safety officials, social service providers, realtors and home-buyers would provide immense value, both for clients and for the media organizations themselves.
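A minimal sketch of such an endpoint, using Flask and an in-memory dataset purely for illustration; a real service would sit on the newsroom’s database and add authentication for the paid tiers:

```python
# A toy 'information-as-a-service' API over a crime log, with the
# filters mentioned above. Dataset and route names are hypothetical.
from flask import Flask, jsonify, request

app = Flask(__name__)

CRIMES = [  # in reality: rows from the newsroom's structured crime log
    {"date": "2011-01-14", "type": "burglary", "victims": 1, "lat": 48.86, "lon": 2.35},
    {"date": "2011-02-02", "type": "assault", "victims": 2, "lat": 48.85, "lon": 2.34},
]

@app.route("/api/crimes")
def crimes():
    """e.g. GET /api/crimes?type=burglary&since=2011-01-01"""
    results = CRIMES
    if crime_type := request.args.get("type"):
        results = [c for c in results if c["type"] == crime_type]
    if since := request.args.get("since"):
        results = [c for c in results if c["date"] >= since]
    return jsonify(results)

if __name__ == "__main__":
    app.run()
```

The free tier serves the map; the paid tiers serve the raw, filterable feed.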

Journalists working actively with data can create future value by digging deeper. Using analytical systems yet to be imagined, they can report on past, current and even future trends that are relevant to their readers. Make no mistake: for journalists, data is both a threat and an opportunity. The question is whether we will use this tool or leave it to others.

From data islands to data hubs

There is hope. What we are seeing at The Guardian, The New York Times and a few others are the early signs of a transformation. Right now, only a few stories based on deep data analysis appear in a stream of more traditional reporting. But some journalists already see that exploring data provides an important opportunity, and the practice is growing. From using standard deviation to show that movies are getting more polarizing, to running pseudo-correlation analyses, to bulk geocoding to spot the representatives claiming more than others, to building indexes on the fly, data and stats are getting sexier by the day.
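The movie-polarization example boils down to a few lines of statistics: if the spread of ratings widens over time, opinion is moving toward the extremes. A toy version, with made-up ratings:

```python
# If the standard deviation of ratings rises from one year to the next,
# audiences are splitting toward 'love it' and 'hate it'. Data invented.
from statistics import stdev

ratings_by_year = {
    2000: [6.1, 6.4, 5.9, 6.2, 6.0],
    2010: [2.1, 9.3, 8.8, 1.5, 9.0],
}

for year, ratings in sorted(ratings_by_year.items()):
    print(year, round(stdev(ratings), 2))
# A larger spread in the later year would support the polarization claim.
```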

But these pioneers often have to act as renegades within their own organizations. They have to fight for resources, frequently moving their projects to external platforms like Amazon’s cloud services just to have the computing power to work.

These days, the argument of media executives goes like this: we tried to make money on the web, but revenue is still far lower than what we earned with the old model. As a result, the majority of online journalists are stuck working with outdated or unimaginative tools: content management systems that don’t play well with databases; an overall lack of workflow around data; and, two years after Jeff Jarvis declared that the news article is no longer the fundamental unit of journalism, newsrooms dedicated to churning out still more 700-word stories rather than data visualizations, non-linear narratives, topic pages or the Wikipedia-style “living stories” that Google begged publishers to embrace during U.S. congressional hearings on the journalism crisis.

What is the trust market?

There is a reason why media companies are still referred to as “the press.” For a long, long time the printing machine was the core technology that provided a comfortable competitive edge. The ability to produce a million copies overnight and distribute them before breakfast offered a solid foundation for making money.

No more.

Enter the “trust market”. Trust, not information, is the scarce resource in today’s world. Trust is hard to earn and easy to lose. And it is a core element of journalism; few other professions are so dependent on it.

But trust is not just a requirement; it is also an enormous, underserved market. Media companies will learn that trust, not SEO, branding or content farming, is the road to success. And that road points right to data journalism.

There is even proof of concept, in media companies that anticipated the shift from text to data early on. Just look at how Thomson divested from newspapers and moved into business information. Bloomberg is a data success story, too – with the extra twist of understanding that value lies not only in the content but in the way it is served, including the physical means of delivery. So, Bloomberg provided sleek terminals on top of its information feeds, “packaging” content on a human level: if you are sitting in a trading room in front of a two-monitor Bloomberg terminal, everyone can see that you have a “Porsche” in front of you (and probably another one down in the parking garage).

Thomson and Bloomberg are billion-dollar companies. Apple is a multi-billion-dollar company. But the trust market is even bigger. Its potential size is the combined value of all advertising, PR and the millions of hours people spend searching for a reliable piece of information or good advice on what product to buy. In other words: it’s huge, and no one who is just in it for a quick dollar can compete. If media companies find a winning combination of data and good stories to fulfill that need, they will be vaulted out of a dying market defined by technology (printing presses, radio stations, satellites) and into the “trust market.”

In a multiplatform world, “trust” is the defining attribute that moves goods and services. Most marketing and advertising can’t be trusted: the system behind it does not allow it to tell buyers whether a company’s newest camera is actually any good. Advertising will always try to create a good impression of a product or service, but eventually people learn that it is not the looks of a product that make it valuable, but its day-to-day usability.

The trust market is still up for grabs. Most media players are still competing in the “attention market.” There, the thinking goes, you only need to grab someone’s attention to feed them money-earning ads. AOL’s recent Huffington Post acquisition is a sign of such backward thinking, as AOL is hoping to sell a massive audience to a bunch of hungry advertisers.

Such a focus on numbers makes the media industry impervious to other metrics, such as user experience. Take Amazon. Ten years ago, the company was derided as a doomed experiment. But a consistent focus on service and user experience – from personalized shopping to timely delivery to trusting its customers – made it a market leader in less than a decade. Where else do they resend your order when you simply say you haven’t received it?

What needs to be done: train first, harvest later

What we hear over and over in training sessions is that journalists are very interested in data, but worried that the necessary skills in programming, math, visualization and internet publishing are beyond their reach.

The image of a journalist versed in everything from video to text to investigation to computer science is a scary one, indeed. But it’s not the only way forward. Instead, tomorrow’s journalists will remain experts in what they’ve done since the 17th century and the advent of the gazettes: collecting facts, vetting them and writing about them.

The difference is that fact collection will be organized by journalists rather than done by them. When a thousand users, brought together in a crowdsourcing operation, gather thousands of data points, isn’t that journalism? In the future, many journalists will resemble project managers aggregating resources around platforms like Ushahidi rather than the dashing adventurers embodied by fictional reporters from Tintin to Mikael Blomkvist.

Better tools and systems are part of the answer to making this transition work. While mature workflows exist for products as different as written articles and software, the process of gathering and processing material for data-driven reporting remains largely ad hoc. Some combination of pagination and programming needs to be dreamed up to power data analysis and storytelling on an industrial scale. What is the life-cycle of structured, factual information gathered through individual and crowdsourced reporting? How are the various iterations of that information stored, accessed and refined?

Programmers’ first forays through the newsroom’s human-resources firewall may well bring new ideas and best practices like version control, bug tracking and the distribution of tasks. Meanwhile, investigative and computer-assisted reporters can emerge blinking into the light to help bring journalistic processes and values to the new, mainstreamed task of data journalism. Graphics editors can direct the design of the new interactive applications that readers will come to depend on, in the same way that money managers got hooked on SmartMoney’s Map of the Market, perhaps the first data “killer app” in news.

In an era when more and more users have a camera phone and a way to put that content online, the journalist becomes the one best able to curate and validate material from the data deluge, not just add to it. Crowdsourcing should allow media organizations to devote more resources to vetting information produced by others, thereby gaining trust.

On-scene reporting will continue, but on-screen fact-checking will become increasingly important. Many investigations will be conducted from behind a computer as journalists organize a community of users and a team of developers to get stories out. Is that such a bad thing? We don’t know. Will it give a new lease of life to journalism? Definitely.

Image Credits: Marion Boucharlat, CC

