The NOS and their visualizations

If you regularly read the news on a website or in a mobile app, you may have noticed that more and more news organisations use fancy graphics to support their news stories. These data visualizations help to deal with complex sets of data and to make sense of the numbers that tell a story. Data visualization techniques provide alternative approaches to knowledge production, as opposed to just reading a text or interpreting numbers to understand a story (Reilly, 2014). However, this does not mean that data visualization is without risks: it is also an easy way to mislead an audience. There are some quite remarkable examples in which journalists at the American news organisation Fox News used data visualization techniques to mislead their audiences (Simply Statistics, 2012). In this article I'll take a look at some Dutch examples of misleading visualizations, and try to explain why providing correct information is more important than creating a fancy graphic.

In our own country

After a search on the internet, I found that the Dutch news organisation NOS offers an application with an archive of data visualizations it has used to support published news articles. Although most of these visualizations seem to be correct, I still found some examples of misleading data visualization. These visualizations are misleading because relevant information was hidden, because too much information was displayed in the graph, making it unreadable, or because information was presented in an inappropriate way. According to Cairo (2015), these are the three strategies behind most misleading visualizations. The next paragraphs contain five examples of NOS visualizations in which these strategies were used.

Social media use in 2012

The first visualization I found misleading was a graph of social media use in 2012. This graph showed the use of social media and social network sites (such as Facebook and Twitter), categorized by age group.

[Figure: social media use in 2012]

Now there are two things that (from my point of view) are misleading because relevant data is hidden. First, it is not clear what exactly the difference between "social media" and "social networks" is supposed to be. In fact, it is not clear at all which social media are included in this graph, or whether some were excluded. Furthermore, it is not clear whether Facebook and Twitter were categorized only as social networks or as social media as well.

Secondly, it is not clear what the numbers on the y-axis of the graph mean. Although they appear to be percentages of total use, they could also be the total number of hours spent on social media. The omission of such relevant information may be motivated by the assumption that the audience knows what each variable means (Hullman & Diakopoulos, 2011). But because this information is omitted, the visualization becomes confusing rather than informative.

Political polls and purchasing power

Having too much information to interpret is not desirable either. For example, the NOS published a poll of the distribution of seats for political parties in parliament which contains so much information that it has become very confusing.

[Figure: seat-distribution poll]

It is not clear what the numbers in parentheses mean, and the graph that shows developments in several lines is too small to distinguish between them. The overload of information makes this data difficult to interpret.

In another NOS graph, on purchasing power, the visualization was clearly organized at first sight: only two lines were presented. However, the user can add additional lines, which makes the graph too crowded to draw any conclusions from.

Incident reports and cuts to the fire department

The NOS has also published visualizations in which the data was presented in an inappropriate way. At the beginning of 2015, the NOS made a graph of the number of P2000 alerts (the number of times emergency services were called). In this graph it compared the number of alerts on New Year's night with the number of alerts on other days.

[Figure: P2000 alerts]

The first thing that is doubtful is that the NOS only compared New Year's night with the Christmas days of 2014. It seems rather obvious that there are more calls to emergency services on New Year's Eve, when people are setting off fireworks, than at Christmas, when most people are at home having dinner with their families. Secondly, on the right of the screen an overview is presented with some emergency calls from around 00:00 at night. However, these are calls from the first of December (year unknown) instead of the first of January 2015. This looks like introducing a certain level of 'noise' into the visualization, a technique called obscuring (Hullman & Diakopoulos, 2011). Because of this irrelevant extra information, it is unclear why these messages are shown there, which is confusing for anyone trying to understand the graph.

Another misleading visualization from the NOS, again with an inappropriate presentation, concerned cuts to the budgets of fire departments across the Netherlands. This graph presents a map with all fire department regions, accompanied by each department's total budget for 2015.

[Figure: fire department budget cuts]

The NOS intended to inform its audience about the proposed budget cuts for the fire departments up to 2018. However, it distorted some information, causing much confusion about the departments' actual budgets in 2018. The NOS visualized the budget with a grey line and a number representing the total budget for 2015. Directly below the grey line it drew a red line and a number representing an amount in euros. One could easily be misled, because at first sight it looks as if a major budget cut will be made by 2018. However, the red line does not represent the total budget for 2018; it represents the total amount that will be cut by 2018. By presenting information in such a doubtful way, there is a risk that people get a wrong impression of reality. In the worst case, people act on that distorted reality, for example in social, economic or political debates.
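
To see how large the misreading can be, here is a minimal sketch with invented numbers (the real NOS figures are not reproduced here): the grey line encodes the 2015 budget, the red line the cumulative cut, and a hasty reader takes the red line for the 2018 budget.

```python
# Hypothetical numbers, purely for illustration of the misreading.
budget_2015 = 10_000_000   # grey line: total budget for 2015 (euros)
cut_by_2018 = 1_500_000    # red line: total amount cut by 2018 (euros)

# What a hasty reader assumes the red line means:
misread_budget_2018 = cut_by_2018

# What the red line actually implies for the 2018 budget:
actual_budget_2018 = budget_2015 - cut_by_2018

print(misread_budget_2018)  # 1500000: an apparent 85% collapse
print(actual_budget_2018)   # 8500000: in reality a 15% cut
```

With these assumed numbers, the naive reading suggests the budget shrinks to 15% of its 2015 level, while the actual figure is a 15% reduction.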


Challenges in data visualization

With the emergence of big data and data journalism, visualization of data sets has proven very effective for presenting complex analyses (Keim, Qu & Ma, 2013). But as demonstrated, it also allows journalists to manipulate a story and mislead their audience. According to Cairo (2015), this is caused by the fact that many journalists and designers are not seriously trained in scientific methods, research techniques and data analysis. Because of this lack of knowledge, journalists and designers make mistakes that can be categorized as "lies" or "misleading information". There may be journalists or news organisations that mislead on purpose, and the only way to unmask them is to train ourselves in the techniques of deception. For those who simply lack the knowledge to avoid such mistakes, however, it is a matter of getting trained in statistics and data analysis. Although it may seem more important to create a "fancy", good-looking visualization, I believe it is more important that graphics convey the correct information, and thereby a newsworthy story.

References

app.nos.nl/datavisualisatie

Cairo, A. (2015). Graphics lies, misleading visuals: Reflections on the challenges and pitfalls of evidence-driven visual communication. In D. Bihanic (Ed.), New challenges for data design (pp. 103-116). London: Springer-Verlag.

Hullman, J., & Diakopoulos, N. (2011). Visualization rhetoric: Framing effects in narrative visualization. IEEE Transactions on Visualization and Computer Graphics, 17(12), 2231-2240.

Keim, D., Qu, H., & Ma, K. L. (2013). Big-data visualization. IEEE Computer Graphics and Applications, 33(4), 20-21.

Reilly, K. M. (2014). Open data, knowledge management, and development: New challenges to cognitive justice. In Open development: Networked innovations in international development (p. 297).

Simply Statistics. (2012). The statisticians at Fox News use classic and novel graphical techniques to lead with data.

How news organisations search your Twitter for news

When I wake up in the morning, I like to start my day by watching the news. Usually I choose RTL News, because it has an entertaining item called the "media overview". In this item, RTL News takes a peek at other media and reports on news stories that have appeared in newspapers and on the internet. In addition, RTL News often refers to stories it has found on social media sites.

[Figure: media overview]

Journalists are increasingly using social media, and more and more consider social media content a reliable source (Coosto, 2015). Given the enormous amount of content we produce each day, journalists will increasingly rely on Artificial Intelligence techniques to analyze this huge amount of data and create news stories from all our social content (Lokot & Diakopoulos, 2015). In this article we'll look at Artificial Intelligence (or smart data) in data journalism, and at the challenges that arise when using smart data techniques to analyze social media content for news stories.

According to Wilde (2010), Artificial Intelligence consists of advanced data engineering technologies that can perform data modeling and metadata analysis. These techniques can assist users in all kinds of tasks, such as planning, problem solving, decision making, sensemaking, and even prediction. Examples include data mining (an analytical process for exploring huge amounts of data), machine learning, and natural language processing (Flaounas, Ali, Lansdall-Welfare, De Bie, Mosdell, Lewis & Cristianini, 2013). Some news organisations are already aware of these developments and use Artificial Intelligence to analyze large amounts of data, such as content on social media websites, and to create news stories from it.

Wait a minute, robots are taking over the news!?
This may seem an extreme statement, but let me give you an example from The New York Times. The data science team of The New York Times developed a tool that can predict and suggest content by computing metrics on social media data. This tool, called Blossom, can provide specific information on the content of an article or blog post, reveal where the content came from, and calculate how the content is performing on social media websites. Based on these metrics, the bot can predict and suggest which content is interesting enough to turn into news stories for The New York Times.

[Screenshot of the Blossom tool]
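
As a toy sketch of the kind of ranking such a tool might perform: score posts by engagement metrics and surface the strongest candidates. The posts, metrics and weights below are invented for illustration; Blossom's real features and model are not public.

```python
# Invented example posts with two simple engagement metrics.
posts = [
    {"title": "Election explainer", "shares": 1200, "comments": 340},
    {"title": "Local weather note",  "shares": 15,   "comments": 2},
    {"title": "Data-viz longread",   "shares": 450,  "comments": 880},
]

def engagement_score(post):
    # Assumed weighting: comments count double, on the (hypothetical)
    # grounds that commenting signals deeper engagement than sharing.
    return post["shares"] + 2 * post["comments"]

# Rank posts by predicted newsroom interest, best first.
ranked = sorted(posts, key=engagement_score, reverse=True)
print([p["title"] for p in ranked])
# ['Data-viz longread', 'Election explainer', 'Local weather note']
```

Even this toy version shows why such scoring is a prediction, not a judgment: the metric encodes an editorial assumption (comments over shares) that a human chose.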

This phenomenon of 'news bots', robotic accounts that can do reporting, writing, and data analysis automatically, can be used in many different ways. Because of the undiminished growth of data on the internet, and on social media in particular, journalists will increasingly rely on information from the digital environment. Although these bots provide intriguing opportunities for journalists and news organisations, they also raise questions about their limits and risks (Lokot & Diakopoulos, 2015).

Aha! There are limits and risks with news bots
One of the main issues with automated news and the use of news bots in journalism is a possible decrease in transparency. The study by Lokot and Diakopoulos (2015) showed that many news bots (45%) are not transparent about their sources, the algorithms they use, or the fact that they are news bots at all. This raises questions about the reliability and trustworthiness of 'automated' news articles, and about how journalists should cope with these developments.

A second issue with news bots is accountability for the news content they create. A creator is usually accountable for created content in case of a legal violation, but if news is fully created by news bots, who is to be held accountable for the content? One could point to the listed author or the news organisation, although from a legal point of view that may be impossible (Lokot & Diakopoulos, 2015).

And last but not least, there is the issue of ethics to keep in mind, especially with automated news bots. A news bot might take an increasingly important role in the judgment, selection and consumption of news content. However, it seems impossible to teach a news bot to act ethically. An algorithm that understands ethics would need to understand how humans come to see a concern, why humans think something matters, and when humans might act on a certain event (Lewis & Westlund, 2015).

So are journalists going to disappear?
So, revisiting my statement 'are bots taking over the news?', I guess this will be partially true. I believe the use of Artificial Intelligence tools will continue to grow in data journalism. Journalists will need these tools to cope with all the data produced on social media on a daily basis. But given the issues mentioned above, it is clear that improvement is still needed. Human judgment will remain important because of ethical, transparency and accountability issues. Although the nature of a journalist's work may change in the future, I do not think journalists have to fear being replaced completely by news bots.



In data we generally can’t observe the things we want to measure

The phenomenon of big data and data journalism has grown rapidly. Free data and efficient tools for data analysis are becoming more and more available. Besides (multinational) businesses and governments, news organisations are increasingly aware of all these data sets and their possibilities. According to Aitamurto, Sirkkunen and Lehtonen (2011), journalists search through large collections of data and use statistical methods, visualizations, and interactive tools in order to find and create news. But what is 'news'?

According to the study by Harcup and O'Neill (2001), a news story requires one or more news values in order to make news out of a story. They mention ten different values that can lead to newsworthy items. In short, news involving influential or famous people, entertainment, elements of surprise, or great relevance to society is most likely to become news. In addition, there is always the agenda of the news organisation itself, which may include stories that satisfy a particular need or demand. For data journalism, news is mostly about numbers and the hidden stories they might contain.

Although numbers don't lie, data analysis comes with certain risks. Data can be manipulated, misinterpreted or even misused. If data is misinterpreted by journalists, the news obtained from it may mislead the reader and draw a distorted image of reality. In this article we'll look at the risks of using data sets to find news in numbers, and at what one should realize when using data to tell a story.

When women don't get married, they're screwed.

This may seem a bit radical, yet it has been concluded by journalists. Paul Bradshaw, an online journalist at Birmingham City University, argues that data journalism can start in two ways: 'there is a question that needs data, or there is a dataset that needs questioning'. But despite the growth of big data from public services, businesses and governmental organisations, not all data is of journalistic value or contains a newsworthy story. It is of great importance to make a thorough analysis of the available data and determine whether the information contains news or might support a story.

An article in The Washington Post last year showed how open data is easily misinterpreted or misused. The Washington Post published an article about violence against women, and in particular about which women are more likely to become victims of assault or abuse. According to the journalists of The Washington Post, data analysis showed that married women are safer than unmarried women, and that girls raised by their own (married) father are less likely to be abused or assaulted than girls raised without their own father. The claims of The Washington Post were based on a graph published in 2012 by the United States Department of Justice. According to Shannon Catalano, a statistician at the Bureau of Justice Statistics and the author of the study used in the article, her data was presented without sufficient context.

[Figure: The Washington Post graph]

Many more factors associated with violence against women should have been mentioned. The Washington Post used data on a single variable, household composition, and drew its conclusions, telling readers a story that actually misinterpreted the original data. One could say the available data was misused in order to create a news article. And this is where my statement 'in data we generally can't observe the things we want to measure' comes in. It is very rare that specific questions can be answered directly through observations from a single set of data. When we search for newsworthy stories in data, we should rather look for information that can be related to the question we want to answer. This may mean that we have to search multiple sources and data sets before we eventually find a newsworthy story.
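
A small simulation can show how a single-variable comparison like this goes wrong. The data below is entirely synthetic (it is not the DOJ data): a hidden factor, age, drives both marriage status and victimization risk, so naively comparing married and unmarried groups suggests a protective "marriage effect" that largely vanishes once age is held fixed.

```python
import random

random.seed(7)

# Synthetic population: older people are both more likely to be married
# and (by construction) at lower risk. Both rules are invented.
people = []
for _ in range(10_000):
    age = random.randint(18, 60)
    married = random.random() < 0.3 + (age - 18) / 60  # older -> more often married
    risk = 0.20 if age < 30 else 0.02                  # younger -> much higher risk
    victim = random.random() < risk
    people.append((married, age, victim))

def victim_rate(group):
    return sum(v for _, _, v in group) / len(group)

married_grp   = [p for p in people if p[0]]
unmarried_grp = [p for p in people if not p[0]]

# Naive single-variable comparison: marriage appears protective.
print(round(victim_rate(married_grp), 3), round(victim_rate(unmarried_grp), 3))

# Controlling for the hidden factor: within the same age band,
# married and unmarried rates are nearly identical.
young_married   = [p for p in people if p[0] and p[1] < 30]
young_unmarried = [p for p in people if not p[0] and p[1] < 30]
print(round(victim_rate(young_married), 3), round(victim_rate(young_unmarried), 3))
```

The "marriage effect" in the first comparison is entirely an artifact of age, which is exactly the kind of missing context Catalano pointed to.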

Ok, let's combine multiple data sets then…

Another approach to finding hidden stories in data is to combine multiple data sets to create news stories. Combining multiple data sets and visualizing them together has been done before by Simon Rogers, former editor at The Guardian. In 2012, The Guardian combined data on homicide-by-firearm rates per 100,000 people with percentages of homicides committed with a firearm, and produced a map showing the average number of firearms per 100 people worldwide.

[Figures: 1. homicide-by-firearm rate per 100,000; 2. percentage of homicides by firearm; 3. average firearms per 100 people]
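
In its simplest form, such a mashup is a join of two data sets followed by a derived rate. The sketch below uses invented countries and numbers purely for illustration; a per-100,000 rate is what makes countries of different sizes comparable at all.

```python
# Two invented data sets keyed by country.
homicides_by_firearm = {"Atlantis": 52, "Erewhon": 9}            # annual counts
population           = {"Atlantis": 1_300_000, "Erewhon": 450_000}

# Join on country and derive a rate per 100,000 inhabitants.
rate_per_100k = {
    country: homicides_by_firearm[country] / population[country] * 100_000
    for country in homicides_by_firearm
}

print({k: round(v, 1) for k, v in rate_per_100k.items()})
# {'Atlantis': 4.0, 'Erewhon': 2.0}
```

Note what this computation does and does not give you: a comparable rate, but nothing yet about *why* the rates differ.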

Although mashing up different data sets is a powerful tool for data and news processing, there are risks as well. In data journalism it is often easy to visualize and show what is going on in a specific event, but it is much harder to find out why something is happening. Looking at The Guardian's homicide/firearms maps, we may be inclined to think that a larger number of firearms causes more homicides, but this is not necessarily true. To draw such conclusions, one would need statistical methods to test the relationships suggested by the patterns found in the data.

True relationship or merely a coincidental artifact?

This brings us to the hardest part of data journalism: finding a true relationship. When analyzing data sets, the main difficulty is finding patterns that actually represent a true relationship. According to Harford (2014), finding causation in big data is much more difficult, and sometimes even impossible, compared with finding correlations. It can even lead to multiple-comparison problems, where journalists look at as many patterns as possible and compute every correlation they can find. The main problem is that if you do not know what lies behind a found correlation, you have no idea what caused it in the first place. As in The Guardian's homicide/firearm example, from the data it is easy to guess that a larger number of firearms causes more homicides. But if you merely guess at what caused a certain correlation, you risk getting a distorted view of the relationships that actually exist in the data.
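
The multiple-comparison problem is easy to demonstrate: compare one random "outcome" series against many random "candidate" series and count how many look strongly correlated by pure chance. All data here is random noise; any correlation found is, by construction, a coincidental artifact.

```python
import random
from statistics import mean

random.seed(42)

def pearson(xs, ys):
    # Pearson correlation coefficient, computed from its definition.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

n_points, n_candidates = 30, 200
outcome = [random.random() for _ in range(n_points)]
candidates = [[random.random() for _ in range(n_points)]
              for _ in range(n_candidates)]

correlations = [pearson(outcome, c) for c in candidates]

# None of the candidates has any real relationship with the outcome,
# yet some correlations exceed |r| > 0.3 through coincidence alone.
spurious = [r for r in correlations if abs(r) > 0.3]
print(len(spurious))
```

Test enough candidate variables and "newsworthy" correlations are guaranteed to appear, which is exactly why a found pattern is a starting point for reporting, not a conclusion.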

So, revisiting my statement 'in data we generally can't observe the things we want to measure': I do not argue that we can't observe anything in data sets. I'd like to point out that one needs to be careful when analyzing data, and should keep several pitfalls in mind before drawing hard conclusions from numbers. It is important to be critical of one's own analysis, and to look for valid statistical evidence to ensure that true stories are told.
