Drugs, alcohol, and the NRC…

In this blog, we present the main findings of our fact-checking report. The articles were chosen from the news website NRC.nl. As a group, we chose a central theme for the articles, so all of them are in some way drug-related. For each article, a short description of the story is followed by our conclusion. Fact checking wasn’t always easy: only some journalists responded to our e-mails, and some articles gave more reason for doubt than others. Overall, this was an interesting project, and it was good fun to chase after the facts. As expected, though, the NRC turns out to be a reliable source of information: it presents more information than most other news media and is right 99% of the time, with some small things left open for debate.


 

Strafrechtstraat berecht 39 ADE-gangers voor drugsbezit (Criminal court lane tries 39 ADE visitors for drug possession)

For the first time, those arrested had the opportunity to consult a lawyer on the spot.

 Link to the article

This article was about arrests for drug possession during the Amsterdam Dance Event (ADE). On the 23rd of October, the NRC wrote a story about this event, which attracts thousands of people from all over the world. In 2014 there was a lot of buzz about drug use during ADE, because three people had died from the consequences of an overdose or misuse of drugs.

The NRC reported the following numbers:

  • 39 people were arrested and convicted for the possession of drugs
  • According to the NRC, this was the first time at a dance event that people could consult a lawyer on the spot when they were arrested
  • Police halted 176 people during the event
  • 116 of these people were from abroad

At first sight these numbers didn’t seem questionable. What made us doubt them was that it was unclear how these specific numbers had been obtained. We found an official letter from the Minister of Justice and Safety to the House of Representatives on the official website of the government, in which the Minister confirmed the numbers NRC.nl presented.

In the end, most figures mentioned by the NRC turned out to be correct. It is not clear how the NRC arrived at the numbers about foreign visitors. Possibly the numbers were confirmed by a source within the Public Prosecution Service or the police, but did the NRC choose not to name the source, or was it simply not allowed to do so?


 

‘Toename crystal meth in Nederland’ (‘Increase in crystal meth in the Netherlands’)

Drugs: Care workers see an increase in crystal meth use. The drug is especially popular in the gay scene. The price is falling, and that is a danger.

Link to the article

 

The title suggests a general increase in crystal meth use in the Netherlands. We found this title misleading, because the subtitle points to one specific demographic group within the Dutch population: the gay scene. The main question was whether the journalist used multiple sources to support the story about the increase in crystal meth use. The sample of interviewees, 27 people, was also quite small, which raised some doubts about the story in general.

After fact checking this article, we concluded that most of the facts and claims mentioned in the article are true, but that the main title generalizes to the Dutch population as a whole and is therefore misleading. Although it is true that the use of crystal meth is increasing in the Netherlands, this holds for only one subgroup.


 

Nederlanders smokkelden voor miljarden met ambulance (Dutchmen smuggled billions’ worth by ambulance)

The men pretended to be transporting patients to hospital. They hired fake patients.

Link to the article

The next article from the NRC website was about Dutch smugglers who used ambulances dozens of times to transport hard drugs such as cocaine, heroin and XTC pills to the United Kingdom. Cross-checking with other media turned up some facts that didn’t add up and some discrepancies. A few of these drew our attention in particular. The NRC mentioned that the police seized 20,000 XTC pills, among other drugs. We could not find any mention of these 20,000 XTC pills in other media, and when the total amount of seized drugs is added up, the XTC pills seem to be left out of the equation.

Also, the number of vehicles in the ambulance fleet varied between media. The article by the NCA (the National Crime Agency of the UK) and various pictures of the vehicles indicated that there were at least ten vehicles in the fleet. The journalist also wrote that Olof S. and Richard E. had already been convicted of smuggling drugs. We could not find this conviction.

The journalist based her statements about this conviction on information provided by the press agencies ANP, AP and Reuters. The larger part of the story was based on the article by the NCA. She didn’t have time to do any more fact checking and, on top of that, she did not fact check what the press agencies had sent her. Unfortunately, the information from the press agencies is sent only to media organizations like the NRC and cannot be checked by individuals.

Although this was where the fact checking journey ended and questions were left unanswered (for instance about the convictions), with the NCA, ANP, AP and Reuters behind the story, we must conclude that the information is based on solid evidence.


Jongeren kopen veel minder vaak alcohol (Young people buy alcohol far less often)

Yet more than half of 16- and 17-year-olds still drink. They get the alcohol from older friends.

Link to the article

The fourth article we investigated is about the buying and use of alcohol by adolescents aged 16 or 17. It described how many people in this age group still manage to get alcohol and drink alcoholic beverages on a regular basis. On January 1st, 2015, national law stated that people under 18 were no longer allowed to drink. The article explained how the use of alcohol by adolescents has changed.

After cross-checking with other news media and research about adolescents and alcohol, two numbers in the NRC article drew our attention in particular, because they differed from the numbers that research bureau Intraval had found.

  • More than 75% of adolescents let older friends buy alcohol for them, as opposed to 60% according to Intraval
  • 40% get alcohol from their parents, as opposed to 51% according to Intraval

Unfortunately, contacting the NRC journalist did not lead to a response. Ideally, we would have received an explanation for the differences we found. The differences, 15 and 11 percentage points respectively, are too large to ignore.
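
To make the size of these gaps concrete, here is a small sketch in Python. It is our own illustration, not something taken from the NRC article or the Intraval report. It uses the two pairs of percentages quoted above, treats “more than 75%” as exactly 75 for simplicity (an assumption on our side), and computes both the gap in percentage points and the gap relative to the Intraval figure. The function name `compare` is made up for this example.

```python
def compare(reported, reference):
    """Return (gap in percentage points, gap relative to the reference in %)."""
    point_gap = reported - reference
    relative_gap = point_gap / reference * 100
    return point_gap, round(relative_gap, 1)


# Figures as quoted in this post: (NRC, Intraval), both in percent.
# Assumption: "more than 75%" is treated as exactly 75.
figures = {
    "older friends buy alcohol": (75, 60),
    "alcohol from parents": (40, 51),
}

for label, (nrc, intraval) in figures.items():
    points, relative = compare(nrc, intraval)
    print(f"{label}: {points:+d} points, {relative:+.1f}% relative to Intraval")
```

Whether a gap is described in percentage points or as a relative difference changes how large it sounds, which is why it helps to say explicitly which of the two is meant.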


Kwart miljoen Nederlanders gebruikt xtc (A quarter of a million Dutch people use XTC)

Link to the article

The last article that we fact checked was about the use of XTC by Dutch citizens. According to the article, a quarter of a million Dutch people use XTC. The article covers drug use among the Dutch population in general, with the emphasis on the hard drug XTC.

The article contains many numbers and statistics, and some statements raised doubts. These doubts came from critical and skeptical reading rather than from concrete suspicions. It started with the title of the article, which paints the wrong picture. The phrase ‘a quarter of a million Dutch people use XTC’ can be read as frequent and habitual use. This seems misleading, because the 250,000 people are Dutch men and women between the ages of 15 and 65 who have used XTC in the past year. That number also includes people who may literally have tried it once as an experiment. The title leaves room for varying interpretations, which can be inconvenient. On the other hand, if the intention of the headline is to draw attention, it might be effective.

For this article, eight out of nine facts that were checked turned out to be correct. In one case the author referred to a quarter of the population, where the actual figure turned out to be 24.3%. Putting that percentage in those words is debatable, but understandable from a writer’s perspective. One fact was backed by a link that returned an error message, so it could not be checked via the link in the article. This link was supposed to lead to research conducted by the Trimbos Institute. Despite the mediocre communication with the author, the article turned out to be reliable and did not exaggerate the numbers.

The NOS and their visualizations

If you regularly read the news on a website or mobile app, you might have noticed that more and more news organisations have started to use fancy graphics to support their news stories. These data visualizations are used to deal with complex sets of data, and to make sense of the numbers that tell a story. Data visualization techniques provide alternative approaches to knowledge production, as opposed to just reading a text or interpreting numbers to understand a story (Reilly, 2014). However, this doesn’t mean that data visualization is without risks. It is also an easy way to mislead an audience. The American news organisation Fox News offers some quite remarkable examples in which several data visualization techniques are used to mislead audiences (Simply Statistics, 2012). In this article I’ll take a look at some Dutch examples of misleading visualizations, and try to explain why it is more important to provide the correct information than to create a fancy graphic.

 

In our own country

After a search on the internet, I found out that the Dutch news organisation NOS has an application with an archive of data visualizations that it has used to support its published news articles. Although most of these visualizations seem to be correct, I still found some examples of misleading data visualization. These visualizations can be considered misleading because relevant information was hidden, because too much information was displayed in the graph, making it unreadable, or because information was presented in an inappropriate way. According to Cairo (2015), these are the three strategies on which most misleading visualizations are based. The next paragraphs contain five examples of NOS visualizations in which these strategies were used.

 

Social media use in 2012

The first visualization I found misleading was a graph of the social media use in 2012. This graph showed the use of social media and social network sites (such as Facebook and Twitter) categorized according to age groups.

 

Figure: Social media use, 2012

There are two things that (from my point of view) are misleading because relevant data is hidden. First, it is not clear what exactly the difference is between “social media” and “social networks”. In fact, it is not clear at all which social media are included in this graph, and whether some social media were excluded. Furthermore, it is not clear whether Facebook and Twitter were categorized only as social networks or as social media as well.

Secondly, it is not exactly clear what the numbers on the y-axis of the graph mean. Although they seem to be percentages of total use, they could also be the total number of hours spent on social media. The omission of such relevant information might be motivated by the assumption that the audience knows what each variable means (Hullman & Diakopoulos, 2011). But because this information is omitted, the visualization becomes confusing rather than informative.

 

Political polls and purchasing power

Having too much information to interpret is not very desirable either. For example, the NOS has published a poll of the distribution of seats among political parties in parliament, which contains so much information that it has become very confusing.

Figure: Poll of the seat distribution

 

It is not very clear what is meant by the numbers in parentheses, and the graph that shows the development over time in several lines is too small to tell the lines apart. The overload of information makes this data difficult to interpret.

 

Another NOS graph, on purchasing power, looked clearly organized at first sight: only two lines were shown. However, the user could add additional lines, which made the graph too crowded to draw any conclusions from.

 

Incident reports and cuts to the fire department

The NOS has also published some visualizations in which the data was presented in an inappropriate way. At the beginning of 2015 the NOS made a graph of the number of P2000 alerts (the number of times that emergency services were called). In this graph they compared the number of alerts on New Year’s night with the number of alerts on other days.

Figure: P2000 alerts

The first thing that is a bit doubtful is that the NOS only compared New Year’s night with the Christmas days of 2014. It seems only logical that there are more calls to emergency services on New Year’s Eve, when people are setting off fireworks, than at Christmas, when most people are at home with their families having Christmas dinner. Secondly, on the right of the screen an overview is presented of some emergency calls around 00:00 at night. However, these are calls from the first of December (year unknown) instead of the first of January 2015. This looks like introducing a certain level of ‘noise’ into the visualization, a technique called obscuring (Hullman & Diakopoulos, 2011). Because this extra information is irrelevant, it is unclear why these messages are shown there, and they are just confusing for anyone trying to understand the graph.

Another NOS visualization that misleads through inappropriate presentation concerns budget cuts for fire departments across the Netherlands. This graph presents a map of all fire department regions, accompanied by the total budget for each department in 2015.

Figure: Fire department budget cuts

The NOS intended to inform its audience about the proposed budget cuts for the fire departments up to the year 2018. However, the way the information was presented causes much confusion about the real budgets for the fire departments in 2018. The NOS visualized the budget with a grey line and a number, which represented the total budget for the year 2015. Directly below the grey line is a red line with a number representing an amount in euros. One could be misled here, because at first sight it looks like a major budget cut will be made by 2018. However, the red line does not represent the total budget for 2018; it represents the total amount of the cuts to be made up to 2018. Presenting information in such a doubtful way creates a risk that people get the wrong impression of reality. In the worst case, people act on this distorted picture in real life, for example in social, economic or political matters.

 

Challenges in data visualization

With the emergence of big data and data journalism, visualization of data sets has proven to be very effective for presenting complex analyses (Keim, Qu & Ma, 2013). But as demonstrated, it also allows journalists to manipulate a story and to mislead their audience. According to Cairo (2015), this is because many journalists and designers have no serious training in scientific methods, research techniques and data analysis. Because of this lack of knowledge, journalists and designers make mistakes that can be categorized as “lies” or “misleading information”. There may be journalists or news organisations that mislead on purpose, and the only way to unmask those is to train ourselves in the techniques used to deceive. However, for those who just don’t have the knowledge to avoid such mistakes, it is a matter of getting trained in statistics and data analysis. Although it might seem more important to create a “fancy”, good-looking visualization, I believe it is more important that graphics convey the correct information and thereby tell a newsworthy story.

 

References

app.nos.nl/datavisualisatie

Cairo, A. (2015). Graphics lies, misleading visuals: Reflections on the challenges and pitfalls of evidence-driven visual communication. In D. Bihanic (Ed.), New challenges for data design (pp. 103-116). Springer-Verlag, London.

Hullman, J., & Diakopoulos, N. (2011). Visualization rhetoric: Framing effects in narrative visualization. Visualization and Computer Graphics, IEEE Transactions on, 17(12), 2231-2240.

Keim, D., Qu, H., & Ma, K. L. (2013). Big-data visualization. Computer Graphics and Applications, IEEE, 33(4), 20-21.

Reilly, K. M. (2014). Open Data, Knowledge Management, and Development: New Challenges to Cognitive Justice. Open Development: Networked Innovations in International Development, 297.

Simply Statistics. (2012). The statisticians at Fox news use classic and novel graphical techniques to lead with data.

Now who tells us what is important?

I can say that I am a really big fan of the news. Whenever I have some time, for instance when I’m traveling by public transport, I grab my phone and start to read Nu.nl, NOS or some other news application. If you frequently read the news, you can easily tell someone what is currently going on in the world and what the main issues are at this time. The idea that there is a correlation between what the media focus on and what audiences see as important is called agenda-setting (Scheufele & Tewksbury, 2007). Because mass media pay much attention to certain issues, their audiences will eventually regard these issues as important. Although this might seem logical, it is not certain whether we can still speak of agenda-setting by traditional media nowadays. The theory does not seem to take into account the technological developments of recent years, and therefore it might be somewhat outdated. According to Marketingfacts (2013), the internet has put the agenda-setting theory into a completely different perspective, because it has made the news much more interactive. People no longer only consume the news; they are now part of it. So do news organisations still have so much influence on which issues we consider important? Or does technological progress force us to review the agenda-setting phenomenon in this digital age?

 

What we have read and seen in the last three months
An example of an issue that has received much attention in recent months is the flow of refugees from Syria to Europe. If you asked someone to name one of the most urgent topics at the moment, there is a great chance that this would be one of the issues mentioned. One might think this is because this ‘someone’ has probably read all about it in the news. However, there may be another way in which this problem came to their attention.

This other explanation might be found in a social media analysis with Coosto (a social media monitoring and webcare tool). Analyzing both social media and news sources provides some insight into how messages from the two types of media relate. A search query for “refugees” OR “refugee issue” OR “refugee flow” OR “emergency shelter” over the period from 1 September to 30 November this year shows that over 400,000 messages were written about the refugee issue.

Figure: Sources, September to November

The first thing to notice is the difference between the number of social media messages and the number of articles that appeared in the news. There are almost 5 times as many social media messages as articles published by news organisations. This is probably a logical consequence of the number of authors per type of media: social media content is written and published by millions of people, against a relatively small number of professional journalists. As a result, there is a chance that one has read more about the refugee problem through social media than through the news, and that one’s association with the issue came not from traditional media but from social media.

In addition to the analysis of coverage per month, another analysis was made of the coverage per hour. For this analysis, the same search query as before was limited to a period of 7 days in the month of September, and Coosto was used to show the number of messages per hour of the day.

Figures: September, week 1 to week 4

The analysis above shows the number of messages per hour on a specific day in September. It seems that the number of social media messages (blue) starts to increase earlier in the day than the number of news articles (orange). Although it is not possible to draw hard conclusions yet, this might indicate that reporting in the news actually follows the coverage on social media. If this were the case, it would mean that we ourselves are responsible for any agenda-setting, through the social media content that we have probably posted ourselves.
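
One way to go beyond eyeballing the graphs would be to estimate by how many hours one series trails the other. The sketch below is only an illustration of that idea, not the analysis that was done in Coosto: the hourly counts are invented, the function names `correlation` and `best_lag` are made up, and a high correlation at some lag would still not prove that news follows social media. It shifts the news series back hour by hour and keeps the shift at which the two curves match best.

```python
def correlation(xs, ys):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def best_lag(leader, follower, max_lag=6):
    """Return (lag in hours, correlation) where the follower best matches the leader."""
    best = (0, correlation(leader, follower))
    for lag in range(1, max_lag + 1):
        # Compare the leader at hour t with the follower at hour t + lag.
        score = correlation(leader[:-lag], follower[lag:])
        if score > best[1]:
            best = (lag, score)
    return best


# Invented hourly counts for illustration only (NOT the Coosto data).
social = [1, 2, 5, 9, 7, 4, 2, 1, 1, 1, 1, 1]
news = [1, 1, 1, 2, 5, 9, 7, 4, 2, 1, 1, 1]

lag, score = best_lag(social, news)
print(f"News best matches social media when shifted by {lag} hour(s)")
```

With real hourly exports, a consistently positive lag across several days would support the impression from the graphs, while a lag of zero or a weak correlation would suggest that both types of media simply react to the same events.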

 

So who is setting an agenda now?
In contrast with the past, the news is no longer written only by professional journalists. Bloggers, citizen journalists and Twitter or Facebook users increasingly contribute to the production of the news (Matei & McDonald, 2015). The content created on social media is increasingly used by journalists, who more and more often consider social media a reliable source. With social media data, journalists can easily monitor audience sentiment and trending topics, and publish news stories to a large (global) audience (Conway, Kenski & Wang, 2015). Therefore it is no longer so obvious that the traditional media determine, by means of agenda-setting, which issues are regarded as important. It is possible that social media have at least as much influence on our view of the world as traditional media have.

Due to the growth of digital media and online audiences, the phenomenon of agenda-setting is becoming more complex. It is difficult to determine which type of media has the greatest influence on the public’s view of specific issues. This is partly because both traditional media and social media are online and nearly equally accessible (Neuman, Guggenheim, Mo Jang & Young Bae, 2014). However, I do think that it is no longer obvious that the agenda-setting theory applies only to traditional media. As mentioned before, the interactive aspects of the internet have changed the relationships between news organisations and their audiences. One can help to create the news, help to spread the news, and be very critical of the news. All these aspects influence our view of the world and of the news that is presented to us. So before we say ‘it’s all because of what the media tell us’, we might want to consider which agendas we are responsible for ourselves.

 

References

Why the truth about bacon can be very refreshing

Figures, percentages and calculations: the media love to use data that contain (simple) messages that can be turned into newsworthy stories. News organisations and their audiences are very interested in statistics, especially when they seem to be related to our daily lives (Wormer, 2007). But using statistical data for news also comes with a risk, and that is telling the wrong story. Of course one can debate how serious wrong interpretations of statistical data are, but the fact remains that people may be misinformed about serious topics. This phenomenon has recently appeared in various news media.

 

Why bacon seemed to be dangerous
In October this year, the World Health Organisation (2015) announced that processed meat, such as sausages or bacon, can cause cancer. The International Agency for Research on Cancer (IARC) reviewed over 800 studies on the issue and listed processed meat in the same category as smoking, alcohol, uranium, and exposure to solar radiation (Stats, 2015). The IARC also found that red meat was probably carcinogenic to humans. After these announcements from the WHO, many news organisations started to report on the findings of the IARC study: the NOS, NBC, the AD, the FD, the NRC, the Telegraaf and many more reported on the carcinogenic risks of processed meat. Although this story went “viral” in the news, it appears that not many media sources understood the real implications of the study.


The increase of absolute risk or the risk we already had… Who cares!?
In some of the articles that were published, the media reported on the increased risk of getting cancer from eating processed or red meat. For instance, the NOS, Nu.nl and the Telegraaf reported that people who eat 50 grams or more on a daily basis have an 18% increased risk of getting colorectal cancer compared to people who eat less meat.

This reasoning was not exactly correct, because it was not what the IARC meant in its publications. When the IARC stated that ‘each 50 gram portion of processed meat eaten daily increases the risk of colorectal cancer by 18%’, it meant a relative increase of the risk that we already had, so the absolute increase is less than 18 percentage points. To illustrate: if the risk of getting cancer is 10%, an increase of 18% on that 10% risk amounts to an increase of 1.8 percentage points. Our absolute risk would then increase from 10% to 11.8% (Stats, 2015), which is less than some news organisations reported. The problem is that these findings are less dramatic than the 18% that was eventually mentioned. But there were news organisations that went even further.
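
The arithmetic behind this example fits in a few lines of Python. This is just my own sketch of the calculation described above: the 10% baseline is the hypothetical value from the example (not an actual cancer statistic), and the function name `absolute_risk` is made up for the occasion.

```python
def absolute_risk(baseline_pct, relative_increase_pct):
    """Apply a relative increase (in %) to a baseline risk (in %)."""
    return baseline_pct * (1 + relative_increase_pct / 100)


baseline = 10.0  # hypothetical baseline risk in percent, as in the example
relative = 18.0  # relative increase reported by the IARC, in percent

new_risk = absolute_risk(baseline, relative)
print(f"Risk goes from {baseline:.1f}% to {new_risk:.1f}%")
print(f"Absolute increase: {new_risk - baseline:.1f} percentage points")
```

The lower the baseline risk, the smaller the absolute increase becomes, even though the headline figure of 18% stays the same.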


Some media, for instance the Metro and the Daily Mail, reported that the consumption of processed meat was just as carcinogenic as smoking cigarettes. But an infographic from Cancer Research UK (2015) shows that this is absolutely not the case. Although significant evidence was found that both processed meat and smoking cigarettes cause cancer, this does not mean that the risk and the damage are the same. The risk of getting cancer from smoking cigarettes is far greater, and far more cases could be prevented by quitting smoking than by no longer eating meat.


So what does this mean for (data) journalism?
This may seem to be just one example of a recent breaking news story that was misinterpreted, but it happens more often than we think. Not only in the news but also in science, the most dramatic findings are the most likely to get published (Lehrer, 2010). In science, opportunistic biases and invalid findings have negatively affected the way the public regards research and scientists. People tend to doubt the validity of research findings because researchers are suspected of being motivated by political, economic or social agendas (DeCoster, Sparks, Sparks, Sparks & Sparks, 2015). I believe the same thing can happen to journalists and news organisations. With the emergence of big data and data journalism, more and more people have access to data sets, and therefore more and more people can act as journalists. This makes statistics and statistical reasoning more important in journalism, because the availability of data gives people the opportunity to check everything they read in the news (Nguyen & Lugo-Ocando, 2015). There is a risk that people will increasingly mistrust journalism and the media if it turns out that the stories they publish are not always based on truth. And with the continued growth of data and analytical tools, we may be close to a point where we can reveal how often news organisations tell us the truth or lie about breaking stories.

 

References

How news organisations are searching your Twitter for news

When I wake up in the morning, I like to start my day by watching the news. I usually choose RTL News, because they have a fun item called “media overview”. In this item RTL News takes a peek at other media, and reports on news stories that have appeared in newspapers and on the internet. In addition, RTL News often refers to stories that were found on social media sites.

Figure: Media overview

Journalists are increasingly using social media, and more and more often consider social media content a reliable source (Coosto, 2015). With the enormous amount of content we produce each day, journalists will increasingly rely on Artificial Intelligence techniques to analyze this huge amount of data and create news stories out of all our social content (Lokot & Diakopoulos, 2015). In this article we’ll take a look at Artificial Intelligence (or smart data) in data journalism, and at the challenges that appear when smart data techniques are used to analyze social media content for news stories.

According to Wilde (2010), Artificial Intelligence consists of advanced data engineering technologies that can perform data modeling and metadata analysis. It can assist users in all kinds of tasks, such as planning, problem solving, decision making, sensemaking, and even predicting. Examples of such Artificial Intelligence techniques are data mining (an analytical process to explore huge amounts of data), machine learning, and natural language processing (Flaounas, Ali, Lansdall-Welfare, De Bie, Mosdell, Lewis & Cristianini, 2013). Some news organisations are already aware of these developments and use Artificial Intelligence to analyze large amounts of data, such as content on social media websites, and to create news stories out of it.

 

Wait a minute, robots are taking over the news!?
This seems a bit of an extreme statement, but let me give you an example from The New York Times. The data science team at The New York Times developed a tool that can make predictions and suggestions by computing metrics on social media content. This tool is called Blossom. It can provide specific information on the content of a certain article or blog post, reveal where the content came from, and calculate how the content is performing on social media websites. Based on these metrics, the bot can predict and suggest which content is interesting enough to use for news stories in The New York Times.


This phenomenon of ‘news bots’, automated accounts that can do reporting, writing, and data analysis on their own, can be used in many different ways. Because of the undiminished growth of data on the internet, and especially on social media, journalists will increasingly rely on and use information from the digital environment. Although these bots provide intriguing opportunities for journalists and news organizations, they also raise questions about limits and risks (Lokot & Diakopoulos, 2015).

 

Aha! There are limits and risks with news bots
One of the main issues with automated news and the use of news bots in journalism is a possible decrease in transparency. The study by Lokot and Diakopoulos (2015) showed that many news bots (45%) are not transparent about their sources, the algorithms they use, or the fact that they are actually a news bot. This raises questions about the reliability and trustworthiness of ‘automated’ news articles, and about how journalists should cope with these developments.

A second issue with the use of news bots is accountability for the news content created. Although a creator is likely to be accountable for created content in case of a legal violation, if news is created entirely by news bots, then who is to be held accountable for the content? The obvious answer would be the listed author or the news organisation, although that may be impossible from a legal point of view (Lokot & Diakopoulos, 2015).

And last but not least, there is the issue of ethics to keep in mind, especially with fully automated news bots. News bots may take on an increasingly important role in judging content, selecting news, and shaping consumption. However, it seems impossible to teach a news bot to act ethically. An algorithm that understands ethics would need to understand how humans come to see something as a concern, why humans think something matters, and when humans might act on a certain event (Lewis & Westlund, 2015).  

 

So are journalists going to disappear?
So if I return to my statement ‘Are bots taking over the news?’, I think it will be partially true. I believe the use of Artificial Intelligence tools in data journalism will continue to grow. Journalists will need these tools to cope with all the data that is produced on social media every day. But given the issues mentioned above, it is clear that improvement is still needed. Human judgment will remain important because of the ethical, transparency and accountability issues. Although the nature of a journalist's work might change in the future, I do not think that journalists have to be afraid of being replaced completely by news bots.


References

In data we generally can’t observe the things we want to measure

Big data and data journalism have grown rapidly. Free data and efficient tools for data analysis are becoming more and more widely available. Besides (multinational) businesses and governments, news organisations are increasingly aware of all these data sets and the possibilities they offer. According to Aitamurto, Sirkkunen and Lehtonen (2011), journalists search through large collections of data and use statistical methods, visualizations, and interactive tools in order to find and create news. But what is ‘news’?

According to the study by Harcup and O’Neill (2001), a story needs to satisfy one or more news values in order to become news. They list ten different values that can make an item newsworthy. In short, stories that involve influential or famous people, entertainment, an element of surprise, or great relevance to society are most likely to become news. In addition, the news organisation always has an agenda of its own, which may include stories that satisfy a particular need or demand. For data journalism, news is mostly about numbers and the hidden stories that they might contain.

Although numbers don’t lie, data analysis comes with certain risks. Data can be manipulated, misinterpreted or even misused. If journalists misinterpret data, the resulting news may mislead the reader and paint a distorted picture of reality. In this article we’ll take a look at the risks of using data sets to find news in numbers, and at what one should keep in mind when using data to tell a story.

When women don’t get married, they’re screwed.

This sounds rather radical, yet it is a conclusion that journalists have drawn. Paul Bradshaw, an online journalist at Birmingham City University, argues that data journalism can start in two ways: ‘there is a question that needs data, or there is a dataset that needs questioning’. But despite the growth of big data from public services, businesses and governmental organisations, not all data is of journalistic value or contains a newsworthy story. It is very important to analyse the available data thoroughly, and to determine whether it contains news or might support a story.

An article in The Washington Post last year shows how easily open data can be misinterpreted or misused. The article was about violence against women, and in particular about which women are more likely to become victims of assault or abuse. According to the journalists, data analysis showed that married women are safer than unmarried women, and that girls raised by their own (married) father are less likely to be abused or assaulted than girls raised without their own father. These claims were based on a graph published in 2012 by the United States Department of Justice. According to Shannon Catalano, a statistician at the Bureau of Justice Statistics and the author of the study used in the article, her data was presented without sufficient context.

WP graph

There were many more factors associated with violence against women that should have been mentioned. The Washington Post used data on a single variable, household composition, to draw its conclusions, and told its readers a story that was a misinterpretation of the original data. One could say the available data was misused in order to create a news article. And this is where my statement ‘In data we generally can’t observe the things we want to measure’ comes in. It is very rare that a specific question can be answered directly from the observations in a data set. When we search for newsworthy stories in data, we usually end up looking for information that can be related to the question we want to answer. This can mean that we have to search multiple sources and data sets before we eventually find a newsworthy story.

Ok, let’s combine multiple data sets then…

Another way of searching for hidden stories in data is to combine multiple data sets. Combining several data sets and visualizing them together is something Simon Rogers, former editor at The Guardian, has done in the past. In 2012 The Guardian combined data on the rate of homicides by firearm per 100,000 people and the percentage of homicides committed with a firearm, and produced a map that also showed the average number of firearms per 100 people around the world.

1. Homicide by firearm rate per 100,000 2. Percentage of homicides by firearm 3. Average firearms per 100 people
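To make the idea of combining data sets a bit more concrete, here is a minimal sketch in Python. The country names and all figures are hypothetical placeholders, not The Guardian's data, and the helper name combine is my own. The point is only that two data sets can be joined on a shared key, and that it pays to check which entries fall out because they appear in only one of them.

```python
# Hypothetical placeholder figures, keyed by country name (not real data).
homicide_rate_per_100k = {"Country A": 3.2, "Country B": 0.4, "Country C": 1.1}
firearms_per_100_people = {"Country A": 88.0, "Country B": 6.5, "Country D": 30.0}


def combine(left, right):
    """Join two {country: value} dicts on their shared keys.

    Returns the merged records plus a sorted list of countries that
    appear in only one of the two data sets.
    """
    shared = sorted(left.keys() & right.keys())
    merged = {name: (left[name], right[name]) for name in shared}
    unmatched = sorted(left.keys() ^ right.keys())
    return merged, unmatched


merged, unmatched = combine(homicide_rate_per_100k, firearms_per_100_people)
for country, (rate, guns) in merged.items():
    print(f"{country}: {rate} firearm homicides per 100,000, "
          f"{guns} firearms per 100 people")
print("Only in one data set:", unmatched)
```

Countries that are silently dropped in a join like this are an easy way to end up with a map that tells a different story than the full data would.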

Although mashing up different data sets is a powerful tool for processing data and news, there are some risks as well. In data journalism it is often easy to visualize and show what is going on in a specific event, but it is much harder to find out why something is happening. If we look at the homicide/firearms maps from The Guardian, we may be inclined to think that a larger number of firearms causes more homicides, but this is not necessarily true. To draw such conclusions, one would need statistical methods to test the relationships behind the patterns found in the data.
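A small simulation can illustrate why a pattern on a map is not yet a cause. The sketch below uses purely synthetic numbers and says nothing about real firearm or homicide data. It assumes a hidden third factor that drives two variables which do not influence each other at all. The two variables still come out strongly correlated, and the correlation disappears once the hidden factor's contribution is removed (something we can only do here because we generated the data ourselves).

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 2000

# Synthetic data: a hidden factor drives both variables.
hidden = [rng.gauss(0, 1) for _ in range(n)]
variable_a = [h + rng.gauss(0, 0.5) for h in hidden]
variable_b = [h + rng.gauss(0, 0.5) for h in hidden]

r_raw = pearson(variable_a, variable_b)

# Remove the hidden factor's contribution from both variables.
a_adjusted = [a - h for a, h in zip(variable_a, hidden)]
b_adjusted = [b - h for b, h in zip(variable_b, hidden)]
r_adjusted = pearson(a_adjusted, b_adjusted)

print(f"correlation before adjusting: {r_raw:.2f}")
print(f"correlation after adjusting:  {r_adjusted:.2f}")
```

With real data the hidden factor is usually not known in advance, which is exactly why a correlation alone cannot settle the question of cause.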

True relationship or merely a coincidental artifact?

This leads us to the hardest part of data journalism: finding a true relationship. When analyzing data sets, the main difficulty is to find patterns that actually represent a true relationship. According to Harford (2014), finding causation in big data is much more difficult than finding correlations, and sometimes even impossible. It can also lead to multiple-comparison problems, where journalists look at as many patterns in the data as possible and calculate every correlation they can find, so that some correlations show up by chance alone. But the main problem is that if you do not know what is behind a correlation, you have no idea what might have caused it in the first place. As in the homicide/firearms example from The Guardian, it is easy to guess from the data that a larger number of firearms causes more homicides. But if you guess at what might have caused a certain correlation, you risk getting a distorted view of the relationships that actually exist in the data.
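The multiple-comparison problem is easy to reproduce. The sketch below is an illustration under my own assumptions (40 columns, 20 rows, and a cut-off of |r| >= 0.5 for calling a correlation "strong"), not a method taken from Harford. It fills a table with pure random noise, so no column has anything to do with any other, and then counts how many pairs of columns nevertheless look strongly correlated.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def count_spurious(n_columns=40, n_rows=20, threshold=0.5, seed=1):
    """Count column pairs of pure noise whose |r| reaches the threshold."""
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n_rows)]
               for _ in range(n_columns)]
    pairs = list(combinations(range(n_columns), 2))
    strong = [(i, j) for i, j in pairs
              if abs(pearson(columns[i], columns[j])) >= threshold]
    return len(pairs), len(strong)


total, strong = count_spurious()
print(f"{strong} of {total} pairs of pure-noise columns have |r| >= 0.5")
```

Every "strong" pair the script reports is a coincidence by construction. The more pairs one checks, the more of these coincidences turn up, which is why a correlation found by trawling through a data set needs independent support before it becomes a story.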

So when I return to my statement ‘in data we generally can’t observe the things we want to measure’, I am not arguing that we can’t observe anything in data sets. I’d like to point out that one needs to be careful when analyzing data, and has to keep several pitfalls in mind before drawing any hard conclusions from numbers. It is important to be critical of one’s own analysis, and to look for valid statistical evidence to make sure that the stories being told are true.  

References