April 27, 2018, by Brigitte Nerlich

Data harvesting: A metaphor ripe for scrutiny

At the end of March news of a data scandal broke – you all know which one. As Steven Poole in The Guardian wrote: “The political data firm Cambridge Analytica has been accused of unauthorised ‘data harvesting’ from millions of Facebook accounts. This handily avoids allegations of ‘theft’ or even just ‘mining’”. Data harvesting is not only a metaphor but also a euphemism. It highlights and hides, as Poole makes clear.

Metaphor and euphemism

He asks: “but why the agricultural metaphor?” and provides the following answer: “The harvest is the collection of ripe crops in the autumn, which has its own church festival: we gratefully collect what the all-powerful has put there for us. This sense of gathering up what is natural persists in the talk of ‘harvesting’ cells in biological experiments (from 1946), but has become irreparably perverted in the euphemistic use of ‘harvesting’ to mean hunting whales. ‘Data harvesting’ itself emerged from scientific information management in the late 90s, and soon became a buzzphrase for online marketers. In other words, data harvesting is about as old as the modern web, and might even be viewed as its entire purpose.”

So, what is data harvesting? A good answer can be found on Quora: “The term data harvesting, or web scraping, has always been a concern for website operators and data publishers. Data harvesting is a process where a small script, also known as a malicious bot, is used to automatically extract large amount of data from websites and use it for other purposes. As a cheap and easy way to collect online data, the technique is often used without permission to steal website information such as text, photos, email addresses, and contact lists.” Not all data harvesting is theft, but a large proportion is.

Proposal for a project

Poole’s article provoked Paul Reilly to ask Dimitrinka Atanasova on Twitter: “possible link to your work on metaphor”? and: “Agricultural metaphors: the new moral panic? Is a paper waiting to be written ;)”. I thought this was a really good idea, and I hope Dimi and others will pick it up one day.

To continue that conversation and prepare the ground for a potential paper with/by Dimi and/or others, I wanted to collect a bit more information about the phrases ‘data harvests’ and ‘data harvesting’, especially since I have been interested in metaphors for big data for a while (as have others, such as Deborah Lupton). However, as a metaphor, ‘data harvesting’ has so far only been studied, and only briefly, by Poole! As a first step I am trying to establish a bit of a baseline for when, how and by whom this metaphor has been used. To do that I first turned to the Oxford English Dictionary, then to the news database Nexis.


I was a bit disappointed when I found that the OED has not yet recorded the phrases ‘data harvest’ and ‘data harvesting’. It only defines ‘data mining’, ‘data capture’, ‘data bank’ and ‘data stream’. Nothing even on data collection, extraction, analysis, analytics etc. The situation is similar for Merriam-Webster, which does however have some interesting, very recent examples for ‘data mining’. This dictionary says that ‘data mining’ was first used in 1968; the OED has an example from 1962. So ‘data mining’ seems to precede ‘data harvesting’.

News patterns

Searching for ‘data harvest’ on Nexis was confusing, as it is the well-established name of an educational software firm, established in 1987. I therefore decided to focus on ‘data harvesting’ alone, leaving the harvest of ‘data harvest’ to future researchers.

On 21 April 2018 I searched All English Language News with that phrase as a search term and got 2878 hits overall on a high similarity setting, which is not as many as I anticipated. However, I suspected that there had been a steep increase over the last month or so. And indeed, as it turns out, 1672 of those hits came from this year alone, up to 21 April. So, it’s the year of ‘data harvesting’. The only slight bumps before that were in the years 2000 and 2014 – but this year so far dwarfs all that.

When counting the yearly hits I didn’t see many meaningful headlines as most of the articles seem to stem from marketing, but some stood out, one from 1995: “Your identity becomes a commodity”, another from 2013: “United Kingdom dishing the data” and another from 2015: “We are citizens, not mere physical masses of data for harvesting”. Data harvesting as a metaphor has consequences for how we see ourselves and others as citizens.

The 1990s

It seems that the metaphor ‘data harvesting’ made its first appearance on Nexis in 1993, corroborating Steven Poole’s assessment in The Guardian. Here is the quote from PR Newswire from 5 April 1993 in the context of an announcement by Apple and a new ‘server family’ at a major computer trade show in Hanover, Germany: “‘Our software partners’ announcements clearly indicates the direction we plan to take with our server products for the enterprise,’ said Morris Taradalsky, vice president and general manager of Apple Computer’s Enterprise System Division. ‘The focus today is clearly on solutions — from imaging to publishing, from financial management to data harvesting.’” Data harvesting is seen as a solution, not yet as secret theft.

However, it was also in 1993 that the US Congress first considered laws on electronic privacy!

An article in Palm Beach Post, Florida, from 23 July 1993 says: “Before the ‘information superhighway’ [ah those were the days!] proceeds much further, it needs rules of the road– including some to protect the privacy of users. Rep. Ed Markey, D-Mass., chairman of the congressional subcommittee with jurisdiction over electronic networks, plans to spend several months exploring and developing legislation that could give Americans a constitutional guarantee of electronic– or ‘cyberspace,’ as it is known– privacy. ‘In this modern era, do we need a new Constitution to protect you from, say, somebody getting your American Express records to find out what you’re buying? Absolutely,’ Markey said.”

I wondered what Ed Markey [a Democrat] might make of the current scandal! It turns out that he was one of the Senators who interviewed Mark Zuckerberg!

A lot has happened between 1993 and 2018 and a lot more in 2018 – but that’s beyond this blog post.

News sources

When looking at Nexis, it became clear that most news items on ‘data harvesting’ were produced for ‘WebNews’ (737 items), not traditional media. However, some traditional media are also involved in discussing the issue, with The Guardian at the top of the list with 62 items, followed by the MailOnline with 50, the Times of India (Electronic Edition) with 44 and Business Wire with 43. After that it is a very, very long tail…

It is not surprising to see The Guardian at the top of the traditional media list, as Carole Cadwalladr, a reporter and features writer for The Observer (the Sunday edition of The Guardian), broke most of the stories related to Cambridge Analytica.

Now, if one wanted to, one could do a comparison between the portrayal of ‘data harvesting’ in The Guardian and in the MailOnline, but I’d have to leave that to some future time and people (for a similar analysis see an article by Dimi and Nelya!). I’ll just say a few words about The Guardian coverage.

The Guardian first used the phrase a decade ago, on 12 September 2008, in an article related to a TV programme: “Identity thieves are out there, trawling the web, but they haven’t reckoned on Becca Wilcox”. The last headline in my corpus of articles (before I drafted this post on 21 April) appeared on 16 April and said: “How many people had their data harvested by Cambridge Analytica?; Estimates suggest that tens of millions were affected. But, given the potential for data harvesting under Facebook’s previous policies, this may be a small part of the picture”.

Between these two points in time, there seem to have been two political hot spots during which the metaphor was used (just looking at the headlines). One hot spot was around 2013, in the context of the so-called ‘snoopers’ charter’ that was then being debated. The other was around the Cambridge Analytica scandal. In 2013 there was discussion of state surveillance, electronic spying, threats to civil liberties and the end of privacy. Now there is talk of data breaches, even ransacking of data, and of a data war, of privacy becoming a commodity, even an illusion, of an abuse of privacy, of theft and cheating and the issue of control over data.

Of course, there is much more material out there than a few traditional newspapers! Indeed, big data!! How to trawl such data for metaphors is another matter. In the meantime, a comparison between The Guardian and the MailOnline might be a good, bijou start. But why should one do this?

Scrutinising a metaphor

Metaphors make us see the world in different ways. They also make the abstract concrete and the invisible visible. That’s good. However, when something as potentially dangerous as data theft is metaphorically framed as ‘data harvesting’, the metaphor obscures and hides what’s happening in the world, rather than revealing it to us. Metaphors such as this one carry values and assumptions that themselves need to be revealed and opened up to closer inspection, indeed, need to be made public.

Image: Pixabay – wordcloud by Maialisa
