September 21, 2016, by Lindsay Brooke
In praise of ‘small astronomy’.
A blog by Michael Merrifield, Professor of Astronomy in the School of Physics and Astronomy.
A number of years back, I had the great privilege of interviewing the Dutch astronomer Adriaan Blaauw for a TV programme. He must have been well into his eighties at the time, but was still cycling into work every day at the University of Leiden, and had fascinating stories to tell about the very literal perils of trying to undertake astronomical research under Nazi occupation; the early days of the European Southern Observatory (ESO) of which he was one of the founding figures; and his involvement with the Hipparcos satellite, which had just finished gathering data on the exact positions of a million stars to map out the structure of the Milky Way.
When the camera stopped rolling and we were exchanging wind-down pleasantries, I was taken aback when Professor Blaauw suddenly launched into a passionate critique of big science projects like the very one we had been discussing. He was deeply concerned that astronomy had lost its way: rather than thinking in any depth about what new experiments we should be doing, we simply kept pursuing more and more data. His view was that all we would do with data sets like the one produced by Hipparcos would be to skim off the cream and then turn our attention to the next bigger and better mission, rather than investing the time and effort needed to exploit these data properly. With technology advancing at such a rapid pace, this pressure will always be there – why work hard for many months to optimise the exploitation of this year’s high-performance computers, when next year’s will be able to do the same task as a trivial computation? Indeed, the Hipparcos catalogue of a million stars is even now in the process of being superseded by the Gaia mission, which is making even higher quality measurements of a billion stars.
Of course there are two sides to this argument. Some science simply requires the biggest and the best. Particle physicists, for example, need ever-larger machines to explore higher energy regimes to probe new areas of fundamental physics. And some results can only be obtained through the collection of huge amounts of data to find the rare phenomena that are buried in such an avalanche, and to build up statistics to a point where conclusions become definitive. This approach has worked very well in astronomy, where collaborations such as the Sloan Digital Sky Survey (SDSS) have brought together thousands of researchers to work on projects on a scale that none could undertake individually. Such projects have also democratised research: although the data from surveys such as SDSS are initially reserved for the participants who have helped pay for the projects, the proprietary period is usually quite short, so the data soon become available to anyone in the world with internet access to explore and publish their own findings.
Unfortunately, there is a huge price to pay for these data riches. First, there is definitely some truth in Blaauw’s critique, with astronomers behaving increasingly like magpies, drawn to the shiniest bauble in the newest, biggest data set. This tendency is amplified by the funding of research, where the short proprietary period on such data means that those who are “on the team” have a cast-iron case as to why their grant should be funded this round, because by next round anyone in the world could have done the analysis. And of course by the time the next funding round comes along there is a new array of time-limited projects that will continue to squeeze out any smaller programmes or exploitation of older data.
But there are other problems that are potentially even more damaging to this whole scientific enterprise. There is a real danger that we simply stop thinking. If you ask astronomers what they would do with a large allocation of telescope time, most would probably say they would do a survey larger than any other. It is, after all, a safe option: all those results that were right at the edge of statistical significance will be confirmed (or refuted) by ten times as much data, so we know we will get interesting results. But is it really the best use of the telescope? Could we learn more by targeting observations at a larger number of much more specific questions, each of which requires a relatively modest investment of time? This concern also touches on the wider philosophical question of the “right” way to do science. With a big survey, the temptation is always to correlate umpteen properties of the data with umpteen others until something interesting pops out, then try to explain it. This a posteriori approach is fraught with difficulty, as making enough plots will always turn up a correlation, and it is then always possible to reverse engineer an explanation for what you have found. Science progresses in a much more robust (and satisfying) way when the idea comes first, followed by thinking of an experiment that is explicitly targeted to test the hypothesis, and then the thrill of discovering that the Universe behaves as you had predicted (or not!) when you analyse the results of the test.
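The claim that making enough plots will always turn up a correlation can be illustrated with a small simulation. What follows is only a minimal sketch and not part of the original argument: the function names, the catalogue size of 100 objects with 20 properties, and the rough large-sample threshold of 1.96/√n on the correlation coefficient are all assumptions chosen for illustration. Every property is pure random noise, so any pair that passes the threshold is a false alarm.

```python
import math
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(n_objects=100, n_properties=20, seed=0):
    """Count property pairs that look 'significant' in pure noise.

    Hypothetical helper: returns (pairs passing the threshold, pairs tested).
    """
    rng = random.Random(seed)
    # Every property is independent Gaussian noise: no real correlations exist.
    table = [[rng.gauss(0.0, 1.0) for _ in range(n_objects)]
             for _ in range(n_properties)]
    # Assumed rough two-sided 5% threshold on |r| for uncorrelated data
    # (large-sample approximation, not an exact test).
    r_crit = 1.96 / math.sqrt(n_objects)
    pairs = list(combinations(range(n_properties), 2))
    hits = sum(1 for i, j in pairs
               if abs(pearson(table[i], table[j])) > r_crit)
    return hits, len(pairs)


hits, n_pairs = count_spurious()
print(f"{hits} of {n_pairs} pairs pass the 5% threshold by chance")
```

With 20 properties there are 190 possible pairs, so a 5% threshold lets through about nine or ten of them on average even though nothing in the data is related to anything else. A hypothesis stated before looking at the data involves one test rather than 190, which is why it is so much harder to fool yourself that way.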
Finally, and perhaps most damagingly, we are turning out an entire generation of new astronomers who have only ever worked on mining such big data sets. As PhD students, they will have been small cogs in the massive machines that drive these big surveys forward, so the chances of them having their names associated with any exciting results are rather small – not unreasonably, those who may have invested most of a career in getting the survey off the ground will feel they have first call on any such headlines. The students will also never have seen a project all the way through from first idea on the back of a beer mat through telescope proposals, observations, analysis, write-up and publication. Without that overview of the scientific process on the modest scale of a PhD project, they will surely be ill-prepared for taking on leadership roles on bigger projects further down the line.
I suppose it all comes down to a question of balance: there are some scientific results that would simply be forever inaccessible without large-scale surveys, but we have to somehow protect the smaller-scale operations that can produce some of the most innovative results, while also helping to keep the whole endeavour on track. At the moment, we seem to be very far from that balance point, and are instead playing out Adriaan Blaauw’s nightmare.
I don’t disagree with this article’s premise, but there are a couple of problems that should be pointed out. First, Hipparcos was deactivated in 1993 and Gaia, the follow-up mission, didn’t launch until 2013 – 20 years later! So that may not be the best example of the “shiniest bauble” principle. On the flip side, NASA has done an excellent job of juicing the data from Kepler, another big data mission. There is real value in small astronomy, but not necessarily because big astronomy hasn’t done its job properly. In fact, I’d argue the two go hand in hand.
Second, in order to invest “the time and effort needed to exploit these data properly”, some PhD students will end up concentrating on data analysis rather than writing proposals and observing at a telescope. It’s perhaps unfair to make the statement that the time and effort must be expended to fully make use of the data, but then criticize the very people who are expending that time and effort for not having enough experience with other (in some cases irrelevant) skills.