June 1, 2017, by Stefan Rennick-Egglestone
Data management planning for a signature research centre
Scientific equipment can now routinely produce what we might perceive as enormous volumes of data. Gene sequencers and imaging systems that can produce 200 gigabytes in a day are already in common use at the University of Nottingham, and equipment already exists which can produce 100 terabytes in a day. Data volumes are likely to keep increasing, since offering higher-resolution images, or larger numbers of samples, will always be a mechanism by which an equipment manufacturer can distinguish themselves. Researchers will always find new questions that can be answered by more advanced equipment, and buying the most advanced equipment is often a reasonable tactic for supporting the success of research.
Researchers therefore need to plan how they will manage data captured from scientific equipment. There’s no guarantee that a very high-throughput sequencer, for example, can simply be connected to the network and be expected to work at full capacity, a point which is particularly important to understand if a researcher is committing to producing certain outputs by submitting a funding proposal. Institutional network and storage technologies have a finite capacity, which is shared with other users. Advanced equipment which pushes at the boundaries of existing institutional capabilities may need interim plans until institutional provision catches up, or until alternative options are provided.
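As a rough illustration of this kind of planning, the sustained network bandwidth an instrument demands can be estimated from its daily output. This is a minimal sketch using the indicative volumes mentioned above, not a model of any specific instrument or network:

```python
def required_gbps(bytes_per_day: float) -> float:
    """Sustained bandwidth (in gigabits per second) needed to move
    one day's output off an instrument within 24 hours."""
    seconds_per_day = 24 * 60 * 60
    return bytes_per_day * 8 / seconds_per_day / 1e9

# A 200 GB/day instrument needs only ~0.02 Gbit/s sustained,
# a modest load on institutional networking.
print(required_gbps(200e9))

# A 100 TB/day instrument needs ~9.3 Gbit/s sustained, enough to
# come close to saturating a dedicated 10 Gbit/s link on its own.
print(required_gbps(100e12))
```

In practice the peak requirement is higher still, since instruments rarely produce data evenly across 24 hours and links are shared with other users, which is exactly why this kind of modelling needs to happen before equipment is purchased.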
COMPARE is a new research centre, and hence it has been an ideal time to work with COMPARE leadership to model their future research data management needs, and to plan for solutions. The aim is to develop some generic “patterns” which can be re-used across other producers of large volumes of research data. COMPARE is an interesting case study, as it is committed to supporting shared usage of two pools of advanced imaging systems, one at each university.
Efforts are focussing on three principal areas:
- modelling future network usage and solving any bottlenecks identified
- modelling future storage requirements
- identifying strategies for funding infrastructure
Outcomes will be summarised in blog posts over the next few months.