May 23, 2017, by Stefan Rennick-Egglestone

Development of a computational platform to support the use of Genome Wide Association Studies

Post by Dr. Philip Quinlan, Advanced Data Analysis Centre (ADAC)

The Advanced Data Analysis Centre (ADAC) provides world-leading data analysis expertise, and can assist in developing analytical processes and pipelines. We are currently working in conjunction with the ‘Crops for the Future’ Research Centre at the University of Nottingham’s Malaysia Campus to develop a computational platform to support the use of Genome Wide Association Studies (GWAS).

Our intention is to develop an analytical and computational pipeline which takes advantage of HPC resources where possible to streamline research processes, emphasising ease of use, reproducibility and ease of training. However, such a task is not simply about how to build the infrastructure but also about how to support and maintain it long-term, which brings us back to the age-old debate: build in-house or outsource?
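To make the goals of reproducibility and HPC use a little more concrete, the sketch below shows one way a single pipeline step might be wrapped: a PLINK association test submitted as a SLURM batch job, with the generated job script and tool version kept alongside the results. This is a minimal illustration only, assuming PLINK and a SLURM scheduler are available; the file paths, resource requests and cohort names are hypothetical, and the eventual platform may make quite different technology choices.

```python
#!/usr/bin/env python3
"""Illustrative sketch: wrap one GWAS association step (PLINK) as a SLURM
job so the same analysis can be re-run reproducibly on HPC resources.
All paths, resource requests and cohort names below are hypothetical."""

import subprocess
from pathlib import Path
from textwrap import dedent


def submit_gwas_step(bfile: str, pheno: str, out_prefix: str) -> str:
    """Generate and submit a SLURM batch script running a PLINK
    association test; returns the scheduler's job ID."""
    script = dedent(f"""\
        #!/bin/bash
        #SBATCH --job-name=gwas-{Path(out_prefix).name}
        #SBATCH --cpus-per-task=8
        #SBATCH --mem=16G
        #SBATCH --time=04:00:00

        # Record the tool version next to the results for reproducibility.
        plink --version > {out_prefix}.plink_version.txt

        plink --bfile {bfile} \\
              --pheno {pheno} \\
              --assoc \\
              --threads 8 \\
              --out {out_prefix}
        """)
    # Keeping the generated script on disk records exactly how each
    # result was produced.
    script_path = Path(f"{out_prefix}.sbatch")
    script_path.write_text(script)
    # `sbatch` prints e.g. "Submitted batch job 123456"; take the ID.
    result = subprocess.run(["sbatch", str(script_path)],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip().split()[-1]


if __name__ == "__main__":
    job_id = submit_gwas_step(bfile="data/crop_cohort",
                              pheno="data/crop_cohort.pheno",
                              out_prefix="results/crop_cohort_assoc")
    print(f"Submitted GWAS association step as SLURM job {job_id}")
```

Persisting the batch script and tool version with each output is one simple way to make analyses easier to reproduce and easier to teach, whichever workflow technology is ultimately chosen.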

The decision is not quite as black and white as those options suggest, and the optimal solution is probably a mix of the two. There are in-house, locally developed solutions to the challenge; equally, there are commercial options that could get at least part of the way more quickly. This case study will use this problem statement to try to understand the ideal mix of in-house versus outsourced provision.

Our thoughts so far
The research arena is dynamic, and those at the ‘coal-face’ will always have a much greater appreciation of the problem. From this can come some very useful solutions, such as the iHub cyber research environment currently being brought to the University of Nottingham by Professor David Salt. In this environment, the ability for in-house teams to be brought into the research setting, to experience its challenges and opportunities first-hand, is of immense value. This does require that the ‘in-house’ team is not simply within the University but embedded across it, so that its members can engage and interact with the specialists rather than being technically in-house but in reality hidden away.

It also reflects the fact that solutions are not always technical: direct, ‘in the moment’ conversations with the research team can add immensely to mutual understanding and to the successful delivery of the pipeline. This informality in approach and engagement builds the necessary personal relationships, something often overlooked in technical challenges but vital to successful delivery, and it reinforces the need for local embedded expertise.

Local expertise has a clear role to play, but there will always be a transition from very specific local needs to a more general framework for the University. This case study has a relatively specific description; yet if the approach can be shown to work in this one area, there is no reason why the solution cannot be scaled across all GWAS-related analyses, regardless of disease, and the model of working exported to wider -omics analyses. The challenge is that those focused on delivering a solution to a specific problem are not always best placed to see the wider picture. Equally, when looking to scale from the specific to the general, early technology choices can have a large impact. This is where we want to work with our technical partners to explore how existing large-scale external systems can supplement local solutions, while ensuring there is sufficient capability for growth once we move on from this specific use-case.

The use-case therefore has the ability to answer a specific challenge, and to demonstrate the valuable contribution that local embedded teams can make to the research environment. Importantly, we will also seek to demonstrate how wider strategic technology partnerships can deliver the ability to scale, applying these lessons learned across a wider research portfolio.

Posted in Data Analytics, High Performance Computing