Per Lötstedt, Director of eSSENCE, was interviewed by the Uppsala newspaper UNT. In an article entitled “The laboratory has moved into the computer”, Per took the opportunity to introduce e-Science to the public and describe the eSSENCE research program. The article was published on September 7, 2016 (in Swedish).
If you have a subscription you can read the article here.
“Recommendations on e-infrastructures for next-generation sequencing”
O. Spjuth, E. Bongcam-Rudloff, J. Dahlberg, M. Dahlö, A. Kallio, L. Pireddu, F. Vezzi, and E. Korpelainen,
GigaScience, vol. 5, no. 1, pp. 1-9, 2016.
With ever-increasing amounts of data being produced by next-generation sequencing (NGS) experiments, the requirements placed on supporting e-infrastructures have grown. In this work, we provide recommendations based on the collective experiences from participants in the EU COST Action SeqAhead for the tasks of data preprocessing, upstream processing, data delivery, and downstream analysis, as well as long-term storage and archiving. We cover demands on computational and storage resources, networks, software stacks, automation of analysis, education, and also discuss emerging trends in the field. E-infrastructures for NGS require substantial effort to set up and maintain over time, and with sequencing technologies and best practices for data analysis evolving rapidly it is important to prioritize both processing capacity and e-infrastructure flexibility when making strategic decisions to support the data analysis demands of tomorrow. Due to increasingly demanding technical requirements we recommend that e-infrastructure development and maintenance be handled by a professional service unit, be it internal or external to the organization, and emphasis should be placed on collaboration between researchers and IT professionals.
The citation for his nomination reads: “Bo Kågström is Professor of Numerical Analysis and Parallel Computing and Director of High Performance Computing Center North (HPC2N) at Umeå University. Kågström has 25 publications in SIAM books and journals. He was a corresponding editor of the SIAM Journal on Matrix Analysis and Applications, was awarded the SIAM/SIAG Linear Algebra Prize in 2000, and has served on multiple SIAM prize committees, among other involvements. Kågström is being honored for contributions to the understanding of matrix pencils and for leadership within the European high performance computing community.”
Enabling integrative cross-biobank research: the SAIL method for harmonizing and linking biomedical and clinical data across disparate data archives
O. Spjuth, J. Hastings, J. Dietrich, J. Heikkinen, N. Pedersen, J. Hottenga, S. Ripatti, P. Burton, I. Fortier, C. van Duijn, E. Wichmann, J. Rung, M. McCarthy, M. Allen, E. Raulo, I. Prokopenko, J. Karvanen, M. Perola, M. Kolz, E. J.C. de Geus, G. Willemsen, P. Magnusson, J-E. Litton, J. Palmgren, M. Krestyaninova, and J. Harris.
European Journal of Human Genetics advance online publication 26 August 2015; doi:10.1038/ejhg.2015.165
A wealth of biospecimen samples are stored in modern globally distributed biobanks. Biomedical researchers worldwide need to be able to combine the available resources to improve the power of large-scale studies. A prerequisite for this effort is to be able to search and access phenotypic, clinical and other information about samples that are currently stored at biobanks in an integrated manner. However, privacy issues together with heterogeneous information systems and the lack of agreed-upon vocabularies have made specimen searching across multiple biobanks extremely challenging. We describe three case studies where we have linked samples and sample descriptions in order to facilitate global searching of available samples for research. The use cases include the ENGAGE (European Network for Genetic and Genomic Epidemiology) consortium comprising at least 39 cohorts, the SUMMIT (surrogate markers for micro- and macro-vascular hard endpoints for innovative diabetes tools) consortium and a pilot for data integration between a Swedish clinical health registry and a biobank. We used the Sample avAILability (SAIL) method for data linking: first, created harmonised variables and then annotated and made searchable information on the number of specimens available in individual biobanks for various phenotypic categories. By operating on this categorised availability data we sidestep many obstacles related to privacy that arise when handling real values and show that harmonised and annotated records about data availability across disparate biomedical archives provide a key methodological advance in pre-analysis exchange of information between biobanks, that is, during the project planning phase.
Link to article full text: http://www.nature.com/ejhg/journal/vaop/ncurrent/full/ejhg2015165a.html
In the EU COST Action SeqAhead, researchers led by eSSENCE researcher Ola Spjuth have conducted a large survey on the use of scientific workflows in data-intensive bioinformatics.
Experiences with workflows for automating data-intensive bioinformatics
Ola Spjuth, Erik Bongcam-Rudloff, Guillermo Carrasco Hernández, Lukas Forer, Mario Giovacchini, Roman Valls Guimera, Aleksi Kallio, Eija Korpelainen, Maciej M Kańduła, Milko Krachunov, David P Kreil, Ognyan Kulev, Paweł P. Łabaj, Samuel Lampa, Luca Pireddu, Sebastian Schönherr, Alexey Siretskiy and Dimitar Vassilev
Biology Direct 2015, 10:43 doi:10.1186/s13062-015-0071-8
High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and carry out data management and analysis tasks on large scale. Workflow systems can be useful to simplify construction of analysis pipelines that automate tasks, support reproducibility and provide measures for fault-tolerance. However, workflow systems can incur significant development and administration overhead so bioinformatics pipelines are often still built without them. We present the experiences with workflows and workflow systems within the bioinformatics community participating in a series of hackathons and workshops of the EU COST action SeqAhead. The organizations are working on similar problems, but we have addressed them with different strategies and solutions. This fragmentation of efforts is inefficient and leads to redundant and incompatible solutions. Based on our experiences we define a set of recommendations for future systems to enable efficient yet simple bioinformatics workflow construction and execution.
Link to full text: http://www.biologydirect.com/content/10/1/43
SoftwareX: A new open-access journal from Elsevier for scientific software. The journal publishes peer-reviewed software and also allows for post-publication updates and metadata.
Two presentations by eSSENCE scientists Anders Hast and Torbjörn Nordling from Uppsala University have been accepted to the “11th IEEE International Conference on e-Science” in Munich, Germany, 31 August to 4 September 2015.
Welcome to the eSSENCE conference “Swedish e-Science Academy”:
Date: 14-15 October, 2015 (lunch to lunch)
Researchers affiliated with UPPMAX and financed by the Strategic Research Program eSSENCE published a report on the applicability of Hadoop for analyzing next-generation sequencing data.
Siretskiy A, Sundqvist T, Voznesenskiy M, Spjuth O.
A quantitative assessment of the Hadoop framework for analyzing massively parallel DNA sequencing data.
GigaScience. 2015 Jun 4; 4:26. doi: 10.1186/s13742-015-0058-5
The Swedish e-Science Education graduate school (SeSE) now announces its graduate courses for autumn 2015.
The courses are listed at the SeSE website:
SeSE gives basic training in fields where the use of e-Science is emerging and where such education can have an immense impact on the research, as well as advanced training for students in fields that are already computer-intensive. The course curriculum is tailored to meet a broad set of prerequisites and will foster collaborations between Swedish researchers, possibly opening up new research fields that utilise e-Science tools and methods and also making interdisciplinary collaborations possible on a Nordic level.