The URGI newsletter #4 Sep 2015
Editorial
To meet current scientific challenges, developing an efficient and robust infrastructure is a strategic issue for bioinformatics platforms. The URGI has a computer cluster able to meet the most demanding requirements. However, we now face an influx of smaller projects, each less demanding, but together exceeding our capacity. To cope, IT resources must be pooled between the major bioinformatics platforms. This is one of the objectives of the "investment for the future" ReNaBi-IFB project and of the building of the INRA Datacenters. These projects, to which we contribute, are building very large storage and computing capacities. On these infrastructures, we plan to distribute "turnkey" bioinformatics environments that allow research units to cope, by themselves, with the volume of sequences arriving from new sequencing technologies. These national infrastructures will host state-of-the-art free tools able to decipher the information contained in sequenced genomes. Cloud computing technologies have now reached a level of maturity that makes them an attractive solution to this challenge. Cloud computing allows on-demand access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services). Its key features are scalability and speed of delivery, as resources are provided with minimal management effort. The underlying technology eases the pooling of resources between heterogeneous computer infrastructures (and hence between bioinformatics platforms), hiding their complexity from end users. It will also make it possible to better match capacity to computing demand, as scalability allows resources to be fine-tuned to actual needs. This new paradigm offers an elegant way to provide the required computing resources on demand. The new IT infrastructures currently under construction in the INRA Datacenters and the ReNaBi-IFB Datacenter will implement this model.
We expect improved efficiency in managing resources, but we also need to adapt our tools to benefit from all the advantages this model offers. This technology also offers a new paradigm for software design and distribution. This “hardware shift” of course has consequences for the way analyses are built and executed, but it also has strong consequences for the way tools dedicated to these analyses are built and distributed. Our first attempt to embrace these hardware and software shifts has been to build two “cloud-ready” prototypes. One is a virtualized version of one of our complex pipelines; the other uses virtualization technologies for a new Galaxy server. These prototypes have already given us elements that we can apply to our whole development process. This experience leads us to plan to adopt a practice from internet companies known as continuous delivery, to automate and improve the process of software delivery. Techniques such as automated testing, continuous integration and continuous deployment are used to produce high-quality software that is easy to package and deploy. This results in an improved ability to release new, enhanced versions to users rapidly, reliably and repeatedly.
Publications
News
Events
URGI bioinformatic platform call for proposals
The URGI bioinformatics platform has opened a call for proposals to support teams from the BAP, SPE and EFPA INRA divisions in their high-throughput sequence analysis. We give access to our computing resources through the Linux command line or our Galaxy server for 3 to 6 months. We also organize training sessions on demand (for cluster and Galaxy usage). Applications can be submitted throughout the year. To learn more about the assessment of proposals, the eligibility criteria and the submission form, please have a look at https://urgi.versailles.inra.fr/Platform/Call-for-proposals. In 2015 we accepted 9 projects covering various analyses, such as repeats, RNA-seq, metagenomics and genetic maps, and organized 3 training sessions.
Exploring GWAS data using GnpIS-Asso
URGI develops and maintains an information system called GnpIS for plants of agronomic interest (D. Steinbach et al., Database Journal 2013, doi:10.1093/database/bat058). This information system stores genetic resources (collections and panels), phenotypes, genetic maps, QTLs, genome annotation data and expression data. Within the framework of the GnpAsso ANR project (2011-2014, Coord. D. Steinbach INRA URGI), we recently developed a complete workbench to manage and exploit results from association genetics studies in plants (see also the GnpAsso project page for detailed information). This new workbench is composed of 3 components that can work together or independently:
- GnpIS-Asso, a new component to store and mine association data.
- SniPlay (A. Dereeper et al., NAR Web Server 2015, doi: 10.1093/nar/gkv351), a tool initially developed for diversity analysis and extended for GWAS analysis. A SniPlay Galaxy workflow for GWAS is also available in the Galaxy toolshed.
- ThaliaDB, a database for the management of GWAS-related experimental data, which was improved in this project to export automatically to the GnpIS-Asso component and to SniPlay.
The GnpIS-Asso component developed by URGI (D. Steinbach et al. in prep) allows scientists:
- to mine associations between markers and phenotypes, derived from multiple GWAS experiments, using different selection filters: phenotypic traits, genetic resources or panels, SNP markers, or groups of markers such as those of a genotyping array.
- to display associations in tables and graphics (Manhattan plots, QQ plots, box plots), filtering by modality, site, year, chromosome, statistical method and values.
- to view the best associations in the context of gene annotation, in order to find candidate genes (JBrowse tool).
- to export filtered datasets in order to launch complementary external analysis tools (SniPlay, Galaxy, ...).
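As a rough illustration of the kind of filtering and plot preparation listed above, the following Python sketch filters a few association records by trait and p-value, then computes the -log10(p) values used on the y-axis of a Manhattan plot and the expected/observed pairs used in a QQ plot. The record fields, function names and sample values are hypothetical and are not taken from GnpIS-Asso; this is only a sketch under those assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """Hypothetical marker-trait association record (not the GnpIS-Asso schema)."""
    marker: str
    chromosome: str
    position: int
    trait: str
    p_value: float


def filter_associations(records, chromosome=None, trait=None, max_p=1.0):
    """Keep records matching the optional chromosome/trait and with p <= max_p."""
    return [
        r for r in records
        if (chromosome is None or r.chromosome == chromosome)
        and (trait is None or r.trait == trait)
        and r.p_value <= max_p
    ]


def manhattan_points(records):
    """Return (chromosome, position, -log10(p)), sorted by chromosome label, then position."""
    ordered = sorted(records, key=lambda r: (r.chromosome, r.position))
    return [(r.chromosome, r.position, -math.log10(r.p_value)) for r in ordered]


def qq_points(records):
    """Pair expected and observed -log10(p), assuming uniform p-values under the null."""
    observed = sorted((-math.log10(r.p_value) for r in records), reverse=True)
    n = len(observed)
    expected = [-math.log10((i + 0.5) / n) for i in range(n)]
    return list(zip(expected, observed))


# Made-up sample values, for illustration only.
records = [
    Association("m1", "1", 1200, "flowering_time", 1e-6),
    Association("m2", "1", 5400, "flowering_time", 0.03),
    Association("m3", "2", 800, "flowering_time", 1e-4),
    Association("m4", "2", 9100, "plant_height", 0.5),
]

hits = filter_associations(records, trait="flowering_time", max_p=1e-3)
for chrom, pos, score in manhattan_points(hits):
    print(chrom, pos, round(score, 2))
```

The points returned by `manhattan_points` and `qq_points` can be passed to any plotting library; chromosome labels are sorted as strings here, so a real dataset would likely need a numeric or custom ordering.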
GnpIS-Asso is interoperable with other GnpIS components, such as the phenotyping, polymorphism and genetic resources components. Hence it links association data with their corresponding raw data (phenotypes, genotypes). In addition, GnpIS-Asso has been extended to host genomic selection data, contributing to the development of a new component: GnpIS-SelGen (D. Steinbach INRA Coord. Funding: Meta-program SelGen INRA). GnpIS-Asso currently contains 2 public datasets (genetic resources, genotyping, phenotyping and association data), provided by 2 partners (INRA GQE - Le Moulon and GAFL). One dataset is related to maize flowering (Bouchet et al., PLoS One 2013, doi: 10.1371/journal.pone.0071377), and the other to tomato fruit metabolism (Sauvage et al., Plant Physiology 2014, doi: 10.1104/pp.114.241521). The tool will also be used by 5 major French national projects on crops: Breedwheat, AmaiZing, Rapsodyn, PeaMUST and Aker. To test it, go to the GnpIS portal, or use this direct link. For more information, contact: urgi-contact [at] versailles.inra.fr
URGI goes further in virtual machines
URGI is in the process of migrating all its applications to virtual machines to improve the reliability of its services. In autumn, our production tools (including GnpIS, browsers and Intermines) will be hosted in virtual machines. This will give us more flexibility in managing them, and will also prepare us to move our service infrastructure to a datacenter to further improve the quality of our services.
Training
URGI organizes training sessions for biologists and computer scientists on:
- URGI in-house tools (analysis tools, databases)
- Externally developed tools and methods related to its expertise
- The use of the URGI bioinformatics platform cluster, in relation to its call for proposals.
Sessions are organized either at URGI (INRA Versailles center) or in the INRA centers of the participants. Some sessions are organized in collaboration with other platforms or initiatives (national or international). In the latter case, the agenda is fixed according to users' needs. The duration of training sessions ranges from one to two days depending on the needs. Training material (slides, videos, manuals, …) is provided to participants. User feedback is collected and analyzed to continuously improve the sessions. Since June 2015, a complementary type of training has been offered: webinars. These are short training sessions (maximum 2 hours), focusing for instance on the new functionality of a tool. They are delivered using the INRA e-learning platform virtual classroom tool. The different training sessions that have been offered since 2008 are available here. During the last 2 years, we organized:
- 4 sessions on the GnpIS information system at national or international level, with a focus on the new functionalities developed around the exploration of genome-wide association data, genomic selection, genotyping, phenotyping data and RNASeq expression. Data were provided by scientists.
- 2 sessions on the REPET software.
Short-term 2015 perspectives are to organize:
- one webinar (in French) on maize RNASeq data exploration in GnpIS (September 25th).
- a training session (in French) at Rennes, dedicated to the rapeseed data in GnpIS (November 3rd).
- a training session (in French) at Versailles, on data submission and data exploration with GnpIS (December 10th).
Other webinars focusing on wheat data in GnpIS are planned for the beginning of 2016. To register, ask for details or propose new training sessions, please send an email to urgi-training [AT] versailles.inra.fr
Insights into genome and epigenome evolution in Brassicaceae
Arabidopsis thaliana was the first plant to have its nuclear genome sequenced, which has proven to be an essential resource for the study of plant genomes and plant epigenomes. This robust foundation has fostered the sequencing and assembly of several closely related species of the family Brassicaceae for comparative genomic and evolutionary studies. Over a dozen Brassicaceae genomes are available to date, including those of plants of agricultural and/or biological interest. The URGI research team is especially interested in the characterization of genomic repetitive elements (i.e. mostly transposable elements, TEs) and their impact on plant genome evolution. We are therefore taking advantage of the Brassicaceae genomes to address the long-term impact of TEs in plant genomes. Using genomic data from six Brassicaceae species, we have recently established that most of the Arabidopsis repeated DNA is evolutionarily old and that old and younger repeats are enriched in different compartments of this genome. Building upon this observation, we determined that the DNA methylation of repeated sequences can last over prolonged periods of time (i.e. millions of years) and that a mutational bias in aging repetitive elements correlates with a bias in DNA composition along the chromosomes, with an impact on epigenomic landscapes. Overall, our work established that the ancient proliferation of repeat families has long-term consequences on plant biology and genome composition. These results are highly significant in the context of the identification of epigenome-associated QTLs and translational research, and will help address the epigenetic impact of TEs on plant adaptation and domestication from an evolutionary perspective.
Repetitive DNA accumulates mutations and deletions over time and thereby becomes increasingly difficult to detect. Because repetitive elements appear to be evolutionarily old in A. thaliana, we investigated whether sensitive approaches could reveal more DNA of repetitive origin in this genome. Using a series of innovative strategies, we found that a significant amount of the non-annotated DNA in A. thaliana (a.k.a. genomic dark matter) is probably of repetitive origin. Altogether, our work provides a better understanding of the origin of some genomic dark matter in plants, whose nature and function remain largely cryptic. Indeed, our results show that part of the A. thaliana genomic dark matter forms a continuum with repetitive DNA and suggest that another part, which remains beyond detection, is likely to be of similar origin (albeit more ancient). Our results therefore suggest that, beyond the detectable fraction, TEs have played a major role in the evolution of plant genome size and composition, and probably in the emergence of genes and regulatory elements as well, thereby renewing our perception of the possible impacts of TEs on plant genome evolution.
See articles:
- Maumus et al., Ancestral repeats have shaped epigenome and genome composition for millions of years in Arabidopsis thaliana. Nat. Commun. 2014
- Maumus et al., Deep Investigation of Arabidopsis thaliana Junk DNA Reveals a Continuum between Repetitive Elements and Genomic Dark Matter. PLoS One 2014