1. Architecture

The rPredictor infrastructure has several major components:

  • a database of RNA sequences and secondary structures (rData) and Extraction-Transformation-Load mechanisms to build rData (rETL),
  • a set of tools that perform standard tasks on the data like similarity search or secondary structure prediction (rTools),
  • an implementation of a new algorithm dedicated to RNA secondary structure prediction (CP-predict),
  • an internet portal and back-end that makes the rData and rTools components accessible to the research community and general public (rWeb),
  • and finally documentation, split into User, Technical and API Reference parts (rDoc).

[Figure: A high-level overview of the rPredictor architecture]

We will now describe the individual components of rPredictor.

1.1. rData

The rData component is further divided into several parts. The major part is rDB, the rPredictor PostgreSQL database. rDB holds the rPredictor dataset: all the information about RNA that is available for searching in rPredictor. The rData label also covers the databases of individual tools. These databases do not offer any extra information; they are merely extracted from rDB (or directly from its sources) and re-formatted for efficient use by individual tools. Currently, the only tool that needs such a re-formatted copy of the whole rPredictor dataset is Sequence search. The Taxonomy search and Annotations search, on the other hand, query the database directly. (More on this in the section on rWeb.)
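To make the idea of a tool-specific export concrete, the sketch below dumps sequences from a relational table into FASTA, the kind of flat file a similarity-search tool typically indexes. It is an illustration under assumptions: the table name `sequences`, its columns and the helper `export_fasta` are hypothetical, and Python's built-in `sqlite3` stands in for the PostgreSQL rDB so that the example runs on its own.

```python
import sqlite3


def export_fasta(conn: sqlite3.Connection, line_width: int = 60) -> str:
    """Render every row of the hypothetical `sequences` table as FASTA text."""
    lines = []
    for accession, seq in conn.execute(
        "SELECT accession, sequence FROM sequences ORDER BY accession"
    ):
        lines.append(f">{accession}")
        # Wrap the sequence so that no line exceeds line_width characters.
        for i in range(0, len(seq), line_width):
            lines.append(seq[i:i + line_width])
    # Terminate every line (including the last) with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # In-memory SQLite database standing in for rDB.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sequences (accession TEXT PRIMARY KEY, sequence TEXT)")
    conn.executemany(
        "INSERT INTO sequences VALUES (?, ?)",
        [("B2", "GGCCAAUU"), ("A1", "ACGUACGUAC")],
    )
    print(export_fasta(conn, line_width=4), end="")
```

The export carries no information beyond what the table already holds; it only changes the layout into one the search tool can consume efficiently, which mirrors the role described for the tool databases above.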

The dataset is generated by combining information from external sources (the SILVA, Rfam, ENA and NCBI Taxonomy databases). This process is handled by the rETL component.

A full description of rDB itself is found in The Data of rPredictor.

The process of generating tool-specific representations of the rPredictor dataset is described in the rPredictor setup.

A higher-level description of the dataset is available in the User documentation, in rPredictor data and database.

1.2. rETL

The rETL (Extraction - Transformation - Load) component of rPredictor handles downloading and processing data from various sources in order to populate rDB with the rPredictor dataset. The process has many steps, from automated queries to parallelized processing of secondary structure predictions.
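The three ETL stages can be summarized in a few lines of code. The sketch below is illustrative only: the record layout, the in-memory `taxonomy` mapping and the function names are hypothetical, the predictor is a placeholder that leaves every base unpaired, and a plain dictionary stands in for rDB. It shows how the stages chain together and how independent structure predictions can be run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def extract():
    """Return hypothetical raw records; rETL would download these from its sources."""
    sequences = [
        {"accession": "A1", "taxon_id": 9606, "sequence": "GGGAAACCC"},
        {"accession": "B2", "taxon_id": 562, "sequence": "ACGU"},
    ]
    taxonomy = {9606: "Homo sapiens", 562: "Escherichia coli"}
    return sequences, taxonomy


def predict_structure(sequence: str) -> str:
    # Placeholder predictor: marks every base as unpaired in dot-bracket notation.
    return "." * len(sequence)


def transform(sequences, taxonomy, workers: int = 4):
    # Predictions are independent per sequence, so they can run concurrently;
    # pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        structures = list(pool.map(predict_structure, (s["sequence"] for s in sequences)))
    # Join each sequence with its organism name and predicted structure.
    return [
        {**s, "organism": taxonomy.get(s["taxon_id"], "unknown"), "structure": st}
        for s, st in zip(sequences, structures)
    ]


def load(rows, store: dict) -> dict:
    # Upsert keyed by accession, so re-running the pipeline overwrites old rows.
    for row in rows:
        store[row["accession"]] = row
    return store


if __name__ == "__main__":
    store = load(transform(*extract()), {})
    for accession, row in store.items():
        print(accession, row["organism"], row["structure"])
```

Threads keep the sketch simple. A CPU-bound predictor written in Python would call for `ProcessPoolExecutor` instead, while external prediction programs launched as subprocesses parallelize well under threads.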

A detailed description of this component can be found in the section The ETL layer of rPredictor.

1.3. rTools

Numerous external tools are integrated within rPredictor. (“External” here means “not a part of rWeb”.) They provide the core functionality, such as various methods of similarity search or secondary structure prediction, as well as auxiliary functions like converting between input/output formats. The rTools component is the label under which this collection of external (and partially internal) tools is kept.

Warning

Note that there is a different perspective on what a “tool” is from the point of view of the rWeb component. In rWeb, a tool is a PHP class that integrates some search or prediction functionality into the rPredictor website. rTools, on the other hand, is a collection of programs that stand outside the rWeb component.

Not all tools are third-party: rTools also includes Cppredict, a MATLAB program that implements CP-predict, a two-phase algorithm for rRNA structure prediction, along with some additional utilities.

The relationships between rTools and the other components (rData, rETL and the rWeb tool classes) merit further explanation:

  • The rETL component utilizes secondary structure prediction and analysis capability in rTools to obtain structural information about the rPredictor dataset. (This information is not present in the source databases.)
  • The rData component contains tool-specific exports of the dataset that the tools utilize as they run. This includes special databases for similarity search tools or the infrastructure used by CP-predict.
  • Finally, the tool classes of rWeb execute the external tools based on the user’s search/prediction query and collect their results.
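The last relationship, a tool class launching an external program and collecting its output, follows a common pattern. rWeb's tool classes are written in PHP; the Python sketch below only illustrates the pattern. The helper `run_tool` and the stand-in tool (a Python one-liner that converts T to U) are hypothetical.

```python
import subprocess
import sys


def run_tool(argv: list[str], query: str, timeout: float = 60.0) -> str:
    """Run an external program, feed it the query on stdin and return its stdout.

    Raises RuntimeError if the program exits with a non-zero status.
    """
    result = subprocess.run(
        argv,
        input=query,
        capture_output=True,  # collect stdout and stderr instead of inheriting them
        text=True,            # exchange str rather than bytes
        timeout=timeout,      # guard against tools that hang
    )
    if result.returncode != 0:
        raise RuntimeError(f"{argv[0]} failed: {result.stderr.strip()}")
    return result.stdout


if __name__ == "__main__":
    # Stand-in "tool": converts a DNA sequence read from stdin to RNA (T -> U).
    fake_tool = [
        sys.executable,
        "-c",
        "import sys; print(sys.stdin.read().strip().replace('T', 'U'))",
    ]
    print(run_tool(fake_tool, "ACGTT\n"), end="")
```

Passing the command as a list avoids shell quoting problems with user-supplied input, and checking the exit status lets the caller report a tool failure instead of returning empty results.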

An overview of rTools is practically synonymous with the list of “install a library/tool/package” requirements in rPredictor setup. As new functionality is made available through rWeb, the rTools component will grow accordingly.

1.4. rWeb

The rWeb component serves as a presentation layer for collected and generated data. The web application is written in PHP with the Nette Framework.

The rWeb component is the most complex part of rPredictor. It follows a layered architecture, with client-side scripts on the user's end and a server-side pipeline that passes a user's query through presenters, parsers and finally tools, and then returns the results to the user.
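The flow through presenters, parsers and tools can be illustrated with a small, framework-free sketch. It does not reproduce rWeb's Nette presenters or PHP classes; the names `Query`, `parse`, `LengthTool` and `Presenter` are hypothetical, and the toy tool merely reports the sequence length.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Query:
    tool: str
    sequence: str


class Tool(Protocol):
    def run(self, query: Query) -> dict: ...


class LengthTool:
    """Toy tool standing in for a real search or prediction tool."""

    def run(self, query: Query) -> dict:
        return {"length": len(query.sequence)}


def parse(raw: dict) -> Query:
    # Parser layer: validate and normalize raw user input.
    seq = raw.get("sequence", "").strip().upper()
    if not seq or set(seq) - set("ACGU"):
        raise ValueError("sequence must be non-empty RNA (A, C, G, U)")
    return Query(tool=raw.get("tool", ""), sequence=seq)


class Presenter:
    # Presenter layer: route a parsed query to a tool and shape the response.
    def __init__(self, tools: dict[str, Tool]):
        self.tools = tools

    def handle(self, raw: dict) -> dict:
        try:
            query = parse(raw)
        except ValueError as exc:
            return {"status": "error", "message": str(exc)}
        tool = self.tools.get(query.tool)
        if tool is None:
            return {"status": "error", "message": f"unknown tool: {query.tool!r}"}
        return {"status": "ok", "result": tool.run(query)}


if __name__ == "__main__":
    presenter = Presenter({"length": LengthTool()})
    print(presenter.handle({"tool": "length", "sequence": " acgu "}))
```

Keeping validation in the parser and dispatch in the presenter means a new tool only has to implement `run`; the layers in front of it stay unchanged.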

The detailed description of rWeb design is available in the section rWeb: the rPredictor website.

1.5. rDoc

The rDoc component, the rPredictor documentation, is split into three major groups: User documentation, Technical documentation and API Reference. The User and Technical documentation are generated with the Sphinx documentation generator from a central repository. The API Reference is further split by component, as each component (and, in the case of rTools, each sub-component) has its own API reference, often in mutually incompatible formats.

The User and Technical documentation, generated in HTML form, are integrated directly into rWeb; the API reference for rWeb is also available as part of the rPredictor site.
