5. rWeb: the rPredictor website

As described in rWeb, the rWeb component of rPredictor serves as the presentation layer for collected and generated data. The web application is written in PHP using the Nette Framework.

Warning

Unless you are familiar with Nette, it may be difficult to understand how individual parts of the rWeb PHP code interact. The Nette documentation is readily available.

Note

Complete reference documentation for rWeb (including the class inheritance tree) is available at http://rpredictor.ms.mff.cuni.cz/reference/.

The core design problem rWeb faces is how to integrate various bioinformatics functionalities and the respective tools with a database, and how to provide easy access to all of these capabilities through a unified interface, while remaining easily extensible. To this end, we have designed a layered architecture that follows the double dispatch principle for query processing. We first describe the server-side back-end, covering both the query processing logic and the implementation of the tool classes, and then briefly describe client-side scripting.

This layered architecture has three major parts:

5.1. Implementation overview

The whole rWeb application is divided into the following namespaces (also called modules):

  • BaseModule contains the configuration of the entire application and the base classes from which other classes inherit: most importantly BasePresenter (see the Nette documentation), BaseService (extends Nette Object), BaseRepository (extends Nette Object) and Form (see http://api.nette.org/2.0.15/Nette.Application.UI.Form.html). The Form and BasePresenter classes extend the Nette classes of the same name. The module also contains a router, which manages routes for the whole site, and the layout template. (Routing enables cool URIs: a URI looks like http://rpredictor.ms.mff.cuni.cz/search instead of http://rpredictor.ms.mff.cuni.cz/SearchModule/presenters/SearchPresenter.php. More information can be found in the corresponding Nette documentation.)

  • DispatchModule is the most important back-end module: it contains the services that invoke the appropriate tools, both for searching and for predicting. This is realized through the parser classes (SearchParser and PredictParser) and the tool classes. Tool classes (such as DbTool or BlastTool) are stored in the DispatchModule\Tools namespace. Helper classes (currently a class for on-the-fly visualization) are stored in the DispatchModule\Helpers namespace.

    The function of SearchParser is described in Query processing; the tool classes are described in the section Tools.

    The module also contains the Sequence and ResultSet classes, which serve as containers for query results.

  • PredictModule contains the presentation part of prediction logic. It contains the front-end part of the application that shows the prediction input form and prediction results to the user.

  • SearchModule is similar to PredictModule, except that it takes care of the search form and search results. It also provides the export functionality.

  • AnalyseModule is somewhat similar to DispatchModule. It contains presenters: one main presenter (AnalysePresenter) and one additional presenter for each analytical tool. Models are the most important part of this module; they are placed in the models subdirectory, and each model represents one analytical tool.

A class tree is available in the reference documentation.

The application uses the repository pattern for data manipulation. The BaseRepository class provides a default implementation of the database access mechanisms. The application also follows a very simple service-oriented design, so essentially all classes are used as services throughout the application. A service typically works with repositories and is used by another service or by a presenter. Injecting a service into another service or into a presenter is very simple:

/**
 * @var \DispatchModule\SearchParser
 * @Inject
 */
protected $parser;

All you need to do to inject a service is to declare a property, state the service's fully qualified class name (according to PHP namespaces) in its @var annotation and add the @Inject tag. This mechanism is provided by the postConstruct methods in BasePresenter and BaseService.
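The idea behind @Inject can be illustrated with PHP's Reflection API. The sketch below is not the rWeb postConstruct code: the SimpleContainer class and its injectInto method are hypothetical, every service is created with a no-argument constructor and shared afterwards, only the @var + @Inject annotation pair shown above is recognised, and SearchParser/SearchPresenter here are empty stand-ins rather than the real classes.

```php
<?php
// Hypothetical minimal container illustrating annotation-based injection,
// in the spirit of the postConstruct methods of BasePresenter/BaseService.
class SimpleContainer
{
    /** shared service instances, keyed by class name */
    private $services = array();

    public function getService($class)
    {
        if (!isset($this->services[$class])) {
            $this->services[$class] = new $class(); // assumes a no-argument constructor
        }
        return $this->services[$class];
    }

    /** Fill every property whose doc comment contains both @var <Class> and @Inject. */
    public function injectInto($object)
    {
        $reflection = new ReflectionObject($object);
        foreach ($reflection->getProperties() as $property) {
            $doc = $property->getDocComment();
            if ($doc === false || strpos($doc, '@Inject') === false) {
                continue;
            }
            // Capture the class name after @var, without a leading backslash.
            if (!preg_match('/@var\s+\\\\?([\w\\\\]+)/', $doc, $m)) {
                continue;
            }
            if (PHP_VERSION_ID < 80100) {
                $property->setAccessible(true); // required for protected properties before PHP 8.1
            }
            $property->setValue($object, $this->getService($m[1]));
        }
    }
}

// Stand-ins for an injectable service and a presenter that uses it.
class SearchParser
{
}

class SearchPresenter
{
    /**
     * @var SearchParser
     * @Inject
     */
    protected $parser;

    public function getParser()
    {
        return $this->parser;
    }
}

$container = new SimpleContainer();
$presenter = new SearchPresenter();
$container->injectInto($presenter);
var_dump($presenter->getParser() instanceof SearchParser); // bool(true)
```

Sharing one instance per class mirrors the usual behaviour of a DI container, so two presenters receive the same parser object.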

5.2. Query processing

A user’s query is processed by the following pipeline:

Search query processing pipeline

Both SearchModule and DispatchModule are used. (For a prediction query, PredictModule and its presenter would be used instead.) DispatchModule does the actual query processing work. SearchModule only contains presenters and templates for filling in the search form and displaying results: it gathers user input, sends it to DispatchModule for the “real” processing and displays the results that DispatchModule sends back.

Diagram of classes used for a query

As mentioned earlier, the core of the query processing pipeline is handled by DispatchModule. This module handles the user request, performs the requested actions and produces results. Central to its function is the SearchParser class, which manages the whole process. SearchParser has at its disposal several classes called tools, which actually perform the various search (or other) operations. Each tool represents one search method, e.g. searching the database or running an external tool such as BLAST.

Note

We only discuss searching in this text; the workflow is the same for prediction queries, only with PredictParser instead of SearchParser, etc. PredictParser mirrors SearchParser. However, prediction results are not returned as a ResultSet, but simply as a string in the dot-paren format.

The core query processing component, SearchParser:

  • receives the input data from the user,
  • initializes all required tools,
  • passes each tool the inputs it needs,
  • collects the results from each tool,
  • merges the collected results and passes them to the user through the presenter.

To do this, SearchParser needs to know:

  • exactly which search tools are present,
  • which input parameters each search tool needs.

While SearchParser has access to information about individual tools, it does not hold this information directly: it relies on the tools to provide information about themselves, which they do through the getWantedParameters and requiredParameter methods. This reinforces locality of information: tool-specific information is kept only in the tool itself and is not duplicated elsewhere.

In turn, extending the application with a new tool only requires that the tool implement the getWantedParameters and requiredParameter methods (and, of course, that it can return a ResultSet and meets the other requirements, which is ensured by implementing the ToolInterface interface). See Tool implementation.

The data that flows through DispatchModule from left to right is the submitted form: key-value pairs that SearchParser can send to the appropriate tools based on their keys.
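The steps SearchParser performs (dispatching inputs by key, running the tools, collecting and merging results) can be sketched in plain PHP. This is a simplified model, not the actual SearchParser: the interface SketchTool, the classes MiniSearchParser and MotifTool, the toy data, routing each field by the tool-name prefix of its key (following the db_sequence naming convention from Specifying inputs), skipping empty and array-valued fields, and merging results as a union of sequence IDs are all assumptions. Results are plain arrays of IDs rather than ResultSet objects.

```php
<?php
// Hypothetical, simplified tool contract used only for this sketch.
interface SketchTool
{
    public function getName();                  // prefix of the tool's form fields, e.g. 'db'
    public function addCriteria($name, $value); // receives one user input
    public function execute();                  // returns a list of sequence IDs
}

class MiniSearchParser
{
    /** tool name => tool */
    private $tools = array();

    public function addTool(SketchTool $tool)
    {
        $this->tools[$tool->getName()] = $tool;
    }

    /** Dispatch the submitted form, run the tools that received input, merge results. */
    public function search(array $form)
    {
        $used = array();
        foreach ($form as $field => $value) {
            if (is_array($value) || $value === '' || $value === null) {
                continue; // empty fields are skipped; multiplied (array) fields are read by the tools
            }
            $prefix = strstr($field, '_', true); // 'motif_sequence' -> 'motif'
            if ($prefix !== false && isset($this->tools[$prefix])) {
                $name = substr($field, strlen($prefix) + 1); // 'motif_sequence' -> 'sequence'
                $this->tools[$prefix]->addCriteria($name, $value);
                $used[$prefix] = true;
            }
        }
        $merged = array();
        foreach (array_keys($used) as $prefix) {
            foreach ($this->tools[$prefix]->execute() as $id) {
                $merged[$id] = $id; // union of all tool results, duplicates removed
            }
        }
        return array_values($merged);
    }
}

// Toy tool: returns the IDs of stored sequences that contain the given motif.
class MotifTool implements SketchTool
{
    private $criteria = array();
    private $data = array('seq1' => 'ACGUACGU', 'seq2' => 'GGGCCC', 'seq3' => 'UACGA');

    public function getName()
    {
        return 'motif';
    }

    public function addCriteria($name, $value)
    {
        $this->criteria[$name] = $value;
    }

    public function execute()
    {
        $motif = isset($this->criteria['sequence']) ? $this->criteria['sequence'] : '';
        $hits = array();
        foreach ($this->data as $id => $seq) {
            if ($motif !== '' && strpos($seq, $motif) !== false) {
                $hits[] = $id;
            }
        }
        return $hits;
    }
}

$parser = new MiniSearchParser();
$parser->addTool(new MotifTool());
print_r($parser->search(array('motif_sequence' => 'UACG', 'other_field' => 'x')));
// seq1, seq3 ('other_field' matches no tool and is ignored)
```

Because the prefix is taken up to the first underscore, tool names in this sketch must not contain underscores.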

The same double-dispatch principle is also used to create the search form: SearchPresenter uses SearchParser's addFormParameters method, which returns the whole form with the appropriate inputs from all tools (SearchParser acts as if it knew everything about the tools; it doesn't, but it can ask them).
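The double-dispatch idea behind addFormParameters can be sketched without Nette: the parser knows nothing about individual inputs and simply asks every tool for its wanted parameters, prefixing each name with the tool name. The interface DescribesInputs, the classes StaticTool and FormSketchParser, the method buildFormFields and the flat field list it returns are hypothetical; the real method adds the inputs to a Nette form. The second tool ('blast') and its input are likewise invented for illustration.

```php
<?php
// Hypothetical minimal contract: a tool can describe its own inputs.
interface DescribesInputs
{
    public function getName();             // e.g. 'db'
    public function getWantedParameters(); // same structure as wantedParameters
}

// Hypothetical tool whose inputs are fixed at construction time.
class StaticTool implements DescribesInputs
{
    private $name;
    private $parameters;

    public function __construct($name, array $parameters)
    {
        $this->name = $name;
        $this->parameters = $parameters;
    }

    public function getName()
    {
        return $this->name;
    }

    public function getWantedParameters()
    {
        return $this->parameters;
    }
}

// The parser knows nothing about individual inputs; it only asks the tools.
class FormSketchParser
{
    private $tools = array();

    public function addTool(DescribesInputs $tool)
    {
        $this->tools[] = $tool;
    }

    /** Return array(fieldName => array(type, label)) for all inputs of all tools. */
    public function buildFormFields()
    {
        $fields = array();
        foreach ($this->tools as $tool) {
            foreach ($tool->getWantedParameters() as $name => $definition) {
                // Prefixing with the tool name avoids clashes between tools.
                $fields[$tool->getName() . '_' . $name] = array($definition[0], $definition[1]);
            }
        }
        return $fields;
    }
}

$parser = new FormSketchParser();
$parser->addTool(new StaticTool('db', array(
    'sequence'  => array('text', 'Fill in sequence, or part of sequence'),
    'accession' => array('text', 'Accession number'),
)));
$parser->addTool(new StaticTool('blast', array( // hypothetical second tool
    'sequence' => array('text', 'Query sequence'),
)));
print_r(array_keys($parser->buildFormFields()));
// db_sequence, db_accession, blast_sequence
```

Both tools ask for a 'sequence' input, yet the prefixes keep the form field names distinct, so neither tool has to know about the other.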

5.3. Tools

Tools are the “working blocks” of rPredictor and the backbone of its functionality. From this tool-centric perspective, the whole rWeb is just a pretty interface for the tools.

Tools provide different ways of searching (and predicting) over the rPredictor database (rData). To keep rPredictor easily extensible, all tools share a common architecture.

A tool needs to provide three functionalities:

  • a definition of its inputs - what information the tool needs to run,
  • the ability to parse these inputs on request from the SearchParser,
  • the execution, which runs the tool with the parsed inputs and returns a ResultSet.

A tool doesn’t need to worry about:

  • how it will be accessed by the user (all HTML generation is handled via Nette by SearchModule or PredictModule),
  • other tools (including input naming conflicts and combining results).
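The three responsibilities listed above map onto three methods. The skeleton below is a hedged illustration only: the method names getWantedParameters, addCriteria and execute appear in this chapter, but their exact signatures in ToolInterface are not reproduced here. The class LengthFilterTool, the values of its RULE_LT/RULE_GT constants, its made-up in-memory data and the plain array it returns (instead of a \DispatchModule\ResultSet) are assumptions, as is the idea that a modifier value reaches addCriteria under its own name (length_direction).

```php
<?php
// Hypothetical tool skeleton showing the three phases of a tool.
class LengthFilterTool
{
    const RULE_LT = 'lt';
    const RULE_GT = 'gt';

    /** Made-up data standing in for database rows: sequence ID => length. */
    private $sequences = array('seqA' => 120, 'seqB' => 1500, 'seqC' => 640);

    private $criteria = array();

    // Phase 1: declare the inputs (same structure as a wantedParameters array).
    public function getWantedParameters()
    {
        return array(
            'length' => array('text', 'Length', array(
                'modifiers' => array(
                    'length_direction' => array('select', '', array(
                        'items' => array(self::RULE_LT => '<', self::RULE_GT => '>'),
                    )),
                ),
            )),
        );
    }

    // Phase 2: store one user input; the parser calls this once per field.
    public function addCriteria($name, $value)
    {
        $this->criteria[$name] = $value;
    }

    // Phase 3: run the search with the stored inputs and return matching IDs.
    public function execute()
    {
        $limit = isset($this->criteria['length']) ? (int) $this->criteria['length'] : 0;
        $direction = isset($this->criteria['length_direction'])
            ? $this->criteria['length_direction'] : self::RULE_GT;
        $hits = array();
        foreach ($this->sequences as $id => $length) {
            if (($direction === self::RULE_LT && $length < $limit)
                || ($direction === self::RULE_GT && $length > $limit)) {
                $hits[] = $id;
            }
        }
        return $hits;
    }
}

$tool = new LengthFilterTool();
$tool->addCriteria('length', '500');
$tool->addCriteria('length_direction', LengthFilterTool::RULE_GT);
print_r($tool->execute()); // seqB, seqC
```

Keeping all three phases in one class is what makes the tool self-describing: nothing outside it needs to know that it has a length input or a direction modifier.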

You can easily extend the rPredictor toolkit with your own tool. See Creating your own tool.

5.3.1. Tool implementation

Each tool is represented by one class, which has to be stored either in the searchTools directory, if it is a search tool, or in predictTools, if it is a prediction tool (the full paths are app/DispatchModule/searchTools and app/DispatchModule/predictTools). Every tool class has to implement the ToolInterface interface. Additionally, it is strongly recommended that the tool class extend the BaseTool class, which contains default implementations of several methods.

The tasks of a tool can be divided into three phases:

5.3.1.1. Specifying inputs

The input part defines which parameters the tool wants from the user. This part is utilized when the SearchParser passes tool information to the SearchPresenter to create the input form and then when the SearchParser dispatches individual input values to tools; it is the “public statement” the tool makes about itself to the layers above it in the query processing pipeline.

The inputs are defined in the wantedParameters array, which has to be structured according to the following example scheme:

array(
    // type + label only
    'sequence' => array('text', 'Fill in sequence, or part of sequence'),
    // up to 10 inputs, combined with OR
    'accession' => array('text', 'Accession number', array(
        'multiplicators' => array('or' => 10),
    )),
    // date input with a before/after modifier; up to 2 inputs, combined with AND
    'firstpublished' => array('text', 'First published', array(
        'date' => true,
        'modifiers' => array(
            'firstpublished_direction' => array('select', '', array(
                'items' => array(self::RULE_LT => 'before',
                                 self::RULE_GT => 'after'),
            )),
        ),
        'multiplicators' => array('and' => 2),
    )),
    // value with a </> modifier; up to 2 inputs, combined with AND
    'region_length' => array('text', 'Length', array(
        'modifiers' => array(
            'region_length_direction' => array('select', '', array(
                'items' => array(self::RULE_LT => '<',
                                 self::RULE_GT => '>'),
            )),
        ),
        'multiplicators' => array('and' => 2),
    )),
);

This example code (simplified from DbTool.php) says the following:

  • The tool wants 4 parameters called sequence, accession, firstpublished and region_length. When parsing the search form filled in by the user, SearchParser will know that these 4 parameters should be given to this tool.
  • Each parameter also has its properties defined. The standard way to define an input parameter is an array containing two elements: type and label. Types correspond to standard HTML form inputs; a complete list of implemented types is in BaseModule/Form (see the API reference). The label is custom text displayed next to the input field.

Note

Element names are prefixed with the tool name before the HTML code is generated, so for the sequence parameter in DbTool, the generated HTML would look like this: <input type="text" name="db_sequence">.

  • Furthermore, each parameter can be customized by additional parameters. These extra parameters are stored in an array, which is the third element of the parameter array. Some of them are required by a particular input type, like the items element for the select type in the firstpublished_direction parameter from the example above.

    The type-specific extra parameters are items for the select (dropdown) type and value for the hidden type.

  • Other additional properties are modifiers and multiplicators. Modifiers are elements that directly affect the specified inputs, e.g. whether a value should be searched as “less than” or “greater than”, as in the example above for the region_length input (element region_length_direction) and the firstpublished input (element firstpublished_direction).

    • A modifier can be any supported element from BaseModule/Form, and it is built by the same rules as any input described in this section (possibly with its own modifiers and multiplicators, although this is not recommended).
    • Multipliers (defined under the multiplicators key) make it possible to enter multiple values for the same input. A multiplier is either and or or. Each multiplier also has a cardinality, which limits how many times it can be applied. Essentially, all a multiplier does is copy the input element (including its modifiers).

So, in the example above, accession is a text input that can be copied up to ten times with the or multiplier: users can add up to 10 accession numbers, and the search will include results containing any of them. The firstpublished parameter is a text input with the special date property, which displays a calendar for the text box. It can be multiplied twice and has a before/after modifier, so querying an interval between two dates is possible. The region_length parameter is a text input with an additional select box determining whether search results should include items with a length greater or less than the specified value. It can also be multiplied using the and rule, so users can perform queries like “find everything between X and Y”, where X and Y are the two values of the multiplied region_length input.
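How a multiplier turns one parameter definition into several HTML inputs can be sketched as follows. The naming (tool prefix for the first input, an added “_array[]” suffix for the copies) follows the convention described in Parsing inputs; the helper expandInputNames is hypothetical, copying of modifiers is left out for brevity, and treating the cardinality as the total number of inputs (original plus copies) is an assumption based on the accession example, where 'or' => 10 allows up to 10 accession numbers.

```php
<?php
/**
 * Hypothetical helper: list the HTML input names generated for one parameter
 * when the user asks for $requested inputs, capped by the multiplier cardinality.
 */
function expandInputNames($toolName, $parameter, array $definition, $requested)
{
    $options = isset($definition[2]) ? $definition[2] : array();
    $cardinality = 1; // no multiplier: exactly one input
    if (isset($options['multiplicators'])) {
        $cardinality = max($options['multiplicators']); // e.g. array('or' => 10) -> 10
    }
    $count = max(1, min($requested, $cardinality));

    $base = $toolName . '_' . $parameter;
    $names = array($base);             // the first input keeps its normal name
    for ($i = 1; $i < $count; $i++) {
        $names[] = $base . '_array[]'; // copies are posted as one array
    }
    return $names;
}

$accession = array('text', 'Accession number', array('multiplicators' => array('or' => 10)));
print_r(expandInputNames('db', 'accession', $accession, 3));
// db_accession, db_accession_array[], db_accession_array[]
print_r(count(expandInputNames('db', 'accession', $accession, 25))); // 10 (capped)
```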

See also

A more complicated tool: Database querying

See also

For a step-by-step tool-writing tutorial, see Creating your own tool.

5.3.1.2. Parsing inputs

The input parsing phase of a tool uses the addCriteria($name, $value) method. SearchParser calls this method once for each input field to pass the data entered by the user into the tool. Through addCriteria, the tool receives key-value pairs to store for the execution phase. See Creating your own tool for a comprehensive example.

Warning

It is important to note that if a tool uses multipliers, the multiplied values are not added through this method. This is due to security checks enforced by the Nette Framework: the form does not know in advance how many inputs of each type the user will send, so it cannot perform security checks on these inputs. Also, these added inputs have different names: the “_array[]” suffix is appended to the input name (the [] brackets tell the server that the input is an array rather than a single value).

So, the first input is added normally through addCriteria, and that is the best moment to handle all other elements of the array, which has to be done manually (e.g. load the elements via $accessions = $this->completeData['db_accession_array'];, where completeData is a helper variable containing the multiplied fields from the POST request).
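Handling the first value in addCriteria and the remaining copies from completeData could look roughly like this. Only the completeData variable and its db_accession_array key come from the text above; the class AccessionCriteriaSketch, storing the accessions in a list, skipping blank copies and the matches method that applies the or semantics are illustrative assumptions, not how DbTool necessarily builds its query.

```php
<?php
// Hypothetical sketch of a tool collecting an 'or'-multiplied input.
class AccessionCriteriaSketch
{
    /** Multiplied fields from the POST request, as described above. */
    public $completeData = array();

    /** All accession numbers entered by the user. */
    private $accessions = array();

    public function addCriteria($name, $value)
    {
        if ($name === 'accession') {
            $this->accessions = array($value); // the first input arrives normally
            // The copies are not passed through addCriteria; read them manually.
            if (isset($this->completeData['db_accession_array'])) {
                foreach ((array) $this->completeData['db_accession_array'] as $extra) {
                    $extra = trim($extra);
                    if ($extra !== '') {
                        $this->accessions[] = $extra; // ignore copies left empty
                    }
                }
            }
        }
    }

    /** 'or' semantics: a record matches if it has any of the entered accession numbers. */
    public function matches($recordAccession)
    {
        return in_array($recordAccession, $this->accessions, true);
    }
}

$tool = new AccessionCriteriaSketch();
$tool->completeData = array('db_accession_array' => array('ACC002', '', 'ACC003'));
$tool->addCriteria('accession', 'ACC001');
var_dump($tool->matches('ACC003')); // bool(true)
var_dump($tool->matches('ACC999')); // bool(false)
```

The accession values are placeholders; completeData must be filled before addCriteria is called, because the copies are read at the moment the first value arrives.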

5.3.1.3. Execution

From the architectural point of view, execution is the simplest phase. All that matters is that the execute() method returns an object of type \DispatchModule\ResultSet, which holds a set of sequences. Otherwise, it can do basically anything (e.g. run an external tool through exec).

For prediction tools, the execute method should simply return a string in the dot-paren format: a FASTA header, then the sequence and finally the dot-paren representation of the secondary structure of that sequence. There is currently no container object implemented for structures.
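A prediction tool's return value can be assembled as below. The three-part layout (FASTA header, sequence, dot-paren structure) is the one described above; the function name formatDotParen, putting each part on its own line, and the consistency checks (equal lengths, balanced brackets, only the characters '(', ')' and '.') are illustrative assumptions, not part of the rPredictor code. In particular, extended bracket notations for pseudoknots are not handled.

```php
<?php
/**
 * Hypothetical helper: build the dot-paren string a prediction tool returns.
 * Throws InvalidArgumentException if the structure does not fit the sequence.
 */
function formatDotParen($header, $sequence, $structure)
{
    if (strlen($sequence) !== strlen($structure)) {
        throw new InvalidArgumentException('Structure and sequence lengths differ.');
    }
    $depth = 0; // number of currently open base pairs
    for ($i = 0, $n = strlen($structure); $i < $n; $i++) {
        $char = $structure[$i];
        if ($char === '(') {
            $depth++;
        } elseif ($char === ')') {
            $depth--;
            if ($depth < 0) {
                throw new InvalidArgumentException('Unbalanced ")" in structure.');
            }
        } elseif ($char !== '.') {
            throw new InvalidArgumentException("Unexpected character '$char' in structure.");
        }
    }
    if ($depth !== 0) {
        throw new InvalidArgumentException('Unclosed "(" in structure.');
    }
    return '>' . $header . "\n" . $sequence . "\n" . $structure . "\n";
}

echo formatDotParen('example hairpin', 'GGGAAAUCCC', '(((....)))');
// >example hairpin
// GGGAAAUCCC
// (((....)))
```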

Note

Tools can use additional classes stored in the models subdirectory of DispatchModule. These classes are only helpers that provide additional functionality for tools; they are kept in their own directory mainly for simplicity and code readability. A typical use case is a separate XmlParser: something the tool needs for its execution, but which is not part of the tool itself.

5.4. Analytical models

The analytical module (AnalyseModule) is somewhat similar to DispatchModule; however, instead of tools it uses models. The meaning of a model is similar: each model is one block that contains specific functionality. However, because the models vary in nature and there is no need to make them work together, models are classes with essentially no further requirements. The only requirement is that each model should be available as a Nette service. This is not strictly enforced, but it is convenient when all models meet it, so that they can all be used in the same way.

Apart from that, models can do whatever is needed. Each model should be used by one presenter (which should have the same, or at least a similar, name as the model), and that presenter can have any number of actions (and therefore any number of templates).

To learn more about analytical models and how you can create your own, see Creating your own analytical model.

5.5. Client side scripting

Client-side scripting is done in JavaScript using the jQuery library. All JavaScript files are placed in the webroot/js directory (where webroot is the directory that contains the index.php entry point file, see rWeb). It contains the following subdirectories:

  • Classes holds all client-side logic. See Classes.
  • Lib contains all libraries that the project uses (mainly jQuery). See Lib.
  • Langs contains language files, which the UI uses to display text. See Langs.
  • View holds presenter scripts: most of the time, these files only hook DOM events to methods implemented in classes. See Views.

Client-side script logic

5.5.1. Javascript files

All JavaScript files are divided into separate directories, except for the external jQuery libraries, which are placed in the main directory. The following sections describe how each file is used and what functionality it contains.

5.5.1.1. Classes

  • Ajax-sequence.js is the main class for visualising the sequence detail. It receives sequence information as JSON and regenerates the HTML code that is displayed on the screen.
  • Export.js is the key file for exporting sequences. First, the given result set is saved for later exports. Second, it creates a jQuery dialog form in which the export result is shown. Finally, the export result is generated. At the moment, 3 formats are fully working: CSV, JSON (containing all information visible on the screen) and FASTA/dot-paren.
  • TaxonomyBrowser.js displays an interactive dropdown-based browser of the taxonomic tree: it asynchronously loads data for each taxonomy level and displays a new dropdown for each sublevel.
  • UI.js handles the user interface. The UI class has three main responsibilities: working with the form (showing and hiding the form, duplicating and deleting input fields), working with the result set (showing and hiding sequences, loading more results via HTTP requests and sending export requests) and switching tools on and off.

5.5.1.2. Langs

This directory contains two files, Search.js and Predict.js, which provide help to the user. Some sequence attributes might not be clear to a user; in that case, a question mark is shown next to the attribute together with an explanatory text (defined in these files).

5.5.1.3. Lib

All used jQuery libraries can be found in this directory.

5.5.1.4. Views

  • Predict.js is the main file for the predict page. Its task is to delegate work to the previously mentioned files, e.g. form handling, sending HTTP requests and loading tools.
  • Search.js is very similar to the previous file and performs the same tasks. In addition, it loads more results for the submitted form and displays them. More results can be loaded by clicking the button or by scrolling to the bottom of the page.

5.6. Coding standards

The whole application follows the Nette coding standards. However, this is a recommendation rather than a strict requirement. The application is documented according to PHPDoc conventions, and ApiGen is used to build the reference manual.