8. Database querying

DbTool is the part of rWeb that communicates with the rPredictor database, rDB. It is by far the most complex tool in rWeb and merits a detailed explanation.

8.1. Access to database

As noted in Implementation overview, rWeb uses the repository pattern to access/manipulate data.

The database queries rely on a specific Repository, which in turn relies on a Nette Database layer. The BaseRepository contains several default methods for data querying and manipulating, which may be used in every repository. Other repositories extend this BaseRepository and contain methods specific for their purposes. Each repository has its own table which serves as main table while querying; however, this table can be changed via the getTable($tableName) method.

The Nette Database layer is extended with new reflection class DiscoveredViewReflection located in BaseModule\Database. Reason for this is that the standard DiscoveredReflection provided by Nette cannot handle views, which are used in rDB. Therefore, DiscoveredViewReflection provides some extra functionality to make views work with Nette Database layer.

The appropriate connection must be configured in the Nette config file (app/BaseModule/config.neon); see the Nette configuration manual. The pertinent directive is the database dsn string.

8.2. How does DbTool work?

A core idea is to put all parameters (defined by wantedParameters) in “buckets” defined by database criteria - e.g. equals, greater than, like. These criteria are defined at the top of the class and their names start with the prefix RULE_ (all of them are also stored in a protected variable criteria). When the tool has user values and DB keys in proper values, it is then easy to construct the resulting SQL query.

This should make deriving new database search forms like structural feature search easier.

8.2.1. Specifying inputs

The first important part of this task is to define wantedParameters correctly. The important ones from the list in the DbTool class are:

  • limit, which defines the number of results per query; when limit is too big, queries take too long to execute. This number determines how many records retrieved from the database will appear together when the user scrolls down the results.
  • offset works together with limit and by default it always has to be 0 - otherwise the query would not perform search on a whole result set. Offset grows when user scrolls down the results (size of the step is defined by limit).
  • startPosition, stopPosition are present as a possibility to search sequences with precise start and stop positions, and thus identify it uniquely - this is used for inner mechanism like merging sequences from multiple tools (because a sequence with the same accession number can have multiple records with different start and stop in database).
  • aggregateByAccession has similar usage as previous startPosition and stopPosition - it enforces aggregation by accession number, so for each accession number, only the longest sequence is displayed

The next important thing is the constructor (method __construct), which accepts an object of type \Nette\DI\Container (the standard Nette class). This object has the method getHttpRequest(), which contains information about the current HTTP request. The constructor stores this information for later use.

8.2.2. Parsing inputs

The method addCriteria accepts key-value pairs - it first checks, whether key is one of the special reserved words in method checkReservedWord. This method stores limit, offset and aggregate properties for later usage. If the key is not in the reserved words, it checks if it is in the wanted parameters. If so, it calls getCriteriaParam on the pair.

The method getCriteriaParam handles each input key individually. The main idea is to return a pair of the database key (attribute name) and criteria which should be applied for this key (one of the “buckets” we talked about earlier). It also allows change of the value. So, for simple types, their corresponding part of the switch looks like this:

case 'name':
    $key = '%BASE_VIEW%.name';
    $criteria = self::RULE_LIKE;
    break;

This code states that for name, the database key is name in table %BASE_VIEW - that is a magic constant that will be explained later in this section. The criteria is RULE_LIKE and the value does not changes. In SQL it means that statement:

``WHERE %BASE_VIEW%.name LIKE '%$user_value%'``

will be a part of the query.

Then there are more complicated parameters - those that also have modifiers and multipliers. The tool needs to manually load this extra data for specified parameter from HTTP request stored earlier. Consider following code:

case'firstpublished':
$key = '%BASE_VIEW%.first_public';
if ($this->httpRequest->getPost('db_firstpublished_direction') == 'gt') {
    $criteria = self::RULE_GT;
} else {
    $criteria = self::RULE_LT;
}
$this->defineMultipliedRules($key,
      $this->httpRequest->getPost('db_firstpublished_array'),
      $this->httpRequest->getPost('db_firstpublished_direction_array')
                             );
break;

It first states that the database key for firstpublished is in the table %BASE_VIEW% and its name is first_public. Then there is an if-condition checking direction (modifier) of the parameter - whether it’s “less than” or “greater than”. That is enough for adding SQL statement like WHERE %BASE_VIEW% < '$user_value'. But since this input can be multiplied, it additionally calls a helper method defineMultipliedRules.

The method defineMultipliedRules accept key and two arrays as parameters. The first array (called $allValues) holds values itself and the second array (called $allDirections) holds appropriate modificators. Therefore input arrays from example above might look like this:

$array1 = array(0 => '10.10.2010');
$array2 = array(0 => 'gt');

This means that the user wants publication date to be greater than 10. 10. 2010. The method itself then loops over all elements of those arrays (the amount of multiplied items may be unlimited) and adds the values to proper buckets:

if ($allDirections[0] == 'gt') {
    $this->criteria[self::RULE_GT][] = array($key => $allLengths[0]);
} else {
    $this->criteria[self::RULE_LT][] = array($key => $allLengths[0]);
}

The addCriteria method adds values from getCriteriaParam in the very same way.

8.2.3. Execution

The main task of the execute method is to construct the SQL query, loop over its results and add them to the DispatchModule\ResultSet that will be returned.

First, it chooses proper base table in a method chooseBaseTable. The table is actually view joining several tables together and there are multiple views with the same structure but slightly different data. The difference is always in how the data are grouped (e.g. by accession number, organism name...). Therefore the chooseBaseTable method decides which view to use depending on the aggregate properties (previously stored in checkReserverdWords).

After the base table is chosen, the prepareCriteria method is called. This method creates the query (using the repository getTable method) and loops over data of all buckets (SQL rules) and adds the proper SQL rule for each key, replacing the %BASE_VIEW% magic constant by the name of the actual view.

At this moment, the execute method already has completed the SQL query and can loop over its results. Each result is converted to a DispatchModule\Sequence object and additional data are loaded - references, features, xrefs predicted features. The first three are loaded in very similar way - a new SQL query is performed, loading additional data from another table (the table for these properties is in a 1:N relationship). There is a nested loop where needed, like in case of authors:

// load references with authors
$references = $this->repository->getTable('references')->select('id, ...,
                      ..., location')->where('fk_annotation',
                                             $res['fk_annotation']);
$referenceArray = array();
$i = 0;
foreach ($references as $reference) {
    $referenceArray[$i] = $reference->toArray();
    $authors = $reference->related('authors');
    foreach ($authors as $author) {
        $referenceArray[$i]['authors'][] = $author->author['name'];
    }

    $applicants = $reference->related('applicants');
    foreach ($applicants as $applicant) {
        $referenceArray[$i]['applicants'][] = $applicant->applicant['name'];
    }
    $i++;
}
$sequence->loadData(array('references' => $referenceArray));

Since every reference has 0 - N authors, the nested loop is necessary. The same is applied to applicants. The method loadData adds newly gained data to the DispatchModule\Sequence object.

For predicted features, the situation is a bit more complex, since every sequence can have 0 - N predictions and each prediction has 0 - N predicted features in several tables and each predicted feature has 1 - N positions. For predictions there is similar loop over all predictions. For each prediction all six types of features are loaded:

// load predictions
$predictions = $this->repository->getTable('predictions')->select('*')->where('fk_accession',
  $res['fk_accession']);
$predictionArray = array();
foreach ($predictions as $prediction) {
    $structure = $this->repository->getTable('structures')->where(
                  'fk_prediction', $prediction->id)->fetch()->toArray();
    $wholePrediction['structure'] = $structure['structure'];
    $wholePrediction['algorithm'] = $prediction->algorithm;
    $wholePrediction['structure_id'] = $prediction->id;
    $wholePrediction['bulges'] = $this->getPredictionFeatures(
                                            $prediction->id, 'bulges');
    $wholePrediction['foverhangs'] = $this->getPredictionFeatures(
                                            $prediction->id, 'foverhangs');
    $wholePrediction['hairpins'] = $this->getPredictionFeatures(
                                            $prediction->id, 'hairpins');
    $wholePrediction['helices'] = $this->getPredictionFeatures(
                                            $prediction->id, 'helices');
    $wholePrediction['junctions'] = $this->getPredictionFeatures(
                                            $prediction->id, 'junctions');
    $wholePrediction['loops'] = $this->getPredictionFeatures(
                                            $prediction->id, 'loops');
    $wholePrediction['toverhangs'] = $this->getPredictionFeatures(
                                            $prediction->id, 'toverhangs');
    $predictionArray[] = $wholePrediction;
}

The method getPredictionFeatures simply loads all structural features of a given prediction together with their positions type (using LEFT JOIN in SQL) and stores them into an array which then looks like the following:

array(0 => (0, 20, 111),
      1 => (35, 45, 111),
      2 => (65, 85, 121),
      3 => (90, 95, 121)
)

This means there are two predicted features (with id 111 and 121) at two positions each (first one is on 0-20 and 35-45, the other one is on 65-85 and 90-95).

All that remains is to return the result.

8.3. Miscellaneous methods

There are some other methods in DbTool like getCount which works in similar way as execute, except it’s simpler because it does not return actual data but rather a number of records matching selected criteria. All other methods are just helpers (like lastTaxonomySelected, which picks last taxonomy from all of it’s values, or getTaxonomyPath, which returns taxonomy path for specified id).