Posted to commits@metamodel.apache.org by Apache Wiki <wi...@apache.org> on 2014/09/30 19:55:41 UTC

[Metamodel Wiki] Update of "QueryExecutionStrategies" by KasperSorensen

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Metamodel Wiki" for change notification.

The "QueryExecutionStrategies" page has been changed by KasperSorensen:
https://wiki.apache.org/metamodel/QueryExecutionStrategies

Comment:
Added first version of QueryExecutionStrategies page

New page:
This page describes the various strategies for executing queries in MetaModel.

== Native vs greedy execution ==

Of particular interest is when MetaModel can delegate (aka "push down") query execution to a native query engine, and when it has to execute the query itself in memory (often with a greedy approach, using Java code supplied by MetaModel).
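
As a rough illustration of that distinction, the sketch below answers a COUNT query natively when the backend offers a cheap count and otherwise falls back to a greedy scan. All names here (`CountStrategySketch`, `RowSource`, `nativeCount`, `listBacked`) are hypothetical and are not part of MetaModel's API; the list-backed source only stands in for a real backend.

```java
import java.util.Iterator;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical sketch; none of these names exist in MetaModel.
final class CountStrategySketch {

    /** A data source that can stream rows and may offer a native count. */
    interface RowSource {
        Iterator<String> rows();      // streams records one at a time
        OptionalLong nativeCount();   // empty when the backend cannot count cheaply
    }

    /** Pushes the count down when possible, otherwise counts greedily. */
    static long count(RowSource source) {
        OptionalLong pushedDown = source.nativeCount();
        if (pushedDown.isPresent()) {
            return pushedDown.getAsLong();   // native: no rows are read here
        }
        long n = 0;
        Iterator<String> it = source.rows();
        while (it.hasNext()) {               // greedy: walks the whole dataset,
            it.next();                       // but keeps only a counter in memory
            n++;
        }
        return n;
    }

    /** Stand-in backend over a list, with a switch for native counting. */
    static RowSource listBacked(List<String> data, boolean canCountNatively) {
        return new RowSource() {
            @Override
            public Iterator<String> rows() {
                return data.iterator();
            }

            @Override
            public OptionalLong nativeCount() {
                return canCountNatively ? OptionalLong.of(data.size()) : OptionalLong.empty();
            }
        };
    }

    public static void main(String[] args) {
        List<String> data = List.of("a", "b", "c");
        System.out.println(count(listBacked(data, true)));   // native path
        System.out.println(count(listBacked(data, false)));  // greedy path
    }
}
```

Both paths return the same number; they differ only in how much of the dataset has to be read to get it.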

The following table documents the execution capability in specific modules of MetaModel. Each column represents a query type. The query types are:

 * Plain FROM: Simple queries of the form 'SELECT y FROM x'. Possible values:
  * streaming: The dataset is implemented in a truly streaming fashion.
  * paged: The dataset fetches pages/bulks of records.
  * in-memory: The dataset has to consume ALL records into memory. This is inefficient and may cause out-of-memory errors.
 * Simple COUNT: Queries of the form 'SELECT COUNT(*) FROM x'. Possible values:
  * native: The module supports an efficient native method of getting the count. Some modules also support additional criteria on COUNT queries, e.g. 'SELECT COUNT(*) FROM x WHERE z', which is marked as 'native (incl. WHERE)'.
  * greedy: The module has to run through the dataset to do the counting. This is inefficient but usually has little memory impact.
 * Simple WHERE: Are simple WHERE items being delegated natively, or are they evaluated client-side for each record?
 * Primary key lookup: Queries that look up records by their primary keys, e.g. 'SELECT y FROM x WHERE x.id = 42'.
 * Groups and aggregates: Are GROUP BY and aggregation functions being delegated natively, or are they calculated in memory?
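
To make "client-side" and "greedy" more concrete, here is a minimal sketch of a WHERE item evaluated per record over a streaming source, and of a GROUP BY with COUNT(*) calculated in memory. The class and method names (`ClientSideSketch`, `Row`, `where`, `groupCount`) are assumptions made for illustration; MetaModel's own implementation may look quite different.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names exist in MetaModel.
final class ClientSideSketch {

    /** Minimal stand-in for a record. */
    static final class Row {
        final String name;
        final String country;

        Row(String name, String country) {
            this.name = name;
            this.country = country;
        }
    }

    /** Client-side WHERE: tests each record lazily, buffering at most one row. */
    static <T> Iterator<T> where(Iterator<T> source, Predicate<T> condition) {
        return new Iterator<T>() {
            private T pending;
            private boolean hasPending;

            @Override
            public boolean hasNext() {
                // Pull from the source until a matching row is found or it runs dry.
                while (!hasPending && source.hasNext()) {
                    T candidate = source.next();
                    if (condition.test(candidate)) {
                        pending = candidate;
                        hasPending = true;
                    }
                }
                return hasPending;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                hasPending = false;
                T result = pending;
                pending = null;
                return result;
            }
        };
    }

    /** Greedy GROUP BY with COUNT(*): reads every row; memory grows with the number of groups. */
    static <T, K> Map<K, Long> groupCount(Iterator<T> rows, Function<T, K> key) {
        Map<K, Long> counts = new LinkedHashMap<>();
        while (rows.hasNext()) {
            counts.merge(key.apply(rows.next()), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("alice", "DK"), new Row("bob", "US"), new Row("carol", "DK"));
        Iterator<Row> danes = where(rows.iterator(), r -> r.country.equals("DK"));
        while (danes.hasNext()) {
            System.out.println(danes.next().name);
        }
        System.out.println(groupCount(rows.iterator(), r -> r.country));
    }
}
```

The filter keeps the streaming property of its source, whereas the grouping has to see every row before it can return a result. That is why a module can stream a plain FROM and still be greedy for groups and aggregates.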

||                         || Plain FROM                                     || Simple COUNT                                           || Simple WHERE           || Primary key lookup || Groups and aggregates     ||
|| MetaModel-csv           || streaming                                      ||<#FFFF00> greedy when exact<<BR>>native when approximated ||<#FFFF00> client-side   ||<#FFFF00> no PK     ||<#FF0000> greedy           ||
|| MetaModel-jdbc          || streaming                                      || native (incl. all variants)                            || native                 || native             || native                    ||
|| MetaModel-excel         ||<#FFFF00> streaming .xlsx<<BR>>in-memory .xls   || native                                                 ||<#FFFF00> client-side   ||<#FFFF00> no PK     ||<#FF0000> greedy           ||
|| MetaModel-pojo          ||<#FF0000> in-memory                             || native                                                 ||<#FFFF00> client-side   ||<#FFFF00> no PK     ||<#FF0000> greedy           ||
|| MetaModel-couchdb       || streaming                                      || native                                                 || native                 || native             ||<#FF0000> greedy           ||
|| MetaModel-mongodb       || streaming                                      || native (incl. WHERE)                                   || native                 || native             ||<#FF0000> greedy           ||
|| MetaModel-hbase         || streaming                                      || native                                                 ||<#FFFF00>  client-side* || native             ||<#FF0000> greedy           ||
|| MetaModel-json          || streaming                                      ||<#FFFF00> greedy                                        ||<#FFFF00> client-side   ||<#FFFF00> no PK     ||<#FF0000> greedy           ||
|| MetaModel-xml           ||<#FFFF00> streaming SAX<<BR>>in-memory DOM      ||<#FFFF00> greedy                                        ||<#FFFF00> client-side   ||<#FF0000> greedy    ||<#FF0000> greedy           ||
|| MetaModel-elasticsearch || paged                                          || native                                                 ||<#FFFF00>  client-side* ||<#FF0000> greedy*   ||<#FF0000> greedy           ||
|| MetaModel-salesforce    || paged                                          || native (incl. WHERE)                                   || native                 || native             ||<#FF0000> greedy*          ||
|| MetaModel-sugarcrm      || paged                                          || native                                                 || native                 ||<#FFFF00> greedy    ||<#FF0000> greedy           ||

* = improvement is possible (even within the scope of MetaModel)