Posted to commits@metamodel.apache.org by Apache Wiki <wi...@apache.org> on 2014/09/30 19:55:41 UTC
[Metamodel Wiki] Update of "QueryExecutionStrategies" by KasperSorensen
The "QueryExecutionStrategies" page has been changed by KasperSorensen:
https://wiki.apache.org/metamodel/QueryExecutionStrategies
Comment:
Added first version of QueryExecutionStrategies page
New page:
This page describes the various strategies for executing queries in MetaModel.
== Native vs greedy execution ==
Of particular interest is when MetaModel can delegate (or "push down") query execution to a native query engine, and when it has to execute the query itself in memory (often a greedy approach, using Java code supplied by MetaModel).
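The distinction can be illustrated with a minimal Java sketch. It is not MetaModel code: the `Source` interface, `nativeCount()` and `withNativeCount()` are hypothetical names invented for this illustration. The idea is that a COUNT is delegated when the backend offers a cheap answer, and otherwise falls back to a greedy scan that visits every record.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical sketch: none of these types are part of the MetaModel API.
final class CountStrategy {

    /** A source that can always be scanned and may also offer a cheap native count. */
    interface Source {
        Iterator<String> scan();

        /** Empty unless the backend can count without a full scan. */
        default OptionalLong nativeCount() {
            return OptionalLong.empty();
        }
    }

    /** Delegates ("pushes down") the COUNT when possible, otherwise counts greedily. */
    static long count(Source source) {
        OptionalLong pushedDown = source.nativeCount();
        if (pushedDown.isPresent()) {
            return pushedDown.getAsLong(); // native: no records are fetched
        }
        long rows = 0;
        Iterator<String> it = source.scan();
        while (it.hasNext()) { // greedy: visit every record, retain none
            it.next();
            rows++;
        }
        return rows;
    }

    /** Wraps a scan-only source with a count assumed to come from the backend. */
    static Source withNativeCount(Source scanOnly, long backendCount) {
        return new Source() {
            @Override
            public Iterator<String> scan() {
                return scanOnly.scan();
            }

            @Override
            public OptionalLong nativeCount() {
                return OptionalLong.of(backendCount);
            }
        };
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c");
        Source scanOnly = records::iterator;
        System.out.println(count(scanOnly));
        System.out.println(count(withNativeCount(scanOnly, records.size())));
    }
}
```

Both calls in `main` return the same number; the difference is that the second one never touches the records.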
The following table documents the execution capability in specific modules of MetaModel. Each column represents a query type. The query types are:
* Plain FROM: Simple queries of the form 'SELECT y FROM x'. Possible values:
  * streaming: The dataset is implemented in a truly streaming fashion.
  * paged: The dataset fetches pages (bulks) of records.
  * in-memory: The dataset has to consume ALL records into memory. This is inefficient and may cause out-of-memory errors.
* Simple COUNT: Queries of the form 'SELECT COUNT(*) FROM x'. Possible values:
  * native: The module supports an efficient native method of getting the count. Some modules also support additional criteria on COUNT queries, e.g. 'SELECT COUNT(*) FROM x WHERE z', which is marked as 'native (incl. WHERE)'.
  * greedy: The module has to run through the whole dataset to do the counting. This is inefficient but usually has little memory impact.
* Simple WHERE: Are simple WHERE items delegated natively, or are they evaluated client-side for each record?
* Primary key lookup: Queries that look up records by their primary keys, e.g. 'SELECT y FROM x WHERE x.id = 42'.
* Groups and aggregates: Are GROUP BY and aggregation functions delegated natively, or are they calculated in memory?
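As a rough illustration of the "paged" and "client-side" values described above, the following Java sketch iterates a backend page by page and evaluates a WHERE condition on each record as it arrives. All names here (`PagedScan`, `paged`, `where`, `listBackend`) are hypothetical and not taken from MetaModel; the page-fetching function stands in for whatever a real module would call on its backend.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;
import java.util.function.Predicate;

// Hypothetical sketch: none of these types are part of the MetaModel API.
final class PagedScan {

    /** Iterates a backend page by page, holding at most one page in memory. */
    static <T> Iterator<T> paged(BiFunction<Integer, Integer, List<T>> fetchPage, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return new Iterator<T>() {
            private int offset = 0;
            private Iterator<T> page = Collections.emptyIterator();
            private boolean exhausted = false;

            @Override
            public boolean hasNext() {
                while (!page.hasNext() && !exhausted) {
                    List<T> next = fetchPage.apply(offset, pageSize);
                    offset += next.size();
                    exhausted = next.size() < pageSize; // short page: nothing left
                    page = next.iterator();
                }
                return page.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return page.next();
            }
        };
    }

    /** Evaluates a WHERE condition client-side, one record at a time. */
    static <T> List<T> where(Iterator<T> records, Predicate<T> condition) {
        List<T> matches = new ArrayList<>();
        while (records.hasNext()) {
            T record = records.next();
            if (condition.test(record)) {
                matches.add(record);
            }
        }
        return matches;
    }

    /** Stand-in backend: serves (offset, limit) pages out of a list. */
    static <T> BiFunction<Integer, Integer, List<T>> listBackend(List<T> all) {
        return (offset, limit) ->
                all.subList(Math.min(offset, all.size()), Math.min(offset + limit, all.size()));
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        Iterator<Integer> records = paged(listBackend(ids), 3);
        System.out.println(where(records, id -> id % 2 == 0)); // prints [2, 4, 6]
    }
}
```

In this sketch every record still crosses the wire even though only some match, which is the cost of a client-side WHERE compared to a natively delegated one.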
|| || Plain FROM || Simple COUNT || Simple WHERE || Primary key lookup || Groups and aggregates ||
|| MetaModel-csv || streaming ||<#FFFF00> greedy when exact<<BR>>native when approximated ||<#FFFF00> client-side ||<#FFFF00> no PK ||<#FF0000> greedy ||
|| MetaModel-jdbc || streaming || native (incl. all variants) || native || native || native ||
|| MetaModel-excel ||<#FFFF00> streaming .xlsx<<BR>>in-memory .xls || native ||<#FFFF00> client-side ||<#FFFF00> no PK ||<#FF0000> greedy ||
|| MetaModel-pojo ||<#FF0000> in-memory || native ||<#FFFF00> client-side ||<#FFFF00> no PK ||<#FF0000> greedy ||
|| MetaModel-couchdb || streaming || native || native || native ||<#FF0000> greedy ||
|| MetaModel-mongodb || streaming || native (incl. WHERE) || native || native ||<#FF0000> greedy ||
|| MetaModel-hbase || streaming || native ||<#FFFF00> client-side* || native ||<#FF0000> greedy ||
|| MetaModel-json || streaming ||<#FFFF00> greedy ||<#FFFF00> client-side ||<#FFFF00> no PK ||<#FF0000> greedy ||
|| MetaModel-xml ||<#FFFF00> streaming SAX<<BR>>in-memory DOM ||<#FFFF00> greedy ||<#FFFF00> client-side ||<#FF0000> greedy ||<#FF0000> greedy ||
|| MetaModel-elasticsearch || paged || native ||<#FFFF00> client-side* ||<#FF0000> greedy* ||<#FF0000> greedy ||
|| MetaModel-salesforce || paged || native (incl. WHERE) || native || native ||<#FF0000> greedy* ||
|| MetaModel-sugarcrm || paged || native || native ||<#FFFF00> greedy ||<#FF0000> greedy ||
* = improvement is possible (even within the scope of MetaModel)
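
To make the "greedy" entries in the "Groups and aggregates" column more concrete, here is a minimal Java sketch of a GROUP BY with COUNT(*) computed in memory. It is an illustration under assumptions, not MetaModel's implementation; `GreedyGroupBy` and `countByGroup` are hypothetical names. The point it tries to show is that every record must be read, while memory use grows with the number of distinct groups rather than with the number of records.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these types are part of the MetaModel API.
final class GreedyGroupBy {

    /** In-memory equivalent of 'SELECT key, COUNT(*) FROM x GROUP BY key'. */
    static <T, K> Map<K, Long> countByGroup(Iterator<T> records, Function<T, K> groupKey) {
        Map<K, Long> counts = new LinkedHashMap<>(); // one counter per distinct group
        while (records.hasNext()) {
            K key = groupKey.apply(records.next());
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> cities = Arrays.asList("Oslo", "Rome", "Oslo", "Lima", "Rome", "Oslo");
        System.out.println(countByGroup(cities.iterator(), city -> city)); // prints {Oslo=3, Rome=2, Lima=1}
    }
}
```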