Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/03/27 05:59:47 UTC
[jira] [Closed] (DRILL-5105) Query time increases exponentially with increasing nested levels
[ https://issues.apache.org/jira/browse/DRILL-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers closed DRILL-5105.
------------------------------
> Query time increases exponentially with increasing nested levels
> ----------------------------------------------------------------
>
> Key: DRILL-5105
> URL: https://issues.apache.org/jira/browse/DRILL-5105
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - JSON
> Affects Versions: 1.9.0
> Environment: 3 Node Cluster with default memory and configurations.
> Reporter: Abhishek Girish
> Assignee: Chunhui Shi
> Labels: ready-to-commit
>
> The time taken to query a JSON dataset depends on the number of nested levels within the dataset. Increasing the complexity of the dataset increases the execution time further.
> Tabulated below are cached query execution times for a simple select * query over two simple forms of JSON datasets:
> || No. Levels || Time (s) Dataset 1 || Time (s) Dataset 2 ||
> |1 |0.22 |0.27 |
> |2 |0.23 |0.25 |
> |4 |0.24 |0.22 |
> |8 |0.22 |0.23 |
> |16 |0.34 |0.48 |
> |24 |25.76 |72.51 |
> |26 |103.48 |289.6 |
> |28 |336.12 |1151.94 |
> |30 |1342.22 |4586.79 |
> |32 |5360.2 |Expected: ~20k |
> The above table lists query times for 20 different JSON files, 10 belonging to dataset 1 and 10 belonging to dataset 2. Each file has 1 record, but the number of nested levels varies as given in the "No. Levels" column.
> It appears that the query time almost doubles with each additional nested level (note that from level 24 onward the table advances two levels per row, so this shows up as almost 4x from one row to the next).
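The doubling estimate can be checked against the measured rows of the table. The following is a minimal sketch, not part of the original report: the function name `growth_factors` is made up for illustration, and the timings are copied from the table for levels 24 and up (the "Expected: ~20k" entry is left out because it is not a measurement).

```python
# Measured timings copied from the table above: (levels, seconds).
# The "Expected: ~20k" cell for dataset 2 is omitted; it is an estimate.
DATASET_1 = [(24, 25.76), (26, 103.48), (28, 336.12), (30, 1342.22), (32, 5360.2)]
DATASET_2 = [(24, 72.51), (26, 289.6), (28, 1151.94), (30, 4586.79)]


def growth_factors(rows):
    """Return (from_level, to_level, total_ratio, per_level_ratio) per row pair.

    The per-level ratio assumes the growth is spread evenly over the
    levels between two consecutive rows (a geometric mean).
    """
    result = []
    for (lo, t_lo), (hi, t_hi) in zip(rows, rows[1:]):
        total = t_hi / t_lo
        per_level = total ** (1.0 / (hi - lo))
        result.append((lo, hi, total, per_level))
    return result


for name, rows in (("Dataset 1", DATASET_1), ("Dataset 2", DATASET_2)):
    for lo, hi, total, per_level in growth_factors(rows):
        print(f"{name}: {lo} -> {hi} levels: "
              f"{total:.2f}x total, {per_level:.2f}x per level")
```

For these rows the per-level factor stays between about 1.8 and 2.0, which is consistent with the "almost doubles" observation.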
> The two representative datasets below show simple JSON structures with nested levels.
> Structure of Dataset 1:
> {code}
> {
> "level1": {
> "field1": "a",
> "level2": {
> "field1": "b",
> ...
> }
> }
> }
> {code}
> Structure of Dataset 2:
> {code}
> {
> "level1": {
> "field1": "a",
> "field2": {
> "nfield1": true,
> "nfield2": 1.1
> },
> "level2": {
> "field1": "b",
> "field2": {
> "nfield1": false,
> "nfield2": 2.2
> },
> ...
> }
> }
> }
> {code}
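To reproduce files of this shape at a chosen depth, something like the following sketch could be used. It is an illustration only: `build_nested` and `nesting_depth` are hypothetical helper names, and the values beyond the two levels shown above (letters for `field1`, alternating booleans for `nfield1`, multiples of 1.1 for `nfield2`) are assumptions, since the report elides the deeper levels.

```python
import json


def build_nested(levels, with_subobject=False):
    """Build a dict nested `levels` deep, shaped like the structures above.

    with_subobject=False mimics Dataset 1, True mimics Dataset 2.
    Values past the second level are assumed, not taken from the report.
    """
    root = {}
    node = root
    for depth in range(1, levels + 1):
        child = {"field1": chr(ord("a") + (depth - 1) % 26)}
        if with_subobject:
            child["field2"] = {
                "nfield1": depth % 2 == 1,          # true, false, true, ...
                "nfield2": round(depth * 1.1, 1),   # 1.1, 2.2, 3.3, ...
            }
        node[f"level{depth}"] = child
        node = child
    return root


def nesting_depth(obj):
    """Count how many 'levelN' objects are chained below obj."""
    depth = 0
    while True:
        keys = [k for k in obj if k.startswith("level")]
        if not keys:
            return depth
        obj = obj[keys[0]]
        depth += 1


# One record per file, as in the report; here just printed for two levels.
print(json.dumps(build_nested(2), indent=2))
print(json.dumps(build_nested(2, with_subobject=True), indent=2))
```

Writing `json.dumps(build_nested(n))` to a file for each depth in the table would give single-record inputs comparable in structure to the ones described, though not necessarily identical to the files used for the timings.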