Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/03/27 05:59:47 UTC
[jira] [Closed] (DRILL-5105) Query time increases exponentially with increasing nested levels

     [ https://issues.apache.org/jira/browse/DRILL-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers closed DRILL-5105.
------------------------------

> Query time increases exponentially with increasing nested levels
> ----------------------------------------------------------------
>
>                 Key: DRILL-5105
>                 URL: https://issues.apache.org/jira/browse/DRILL-5105
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.9.0
>         Environment: 3 Node Cluster with default memory and configurations. 
>            Reporter: Abhishek Girish
>            Assignee: Chunhui Shi
>              Labels: ready-to-commit
>
> The time taken to query any JSON dataset depends on the number of nested levels within the dataset. Increasing the complexity of the dataset further increases the execution time.
> Tabulated below are cached query execution times for a simple select * query over two simple forms of JSON datasets:
> || No. Levels || Time (s) Dataset 1 || Time (s) Dataset 2 ||
> | 1 | 0.22 | 0.27 |
> | 2 | 0.23 | 0.25 |
> | 4 | 0.24 | 0.22 |
> | 8 | 0.22 | 0.23 |
> | 16 | 0.34 | 0.48 |
> | 24 | 25.76 | 72.51 |
> | 26 | 103.48 | 289.6 |
> | 28 | 336.12 | 1151.94 |
> | 30 | 1342.22 | 4586.79 |
> | 32 | 5360.2 | Expected: ~20k |
> The above table lists query times for 20 different JSON files, 10 belonging to dataset 1 and 10 belonging to dataset 2. Each file has 1 record, but the number of nested levels within it varies as given in the "No. Levels" column.
> It appears that the query time almost doubles with each additional nested level. Because the rows from level 24 onward are two levels apart, this shows up in the table as an almost 4x increase between consecutive rows.
> The two representative datasets below show simple JSON structures with nested levels.
> Structure of Dataset 1:
> {code}
> {
>   "level1": {
>     "field1": "a",
>     "level2": {
>       "field1": "b",
>       ...
>     }
>   }
> }
> {code}
> Structure of Dataset 2:
> {code}
> {
>   "level1": {
>     "field1": "a",
>     "field2": {
>       "nfield1": true,
>       "nfield2": 1.1
>     },
>     "level2": {
>       "field1": "b",
>       "field2": {
>         "nfield1": false,
>         "nfield2": 2.2
>       },
>       ...
>     }
>   }
> }
> {code}
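
To reproduce the timings, a minimal Python sketch along these lines could generate single-record documents shaped like Dataset 1. It is not the reporter's script. The function name build_nested is hypothetical, and the field values after "a" and "b" (which the description elides) are assumed here to continue through the alphabet and wrap around.

```python
import json


def build_nested(levels):
    """Return a dict nested `levels` deep, shaped like Dataset 1.

    Hypothetical helper: the values beyond "a" and "b" are an assumption,
    since the issue description elides them.
    """
    if levels < 1:
        raise ValueError("levels must be >= 1")
    root = {}
    current = root
    for i in range(1, levels + 1):
        # Assumed value pattern: "a", "b", "c", ... wrapping after "z".
        child = {"field1": chr(ord("a") + (i - 1) % 26)}
        current["level%d" % i] = child
        current = child
    return root


if __name__ == "__main__":
    # Print a two-level document to compare against the structure above.
    print(json.dumps(build_nested(2), indent=2))
```

Writing json.dumps(build_nested(n)) to one file per value in the "No. Levels" column would give a set of files comparable to the ten used for Dataset 1.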



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)