Posted to issues@drill.apache.org by "Jinfeng Ni (JIRA)" <ji...@apache.org> on 2016/01/19 16:09:39 UTC
[jira] [Updated] (DRILL-4279) Improve query plan when no column is required from SCAN
[ https://issues.apache.org/jira/browse/DRILL-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jinfeng Ni updated DRILL-4279:
------------------------------
Summary: Improve query plan when no column is required from SCAN (was: The plan is either confusing or could lead to execution problem, when no columns is required from SCAN)
> Improve query plan when no column is required from SCAN
> -------------------------------------------------------
>
> Key: DRILL-4279
> URL: https://issues.apache.org/jira/browse/DRILL-4279
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Reporter: Jinfeng Ni
> Assignee: Jinfeng Ni
>
> When a query does not require any specific column to be returned from SCAN, for instance,
> {code}
> Q1: select count(*) from T1;
> Q2: select 1 + 100 from T1;
> Q3: select 1.0 + random() from T1;
> {code}
> Drill's planner uses a ColumnList with the * column, plus a SKIP_ALL mode. However, the mode is not serialized / deserialized. This leads to two problems.
> 1) The EXPLAIN plan is confusing, since there is no way to distinguish a "SELECT *" query from this SKIP_ALL mode.
> For instance,
> {code}
> explain plan for select count(*) from dfs.`/Users/jni/work/data/yelp/t1`;
> 00-03 Project($f0=[0])
> 00-04 Scan(groupscan=[EasyGroupScan [selectionRoot=file:/Users/jni/work/data/yelp/t1, numFiles=2, columns=[`*`], files= ...
> {code}
> 2) If the query is executed in distributed / parallel mode, the missing serialization of the mode means that some fragments fetch all the columns while other fragments skip all the columns. That causes an execution error.
> For instance, after changing slice_target to force the query to be executed in multiple fragments, the query hits an execution error.
> {code}
> select count(*) from dfs.`/Users/jni/work/data/yelp/t1`;
> org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: Error parsing JSON - You tried to start when you are using a ValueWriter of type NullableBitWriterImpl.
> {code}
> The directory "t1" contains just two Yelp JSON files.
> Ideally, I think that when no column is required from SCAN, the explain plan should show an empty column list. The SKIP_ALL mode together with the star (*) column is confusing and error prone.
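The serialization gap described above can be illustrated with a small, self-contained Java sketch. It is not Drill's code: the class `ScanSpecSketch`, its `Mode` enum, and the comma-separated wire format are all hypothetical stand-ins, and the rule that a receiver re-derives the mode from the column list is an assumption made for illustration. The sketch only shows why a star column plus an unserialized SKIP_ALL flag can be read back as "fetch all columns", while an empty column list would survive the round trip.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: these names are illustrative and are NOT Drill's actual classes.
final class ScanSpecSketch {
    enum Mode { ALL, SKIP_ALL, SOME }

    final List<String> columns;
    final Mode mode;

    ScanSpecSketch(List<String> columns, Mode mode) {
        this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
        this.mode = mode;
    }

    // Stand-in for plan serialization: only the column list is written,
    // the mode is dropped (the gap described in the issue).
    String serializeColumnsOnly() {
        return String.join(",", columns);
    }

    // Stand-in for deserialization on another fragment (assumption): the mode
    // has to be re-derived from the column list alone.
    static ScanSpecSketch deserialize(String wire) {
        List<String> cols = wire.isEmpty()
                ? Collections.<String>emptyList()
                : Arrays.asList(wire.split(","));
        Mode derived;
        if (cols.isEmpty()) {
            derived = Mode.SKIP_ALL;
        } else if (cols.contains("*")) {
            derived = Mode.ALL;
        } else {
            derived = Mode.SOME;
        }
        return new ScanSpecSketch(cols, derived);
    }

    // Encoding described in the issue: star column plus a SKIP_ALL flag.
    static ScanSpecSketch starWithSkipAll() {
        return new ScanSpecSketch(Arrays.asList("*"), Mode.SKIP_ALL);
    }

    // Encoding suggested in the issue: an empty column list.
    static ScanSpecSketch emptyColumns() {
        return new ScanSpecSketch(Collections.<String>emptyList(), Mode.SKIP_ALL);
    }

    public static void main(String[] args) {
        ScanSpecSketch lossy = deserialize(starWithSkipAll().serializeColumnsOnly());
        ScanSpecSketch stable = deserialize(emptyColumns().serializeColumnsOnly());
        System.out.println("star + SKIP_ALL after round trip: " + lossy.mode);
        System.out.println("empty column list after round trip: " + stable.mode);
    }
}
```

Under these assumptions, the fragment that built the spec locally still sees SKIP_ALL, while a fragment that received the serialized form sees a plain star column. That matches the mismatch between fragments reported in the description.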
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)