Posted to dev@drill.apache.org by Sudheesh Katkam <sk...@maprtech.com> on 2015/09/25 06:32:17 UTC

Improving Performance of SELECT * FROM hive.table LIMIT 0

Hey y'all,

### Short Question:

How do we improve performance of SELECT * FROM plugin.table LIMIT 0?

### Extended Question:

While investigating DRILL-3623 <https://issues.apache.org/jira/browse/DRILL-3623>, I analyzed where we spend time for a SELECT * FROM hive.table LIMIT 0 query.

## Setup:
Copy drill/sample-data/region.parquet 20,000 times into a DFS (MapR-FS in my case) directory named region. Create a Hive external table pointing to region. Run Drill with the default configuration.

## Now there are two ways to query this table:

> SELECT * FROM hive.region LIMIT 0;
+--------------+---------+------------+
| r_regionkey  | r_name  | r_comment  |
+--------------+---------+------------+
+--------------+---------+------------+
No rows selected (1203.179 seconds)
...

> SELECT * FROM dfs.test.region LIMIT 0;
+--------------+---------+------------+
| r_regionkey  | r_name  | r_comment  |
+--------------+---------+------------+
+--------------+---------+------------+
No rows selected (94.396 seconds)

Currently, we use HiveRecordReader in the first case and ParquetRecordReader in the second. With DRILL-3209 <https://issues.apache.org/jira/browse/DRILL-3209>, both queries will use ParquetRecordReader. However, for formats that are non-native to Drill, or for other storage plugins, we still face this problem. Summarizing the query profiles:
+-------+-----------+---------------+----------------+
| Query | Fragments | Planning time | Execution time |
+-------+-----------+---------------+----------------+
| hive  | 1         | ~2 min        | ~18 min        |
| dfs   | 1         | ~1 min        | ~33 sec        |
+-------+-----------+---------------+----------------+

## The time hogs:

# Planning time in both cases needs to improve. How?

# With respect to execution, in the first case the ImplCreator.getExec(…) call in the FragmentExecutor took 1,070,926 ms to create the RecordBatch tree. There are 20,000 readers being initialized in HiveScanBatchCreator. How do we avoid this? And what are the implications for chained impersonation if we open readers in setup() rather than in the constructor?
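As a thought experiment, the difference between opening all readers in the scan's constructor and opening them on demand can be sketched in plain Java. These are hypothetical classes, not Drill's actual ScanBatch or HiveRecordReader; the counts are only illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch, not Drill's real classes: contrast opening every
// reader in the scan's constructor with opening readers lazily on demand.
public class LazyReaderSketch {
    static int openedReaders = 0;

    // Stands in for an expensive HiveRecordReader initialization.
    static class Reader {
        Reader() { openedReaders++; }      // "opening" is the costly part
        int read() { return 1; }           // pretend each reader yields one row
    }

    // Eager: all n readers are opened up front (today's behavior).
    static int countEagerOpens(int n) {
        openedReaders = 0;
        List<Reader> readers = new ArrayList<>();
        for (int i = 0; i < n; i++) readers.add(new Reader());
        return openedReaders;
    }

    // Lazy: hold suppliers; open a reader only while more rows are needed.
    static int countLazyOpens(int n, int limit) {
        openedReaders = 0;
        List<Supplier<Reader>> suppliers = new ArrayList<>();
        for (int i = 0; i < n; i++) suppliers.add(Reader::new);
        int rows = 0;
        for (Supplier<Reader> s : suppliers) {
            if (rows >= limit) break;      // LIMIT 0 never opens a reader
            rows += s.get().read();
        }
        return openedReaders;
    }

    public static void main(String[] args) {
        System.out.println("eager: " + countEagerOpens(20000) + " readers opened");
        System.out.println("LIMIT 0, lazy: " + countLazyOpens(20000, 0) + " readers opened");
        System.out.println("LIMIT 1, lazy: " + countLazyOpens(20000, 1) + " readers opened");
    }
}
```

Under this model a LIMIT 0 query opens zero readers and LIMIT 1 opens exactly one, instead of all 20,000 up front; the open cost is paid only for rows actually demanded.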

### Extending further:

This can be generalized to any "LIMIT n" query where n is a small number. For n > 0, we parallelize scanning, so a LIMIT 1 query runs faster than LIMIT 0. However, there is a sweet spot "n" beyond which parallelization hurts.

###

Thank you,
Sudheesh


Re: Improving Performance of SELECT * FROM hive.table LIMIT 0

Posted by Sudheesh Katkam <sk...@maprtech.com>.
Hey y’all,

I see that DRILL-1617 <https://issues.apache.org/jira/browse/DRILL-1617> disables producer-consumer because of correctness issues. Should we enable this visitor (as Venki suggested) and resolve the issues?

Thank you,
Sudheesh

> On Sep 29, 2015, at 6:00 PM, Sudheesh Katkam <sk...@maprtech.com> wrote:
> 
> My initial work <https://github.com/sudheeshkatkam/drill/commit/7a5fa64828408842ab515734004433747a2b3ef0> on this brought down the execution time from 1203 seconds to ~20 seconds (most of this is planning time). As Jinfeng pointed out, the planning time can be reduced using the parquet metadata feature.
> 
> Now given the limitation that Venki pointed out, how do we optimize LIMIT n (where n > 0)? Is there a reason why we did not opt for the producer-consumer model?
> 
> Thank you,
> Sudheesh
> 
>> On Sep 25, 2015, at 11:35 AM, Venki Korukanti <venki.korukanti@gmail.com> wrote:
>> 
>> One issue in moving RecordReader creation to setup is chained
>> impersonation support. The fragment thread can be running within the query
>> user's doAs block, but setup happens in the doAs block of the user (who may
>> not be the query user) whom we want to impersonate when reading the
>> underlying data. Maybe we should move towards a producer-consumer model
>> where the scan batch always runs in a separate thread; that way we can
>> lazily set up readers, and they run within their own doAs block.
>> 
>> Thanks
>> Venki
>> 
>> On Fri, Sep 25, 2015 at 6:48 AM, Jacques Nadeau <jacques@dremio.com> wrote:
>> 
>>> Another thought: record batch tree creation time should be short. If any
>>> substantial work needs to be done, we should move it to setup.
>>> On Sep 25, 2015 6:47 AM, "Jacques Nadeau" <jacques@dremio.com> wrote:
>>> 
>>>> Limit zero shouldn't use any readers if we know the schema. Look at the
>>>> upstream constant reduction rule. We should be able to go straight from
>>>> Calcite algebra to the result without hitting any execution code. Think
>>>> direct response, same as explain.
>>>> On Sep 24, 2015 10:46 PM, "Jinfeng Ni" <jinfengni99@gmail.com> wrote:
>>>> 
>>>>> The query itself is quite simple; it normally should not take 60
>>>>> seconds for planning. I guess most of the planning time is spent on
>>>>> reading parquet metadata. The metadata caching that Steven worked on
>>>>> should help in this case.
>>>>> 
>>>>> 
>>>>> On Thu, Sep 24, 2015 at 10:42 PM, Sudheesh Katkam <skatkam@maprtech.com> wrote:
>>>>>> For the table below, 33 seconds for execution (includes parquet reader
>>>>>> initialization) and 60 seconds for planning.
>>>>>> 
>>>>>>> On Sep 24, 2015, at 10:01 PM, Jinfeng Ni <jinfengni99@gmail.com> wrote:
>>>>>>> 
>>>>>>> "FragmentExecutor took 1,070,926 ms to create RecordBatch tree."
>>>>>>> 
>>>>>>> 1,070,926 ms is ~17.8 minutes. In other words, the majority of the 18
>>>>>>> minutes of execution in the hive case is spent on the initialization of
>>>>>>> Hive readers. If we want to improve "limit n", we should probably make
>>>>>>> initialization of the Hive readers "lazy": only when Drill has to read
>>>>>>> rows from a reader do we do the initialization. Otherwise, initializing
>>>>>>> all the readers before reading a single row means a long setup time for
>>>>>>> a limit "n" query when n is relatively small.
>>>>>>> 
>>>>>>> For the second case, the 94 seconds of query time seems too long as
>>>>>>> well. I guess most of the time is spent on parquet reader
>>>>>>> initialization (?)


Re: Improving Performance of SELECT * FROM hive.table LIMIT 0

Posted by Sudheesh Katkam <sk...@maprtech.com>.
My initial work <https://github.com/sudheeshkatkam/drill/commit/7a5fa64828408842ab515734004433747a2b3ef0> on this brought down the execution time from 1203 seconds to ~20 seconds (most of this is planning time). As Jinfeng pointed out, the planning time can be reduced using the parquet metadata feature.

Now given the limitation that Venki pointed out, how do we optimize LIMIT n (where n > 0)? Is there a reason why we did not opt for the producer-consumer model?

Thank you,
Sudheesh



Re: Improving Performance of SELECT * FROM hive.table LIMIT 0

Posted by Sudheesh Katkam <sk...@maprtech.com>.
See below.

> On Oct 3, 2015, at 5:24 PM, Jacques Nadeau <ja...@dremio.com> wrote:
> 
> It doesn't seem like there is any reason to use a producer/consumer
> behavior to work around the doAs behavior. If we need to have a two stage
> setup (with two different contexts), let's just enhance the readers with
> this behavior.

Can you expand on how we’d use two different contexts?

Currently, with impersonation enabled:
1) Drill executes as a proxy user while creating the record batch tree, which also includes initialization of record readers.
2) Then, Drill executes as the query user while iterating over the record batch tree exhaustively.
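To make the two phases above concrete, here is a toy model of the impersonation concern using a plain thread-local "current user" (not Hadoop's actual UserGroupInformation; all names are hypothetical). If reader creation moves into setup(), it runs on the fragment thread inside the query user's doAs block, so a lazily created reader must re-enter the doAs block of the user it should impersonate:

```java
import java.util.concurrent.Callable;

// Toy model of doAs impersonation: a naive setup() opens the reader under the
// caller's (query user's) doAs block; a fixed setup() re-enters the doAs block
// of the user we must impersonate to read the underlying data.
public class DoAsSketch {
    static final ThreadLocal<String> CURRENT_USER =
        ThreadLocal.withInitial(() -> "drillbit");

    // Run an action with a different "current user", restoring it afterwards.
    static <T> T doAs(String user, Callable<T> action) throws Exception {
        String prev = CURRENT_USER.get();
        CURRENT_USER.set(user);
        try { return action.call(); } finally { CURRENT_USER.set(prev); }
    }

    static class ScanBatch {
        final String readerUser;   // user to impersonate for the data
        String openedAs;           // records who actually opened the reader

        ScanBatch(String readerUser) { this.readerUser = readerUser; }

        // Naive: opens the reader in whatever doAs block the fragment
        // thread (running as the query user) happens to be in.
        void naiveSetup() { openedAs = CURRENT_USER.get(); }

        // Fixed: re-enters the stored reader user's own doAs block first.
        void fixedSetup() throws Exception {
            doAs(readerUser, () -> { openedAs = CURRENT_USER.get(); return null; });
        }
    }

    public static void main(String[] args) throws Exception {
        ScanBatch naive = new ScanBatch("hive_owner");
        doAs("query_user", () -> { naive.naiveSetup(); return null; });

        ScanBatch fixed = new ScanBatch("hive_owner");
        doAs("query_user", () -> { fixed.fixedSetup(); return null; });

        System.out.println("naive=" + naive.openedAs + " fixed=" + fixed.openedAs);
    }
}
```

The naive reader opens as "query_user", the fixed one as "hive_owner", which is the behavior chained impersonation needs.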

> The producer/consumer was disabled because it didn't show a performance
> benefit. It also had termination issues (which was the nail in the coffin),
> but it was designed as a performance enhancement and ultimately stopped
> providing any benefit. It should probably be deleted.
> 
> I think we're mixing multiple things in this thread. I think this includes,
> at least:
> 
> - metadata response time
> - smart parallelization and metadata planning in the case of "small
> queries”

How about disabling exchanges for simple limit queries (and not just for limit 0)?

> - metadata response in the case of schemaed "zero queries"
> - better cancellation behavior (and execution time reporting) when
> interacting with user specific and general setup experiences.
> 
> If the goal is really "zero queries for hive", then we should just return
> the metadata from the metastore as a direct query. Building readers doesn't
> make any sense.

My initial solution for limit 0 was specific to Hive, but I’ll implement it this way, as a general solution for any schema-ed query.

I am looking at "LIMIT n" queries, dealing with n = 0 and n > 0 as two different cases (against schema-ed sources and not, so four cases).

Thank you,
Sudheesh



Re: Improving Performance of SELECT * FROM hive.table LIMIT 0

Posted by Jacques Nadeau <ja...@dremio.com>.
It doesn't seem like there is any reason to use a producer/consumer
behavior to work around the doAs behavior. If we need to have a two stage
setup (with two different contexts), let's just enhance the readers with
this behavior.

The producer/consumer was disabled because it didn't show a performance
benefit. It also had termination issues (which was the nail in the coffin),
but it was designed as a performance enhancement and ultimately stopped
providing any benefit. It should probably be deleted.

I think we're mixing multiple things in this thread. I think this includes,
at least:

 - metadata response time
 - smart parallelization and metadata planning in the case of "small
queries"
 - metadata response in the case of schemaed "zero queries"
 - better cancellation behavior (and execution time reporting) when
interacting with user specific and general setup experiences.

If the goal is really "zero queries for hive", then we should just return
the metadata from the metastore as a direct query. Building readers doesn't
make any sense.
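A minimal sketch of that "direct response" idea, in plain Java with hypothetical types (the real change would pull column metadata from the Hive metastore and go through Drill's planner): LIMIT 0 is answered straight from the table schema, and the reader path is never touched.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: answer SELECT * ... LIMIT 0 directly from table
// metadata (as a metastore would provide it) without building any readers.
public class Limit0Sketch {
    static class Column {
        final String name, type;
        Column(String name, String type) { this.name = name; this.type = type; }
    }

    static class EmptyResult {
        final List<Column> schema;
        final int rowCount;
        EmptyResult(List<Column> schema, int rowCount) {
            this.schema = schema;
            this.rowCount = rowCount;
        }
    }

    static int readersCreated = 0;  // counts slow-path reader construction

    static EmptyResult execute(List<Column> metastoreSchema, int limit) {
        if (limit == 0) {
            // Fast path: return the schema with zero rows; no readers built.
            return new EmptyResult(metastoreSchema, 0);
        }
        readersCreated++;           // a real plan would initialize readers here
        return new EmptyResult(metastoreSchema, 0); // row fetch elided in sketch
    }

    public static void main(String[] args) {
        List<Column> schema = Arrays.asList(
            new Column("r_regionkey", "bigint"),
            new Column("r_name", "varchar"),
            new Column("r_comment", "varchar"));
        EmptyResult r = execute(schema, 0);
        System.out.println(r.schema.size() + " columns, " + r.rowCount
            + " rows, " + readersCreated + " readers created");
    }
}
```

For the hive.region example this returns the three-column schema with zero rows while creating zero readers, which is the "direct response, same as explain" behavior.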


--
Jacques Nadeau
CTO and Co-Founder, Dremio


Re: Improving Performance of SELECT * FROM hive.table LIMIT 0

Posted by Venki Korukanti <ve...@gmail.com>.
One issue in moving RecordReader creation to setup is chained
impersonation support. The fragment thread can be running within the query
user's doAs block, but setup happens in the doAs block of the user (who may
not be the query user) whom we want to impersonate when reading the
underlying data. Maybe we should move towards a producer-consumer model
where the scan batch always runs in a separate thread; that way we can
lazily set up readers, and they run within their own doAs block.

Thanks
Venki

On Fri, Sep 25, 2015 at 6:48 AM, Jacques Nadeau <ja...@dremio.com> wrote:


Re: Improving Performance of SELECT * FROM hive.table LIMIT 0

Posted by Jacques Nadeau <ja...@dremio.com>.
Another thought: record batch tree creation time should be short. If any
substantial work needs to be done, we should move it to setup.
On Sep 25, 2015 6:47 AM, "Jacques Nadeau" <ja...@dremio.com> wrote:


Re: Improving Performance of SELECT * FROM hive.table LIMIT 0

Posted by Jacques Nadeau <ja...@dremio.com>.
Limit zero shouldn't use any readers if we know the schema. Look at the upstream constant reduction rule. We should be able to go straight from the Calcite algebra to a result without hitting any execution code; think direct response, same as EXPLAIN.
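A sketch of that rewrite, with simplified stand-in plan nodes rather than Calcite's actual API: a constant-reduction-style rule matches LIMIT 0 over a source with a known schema and replaces the whole subtree with an empty, schema-only result, so execution never starts.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, simplified plan nodes (not Calcite's real classes).
@dataclass
class Scan:
    schema: Optional[list]          # known column names, or None if unknown

@dataclass
class Limit:
    n: int
    child: object

@dataclass
class EmptyValues:                  # a "direct response": schema, zero rows
    schema: list

def reduce_limit_zero(plan):
    """LIMIT 0 over a scan with a known schema needs no readers; answer
    directly from the plan, the same way EXPLAIN does."""
    if isinstance(plan, Limit) and plan.n == 0:
        child = plan.child
        if isinstance(child, Scan) and child.schema is not None:
            return EmptyValues(schema=child.schema)
    return plan                     # unknown schema: leave the plan alone

plan = Limit(0, Scan(schema=["r_regionkey", "r_name", "r_comment"]))
result = reduce_limit_zero(plan)
print(result)
```

Note the fallback: when the schema is not known at plan time (schemaless sources), the rule declines and the scan still runs, so this only helps plugins like Hive that expose a schema up front.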
On Sep 24, 2015 10:46 PM, "Jinfeng Ni" <ji...@gmail.com> wrote:


Re: Improving Performance of SELECT * FROM hive.table LIMIT 0

Posted by Jinfeng Ni <ji...@gmail.com>.
The query itself is quite simple; it normally should not take 60 seconds to plan. I guess most of the planning time is spent reading parquet metadata. The metadata caching that Steven worked on should help in this case.
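The effect of such a cache can be sketched as follows. Here read_footer is a hypothetical stand-in for one per-file parquet footer read; the planning cost in the numbers above comes from doing that for each of the 20,000 files on every query, which a cache keyed by path avoids.

```python
import time
from functools import lru_cache

def read_footer(path):
    """Stand-in for reading one parquet footer during planning."""
    time.sleep(0.001)                        # simulate a filesystem round trip
    return {"path": path, "columns": ["r_regionkey", "r_name", "r_comment"]}

@lru_cache(maxsize=None)
def cached_footer(path):
    return read_footer(path)

paths = [f"/region/{i}.parquet" for i in range(200)]

start = time.time()
for p in paths:
    read_footer(p)                           # uncached: pays the cost every time
cold = time.time() - start

for p in paths:
    cached_footer(p)                         # first pass populates the cache
start = time.time()
for p in paths:
    cached_footer(p)                         # second pass is pure cache hits
warm = time.time() - start

print(warm < cold)                           # cached planning skips the reads
```

The real metadata cache additionally has to persist across queries and invalidate on file changes, which this in-process sketch glosses over.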


On Thu, Sep 24, 2015 at 10:42 PM, Sudheesh Katkam <sk...@maprtech.com> wrote:

Re: Improving Performance of SELECT * FROM hive.table LIMIT 0

Posted by Sudheesh Katkam <sk...@maprtech.com>.
For the table below, 33 seconds for execution (includes parquet reader initialization) and 60 seconds for planning.

> On Sep 24, 2015, at 10:01 PM, Jinfeng Ni <ji...@gmail.com> wrote:


Re: Improving Performance of SELECT * FROM hive.table LIMIT 0

Posted by Jinfeng Ni <ji...@gmail.com>.
"FragmentExecutor took 1,070,926 ms to create RecordBatch tree."

1,070,926 ms is roughly 17.8 minutes. In other words, the majority of the 18 minutes of execution in the hive case is spent initializing the Hive readers. If we want to improve "LIMIT n", we should probably make initialization of the Hive readers lazy: do the initialization only when Drill actually has to read rows from a reader. Otherwise, initializing all of the readers before reading a single row means a long setup time for a "LIMIT n" query when n is relatively small.

For the second case, the 94-second query time seems too long as well. I guess most of the time is spent on parquet reader initialization (?)
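For scale, the setup cost these numbers imply per reader:

```python
setup_ms = 1_070_926   # total time to create the RecordBatch tree
readers = 20_000       # readers initialized in HiveScanBatchCreator
per_reader_ms = setup_ms / readers
print(round(per_reader_ms, 1))  # about 53.5 ms of setup per Hive reader
```

Each reader is individually cheap; the problem is that all 20,000 are set up serially before a single row is read.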



On Thu, Sep 24, 2015 at 9:32 PM, Sudheesh Katkam <sk...@maprtech.com> wrote: