Posted to user@drill.apache.org by "Rosenthaler Matthias (PS-DI/ETF1.1)" <Ma...@at.bosch.com> on 2018/11/29 12:56:20 UTC

Drill performance - Waiting time

Hi,

I am using Apache Drill to query huge Parquet files (100 MB) on a single node.
A SELECT * query takes around 90 seconds, of which 80 seconds are "Waiting time".
Can you explain what this waiting time means and how I can optimize it?

Best regards

Matthias Rosenthaler

Powertrain Solutions, Engine Testing (PS-DI/ETF1.1)
Robert Bosch AG | Robert-Bosch-Straße 1 | 4020 Linz | AUSTRIA | www.bosch.at
Tel. +43 732 7667-479 | Matthias.Rosenthaler@at.bosch.com

Registered office: Robert Bosch Aktiengesellschaft, A-1030 Wien, Göllnergasse 15-17, Registration court: FN 55722 w HG-Wien
Chairman of the Supervisory Board: Dr. Uwe Thomas; Management: Dr. Klaus Peter Fouquet
DVR no.: 0418871 - ARA licence no.: 1831 - UID no.: ATU14719303 - Tax number 140/4988



Re: Drill performance - Waiting time

Posted by Ted Dunning <te...@gmail.com>.
Matthias,

Kunal gives very good information about how to start from the high level to
debug this, but you should also be suspicious of the lower levels.  For
instance, are you sure that your file system is working correctly? Is the
file actually stored on MapR?

How long does it take to run something like wc on this file on the same
node?

This should take much less than a second if you have a competent I/O system,
but it would not be unheard of for it to be much slower than expected, for
any number of reasons.
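
A minimal sketch of that check in Python, for cases where wc is not at hand.
It reads the file once from start to end and reports how long the raw read
took, without involving Drill at all. The function name and the chunk size
are illustrative choices, not anything from Drill or MapR.

```python
import time


def time_sequential_read(path, chunk_size=1024 * 1024):
    """Read the file once, front to back; return (bytes_read, seconds)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty read means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total, elapsed
```

Call it with the path of the Parquet file on the Drill node, for example
time_sequential_read("/path/to/file.parquet"), where the path is a
placeholder. A second run may be served from the operating system's page
cache and look faster than the first, so the first run is the more telling
one.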




Re: Drill performance - Waiting time

Posted by Kunal Khatua <ku...@apache.org>.
Hi Matthias

The waiting time for a PARQUET_ROW_GROUP_SCAN operator is the total time that all the fragments took to read the parquet data into memory as Drill's Value Vectors. So, 80 seconds would indicate that the bulk of the time is spent in just getting the data. 

If you scroll down to the operator-specific table, you'll find an entry along the lines of
09-xx-01 - PARQUET_ROW_GROUP_SCAN

Within that collapsed table, you should find at the end a sub section for Operator Metrics. 

These metrics should tell you where most of the time is being spent at a per-fragment level.
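
The same per-operator view can be approximated outside the web UI. The sketch below sums the wait time per operator across all minor fragments of a query profile that has already been loaded as a Python dict (for example with json.load from a saved profile). The field names fragmentProfile, majorFragmentId, minorFragmentProfile, operatorProfile, operatorId and waitNanos are assumptions about the JSON layout of a Drill profile and should be checked against an actual profile; the function name is hypothetical.

```python
from collections import defaultdict


def wait_seconds_by_operator(profile):
    """Sum wait time in seconds per (major fragment id, operator id).

    Assumes the profile dict uses the field names described above;
    missing fields are treated as empty or zero.
    """
    totals = defaultdict(float)
    for major in profile.get("fragmentProfile", []):
        major_id = major.get("majorFragmentId")
        for minor in major.get("minorFragmentProfile", []):
            for op in minor.get("operatorProfile", []):
                key = (major_id, op.get("operatorId"))
                # waitNanos is assumed to be in nanoseconds
                totals[key] += op.get("waitNanos", 0) / 1e9
    return dict(totals)
```

An operator whose total is close to the overall query time, such as the scan in this thread, is the place to look first.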

If the metrics are missing, the traditional Parquet reader was used instead of Drill's fast native Parquet reader (Drill does this when it encounters Parquet files with nested data), and the time is being spent by the Parquet library in deserializing the file. In this case, you're out of luck, and your best bet is to split the Parquet file into multiple files or at least multiple row groups. That way, Drill can create more fragments (assuming you've not maxed out that limit) and read the data in parallel.

~ Kunal 