Posted to user@drill.apache.org by "Rosenthaler Matthias (PS-DI/ETF1.1)" <Ma...@at.bosch.com> on 2018/11/29 12:56:20 UTC
Drill performance - Waiting time
Hi,
I am using Apache Drill to query large Parquet files (100 MB) on a single node.
A SELECT * query takes around 90 seconds, of which 80 seconds are "Waiting time".
Can you explain what this waiting time means and how I can reduce it?
Best regards
Matthias Rosenthaler
Powertrain Solutions, Engine Testing (PS-DI/ETF1.1)
Robert Bosch AG | Robert-Bosch-Straße 1 | 4020 Linz | AUSTRIA | www.bosch.at
Tel. +43 732 7667-479 | Matthias.Rosenthaler@at.bosch.com
Registered office: Robert Bosch Aktiengesellschaft, A-1030 Wien, Göllnergasse 15-17, Court of registration: FN 55722 w HG-Wien
Chairman of the Supervisory Board: Dr. Uwe Thomas; Management: Dr. Klaus Peter Fouquet
DVR no.: 0418871 - ARA license no.: 1831 - VAT ID no.: ATU14719303 - Tax number 140/4988
Re: Drill performance - Waiting time
Posted by Ted Dunning <te...@gmail.com>.
Matthias,
Kunal gives very good information about how to debug this starting from
the high level, but you should also be suspicious of the lower levels. For
instance, are you sure that your file system is working correctly? Is the
file actually stored on MapR?
How long does it take to run something like wc on this file on the same
node?
This should take much less than a second if you have a competent I/O system,
but it would not be unusual to find that it is much slower than
expected, for any number of reasons.
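
As a rough illustration of the wc-style check described above, here is a minimal Python sketch that reads a file front to back and reports the elapsed time and throughput. The function names and the 1 MiB chunk size are assumptions made for this example, not anything Drill provides. For scale, 100 MB read in 80 seconds would be only 1.25 MB/s.

```python
import sys
import time


def time_sequential_read(path, chunk_size=1024 * 1024):
    """Read the whole file sequentially (much like `wc`) and time it.

    Returns (total_bytes, elapsed_seconds). The chunk size is an
    arbitrary choice for this sketch.
    """
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    return total_bytes, elapsed


def throughput_mb_per_s(total_bytes, elapsed_seconds):
    """Convert a byte count and a duration into decimal MB per second."""
    return total_bytes / 1_000_000 / elapsed_seconds


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python read_timing.py <file>
    size, seconds = time_sequential_read(sys.argv[1])
    print(f"{size} bytes in {seconds:.3f} s")
    if seconds > 0:
        print(f"{throughput_mb_per_s(size, seconds):.1f} MB/s")
```

A second run on the same file may be served from the operating system's page cache and look much faster, so the first (cold) run is the more telling number.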
Re: Drill performance - Waiting time
Posted by Kunal Khatua <ku...@apache.org>.
Hi Matthias
The waiting time for a PARQUET_ROW_GROUP_SCAN operator is the total time that all the fragments took to read the Parquet data into memory as Drill's Value Vectors. So, 80 seconds would indicate that the bulk of the time is spent just getting the data.
If you scroll down to the operator-specific table, you'll find an entry along the lines of
09-xx-01 - PARQUET_ROW_GROUP_SCAN
Within that collapsed table, you should find a subsection for Operator Metrics at the end.
These metrics should tell you where most of the time is being spent at a per-fragment level.
If the metrics are missing, the traditional Parquet reader was used instead of Drill's fast native Parquet reader (Drill does this if it encounters Parquet files with nested data), and the time is being spent by the Parquet library deserializing the file. In this case, you're out of luck, and your best bet is to split the Parquet file into multiple files or at least multiple row groups. That way, Drill can create more fragments (assuming you've not maxed out that limit) and read the data in parallel.
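
To make the splitting argument concrete, here is a back-of-the-envelope Python sketch. It assumes, as a simplification, that each row group can be read by at most one scan fragment and that a per-node cap limits how many fragments run at once. The function name and the max_fragments_per_node parameter are hypothetical stand-ins for this illustration, not Drill settings.

```python
import math

MIB = 1024 * 1024


def estimated_scan_parallelism(file_size_bytes, row_group_size_bytes,
                               max_fragments_per_node):
    """Estimate how many scan fragments could read one file in parallel.

    Simplifying assumptions for this sketch: one fragment per row group
    at most, and a hypothetical per-node cap on concurrent fragments.
    """
    row_groups = max(1, math.ceil(file_size_bytes / row_group_size_bytes))
    return min(row_groups, max_fragments_per_node)


if __name__ == "__main__":
    file_size = 100 * MIB
    # One large row group: the whole file is read by a single fragment.
    print(estimated_scan_parallelism(file_size, 512 * MIB, 8))
    # Smaller row groups: more fragments can share the work.
    print(estimated_scan_parallelism(file_size, 16 * MIB, 8))
```

Under these assumptions, a 100 MB file written as a single row group gets no parallelism at all, while the same data written with smaller row groups can be spread over several fragments up to the cap.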
~ Kunal