Posted to user@impala.apache.org by Andrey Kuznetsov <An...@epam.com> on 2017/09/05 15:12:01 UTC

[Impala] Performance strange behavior

Hi folks,
I need your experience.
I am running performance tests of Impala + Parquet on 3, 6, and 8 data nodes. The throughput for each configuration is shown below:

[Image: throughput for each configuration]


1.       I wonder why the throughput for 1+8 with more than 100 threads is lower than the throughput for 1+6. Do you know why this happens?

2.       Do you know how we can explain the throughput degradation above 80 threads? Thread concurrency?

The settings (755 GB RAM per host, 70 cores per host, 10 Gbit/s network) are the same for all configurations. Impala daemons run on each data node with a 500 GB memory limit, there are no queues, and there is no resource bottleneck (CPU/disk/network/RAM; plots are attached).

Best regards,
ANDREY KUZNETSOV
Software Engineering Team Leader

Re: [Impala] Performance strange behavior

Posted by Mostafa Mokhtar <mm...@cloudera.com>.

Hi Andrey,

Can you please share some of the query profiles for us to analyze?
When running queries with a large number of scanner threads, you are likely
to hit IMPALA-5302 and IMPALA-4923, which have been fixed in Impala 2.9.

Also, can you run "sudo perf top" for 30 seconds or so and then share a
screenshot?

Thanks
Mostafa
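
As a rough aid for the two questions in the original post (where throughput peaks, and at which thread counts the 8-node run falls below the 6-node run), the measured points could be compared with a small script along these lines. This is only a sketch: the function names and the sample numbers are hypothetical placeholders, not the measurements from the attached plot, and it does not explain the root cause, which still needs the query profiles and perf output requested above.

```python
from typing import List, Tuple

# One sample is (client_threads, throughput); the throughput unit is whatever
# the benchmark reports (for example queries per second).
Sample = Tuple[int, float]


def find_peak(samples: List[Sample]) -> Sample:
    """Return the (threads, throughput) sample with the highest throughput."""
    if not samples:
        raise ValueError("samples must not be empty")
    return max(samples, key=lambda s: s[1])


def degradation_after_peak(samples: List[Sample]) -> float:
    """Fractional drop from peak throughput to the throughput measured at the
    highest thread count (0.0 means no drop)."""
    _, peak_tp = find_peak(samples)
    _, last_tp = max(samples, key=lambda s: s[0])
    return (peak_tp - last_tp) / peak_tp


def crossover_points(a: List[Sample], b: List[Sample]) -> List[int]:
    """Thread counts present in both runs where run `a` is slower than run `b`."""
    b_by_threads = dict(b)
    return sorted(t for t, tp in a if t in b_by_threads and tp < b_by_threads[t])


if __name__ == "__main__":
    # PLACEHOLDER numbers for illustration only; replace with real measurements.
    six_nodes = [(20, 100.0), (40, 190.0), (60, 260.0), (80, 300.0),
                 (100, 295.0), (120, 285.0), (140, 275.0)]
    eight_nodes = [(20, 130.0), (40, 240.0), (60, 310.0), (80, 340.0),
                   (100, 300.0), (120, 260.0), (140, 240.0)]

    print("peak (1+8):", find_peak(eight_nodes))
    print("drop after peak (1+8):", round(degradation_after_peak(eight_nodes), 3))
    print("1+8 slower than 1+6 at threads:", crossover_points(eight_nodes, six_nodes))
```

Feeding in the real per-configuration measurements gives concrete thread counts to quote when sharing query profiles, for example one profile taken at the peak and one taken past the crossover.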

---------- Forwarded message ----------
> From: Andrey Kuznetsov <An...@epam.com>
> Date: Tue, Sep 5, 2017 at 8:12 AM
> Subject: [Impala] Performance strange behavior
> To: "user@impala.incubator.apache.org" <us...@impala.incubator.apache.org>
> Cc: Special SBER-BPOC Team <Sp...@epam.com>