Posted to user@impala.apache.org by Sandaruwan Kumarasingha <sa...@cse.mrt.ac.lk> on 2022/05/13 12:45:51 UTC

Requesting Solutions to Improve Impala Performance with a Huge Kudu Data Load

Hi Team,


Our team is working with a very large data load in Kudu, and we are
currently facing a performance issue. We are hoping you can guide us
toward a solution to the concerns described below.


We have a data load of 212 million records in Kudu. Currently, when loading
that data through Impala, query processing and loading take 47 seconds
overall. We obtained these numbers using the default configurations in Kudu
and Impala on a 6-node cluster. This is not the performance we expected.

I have attached the Impala profile and the DDL used to create the table. We
are using impala-3.4.0 and kudu-1.15.0.


*What can we do to reduce the time spent loading these 212 million records
through Impala from 47 seconds to 10 seconds?*


We would be grateful if you could suggest some solutions at your earliest
convenience.


Thank you! Regards,

Sandaruwan Kumarasingha.