Posted to user@kudu.apache.org by Boris Tyukin <bo...@boristyukin.com> on 2018/02/22 14:31:20 UTC

Impala Parquet to Kudu 1.5 - severe ingest performance degradation

Hello,

we just upgraded our dev cluster from Kudu 1.3 to Kudu 1.5.0-cdh5.13.1 and
noticed quite severe performance degradation. We did a CTAS from an Impala
Parquet table, which has not changed at all since the upgrade (even the same
number of rows), into Kudu using the query below.

It used to take 11-11.5 hours on Kudu 1.3 and is now taking 50-55 hours.

Of course, the Impala version was also bumped with CDH 5.13.

Any clue why it takes so much time now?

The table has 5.5B rows.

CREATE TABLE kudutest_ts.clinical_event_nots
PRIMARY KEY (clinical_event_id)
PARTITION BY HASH(clinical_event_id) PARTITIONS 120
STORED AS KUDU
AS
SELECT
  clinical_event_id,
  encntr_id,
  person_id,
  encntr_financial_id,
  event_id,
  event_title_text,
  CAST(view_level AS STRING) AS view_level,
  order_id,
  catalog_cd,
  series_ref_nbr,
  accession_nbr,
  contributor_system_cd,
  reference_nbr,
  parent_event_id,
  event_reltn_cd,
  event_class_cd,
  event_cd,
  event_tag,
  CAST(event_end_dt_tm_os AS BIGINT) AS event_end_dt_tm_os,
  result_val,
  result_units_cd,
  result_time_units_cd,
  task_assay_cd,
  record_status_cd,
  result_status_cd,
  CAST(authentic_flag AS STRING) AS authentic_flag,
  CAST(publish_flag AS STRING) AS publish_flag,
  qc_review_cd,
  normalcy_cd,
  normalcy_method_cd,
  inquire_security_cd,
  resource_group_cd,
  resource_cd,
  CAST(subtable_bit_map AS STRING) AS subtable_bit_map,
  collating_seq,
  verified_prsnl_id,
  performed_prsnl_id,
  updt_id,
  CAST(updt_task AS STRING) AS updt_task,
  updt_cnt,
  CAST(updt_applctx AS STRING) AS updt_applctx,
  normal_low,
  normal_high,
  critical_low,
  critical_high,
  CAST(event_tag_set_flag AS STRING) AS event_tag_set_flag,
  CAST(note_importance_bit_map AS STRING) AS note_importance_bit_map,
  CAST(order_action_sequence AS STRING) AS order_action_sequence,
  entry_mode_cd,
  source_cd,
  clinical_seq,
  CAST(event_end_tz AS STRING) AS event_end_tz,
  CAST(event_start_tz AS STRING) AS event_start_tz,
  CAST(performed_tz AS STRING) AS performed_tz,
  CAST(verified_tz AS STRING) AS verified_tz,
  task_assay_version_nbr,
  modifier_long_text_id,
  ce_dynamic_label_id,
  CAST(nomen_string_flag AS STRING) AS nomen_string_flag,
  src_event_id,
  CAST(last_utc_ts AS BIGINT) AS last_utc_ts,
  device_free_txt,
  CAST(trait_bit_map AS STRING) AS trait_bit_map,
  CAST(clu_subkey1_flag AS STRING) AS clu_subkey1_flag,
  CAST(clinsig_updt_dt_tm AS BIGINT) AS clinsig_updt_dt_tm,
  CAST(event_end_dt_tm AS BIGINT) AS event_end_dt_tm,
  CAST(event_start_dt_tm AS BIGINT) AS event_start_dt_tm,
  CAST(expiration_dt_tm AS BIGINT) AS expiration_dt_tm,
  CAST(verified_dt_tm AS BIGINT) AS verified_dt_tm,
  CAST(src_clinsig_updt_dt_tm AS BIGINT) AS src_clinsig_updt_dt_tm,
  CAST(updt_dt_tm AS BIGINT) AS updt_dt_tm,
  CAST(valid_from_dt_tm AS BIGINT) AS valid_from_dt_tm,
  CAST(valid_until_dt_tm AS BIGINT) AS valid_until_dt_tm,
  CAST(performed_dt_tm AS BIGINT) AS performed_dt_tm,
  txn_id_text,
  CAST(ingest_dt_tm AS BIGINT) AS ingest_dt_tm
FROM v500.clinical_event;

Re: Impala Parquet to Kudu 1.5 - severe ingest performance degradation

Posted by Boris Tyukin <bo...@boristyukin.com>.
Point taken! Thanks, Todd! You saved us from days of frustration, so I'm
really thankful for that!

Boris

On Wed, Feb 28, 2018 at 2:07 PM, Todd Lipcon <to...@apache.org> wrote:

> [...]

Re: Impala Parquet to Kudu 1.5 - severe ingest performance degradation

Posted by Todd Lipcon <to...@apache.org>.
Hi Boris,

Thanks for the feedback and sharing your experience. Like you said, this is
more of an issue with downstream documentation so it's probably not
appropriate to continue discussing this in the context of the Apache Kudu
open source project. For feedback on CDH it's best to direct that towards
the associated vendor (Cloudera). Even though I happen to be employed by
them, I and other employees participate here in the Apache Kudu project as
individuals, and it's important to keep that distinction clear. Kudu is a
product of the ASF non-profit organization, not a product of any commercial
vendor.

-Todd

On Wed, Feb 28, 2018 at 6:17 AM, Boris Tyukin <bo...@boristyukin.com> wrote:

> [...]

-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Impala Parquet to Kudu 1.5 - severe ingest performance degradation

Posted by Boris Tyukin <bo...@boristyukin.com>.
First of all, thanks for the very quick response. It means a lot to know
you guys stand behind Kudu and work with users like me. In fact, we get
better support here than through the official channels :)

That said, I needed to vent a bit :) Granted, this should probably be
directed at the Impala devs, but I think Kudu has been impacted the most.

We ran a cluster health check command and saw some warnings in there. But
we proceeded with running the insert statement with the hints Todd
suggested, and now we are back to the times we used to get with Kudu 1.3 /
Impala 2.8.

It is a bit frustrating, I must say, that these changes impacted
performance so dramatically. In my opinion, the Kudu and CDH release notes
should flag things like this in BOLD RED so they do not catch users by
surprise. And not everyone is as organized as we are - we knew exactly how
much time it took with Kudu 1.3 / Impala 2.8.

I've tracked down all the related JIRAs and did not see a word about
dramatic performance degradation. I did see words like 'optimized' and
'improved'.

Since it is part of the official CDH distro, I would expect a little more
proactive warning.

If you want me to open a JIRA on this, I would be happy to do it. We see
this degradation across all our tables, and as you can see from my example
query, it is a really straightforward select from a table - no joins, no
predicates, and no complex calculations.

Thanks again,
Boris

On Thu, Feb 22, 2018 at 2:44 PM, Todd Lipcon <to...@cloudera.com> wrote:

> [...]

Re: Impala Parquet to Kudu 1.5 - severe ingest performance degradation

Posted by Todd Lipcon <to...@cloudera.com>.
In addition to what Hao suggests, I think it's worth noting that the insert
query plan created by Impala changed a bit over time.

It sounds like you upgraded from Impala 2.8 (in CDH 5.11), which used a
very straightforward insert plan - each node separately inserted rows in
whatever order the rows were consumed. This plan worked well for smaller
inserts but could cause timeouts with larger workloads.

In Impala 2.9, the plan was changed so that Impala performs some shuffling
and sorting before inserting into Kudu. This makes the Kudu insert pattern
more reliable and efficient, but could cause a degradation for some
workloads since Impala's sorts are single-threaded.

Impala 2.10 (which I guess you are running) improved a bit over 2.9 by
ensuring that the sorts can be "partial", which resolved some of the
performance degradation, but it's possible your workload is still affected
negatively.

To disable the new behavior, you can use the insert hints 'noshuffle'
and/or 'noclustered', such as:

upsert into my_table /* +noclustered,noshuffle */ select * from
my_other_table;
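Applied to a large CTAS load like the one in the original post, one sketch
(the table and column names here are illustrative placeholders, and
splitting the CTAS into a separate CREATE plus a hinted INSERT is just one
way to attach the hints) would be:

```sql
-- Hypothetical sketch: create the Kudu table empty first, then load it
-- with hints that disable the pre-insert sort and the exchange (shuffle).
CREATE TABLE kudu_db.my_events
PRIMARY KEY (event_id)
PARTITION BY HASH(event_id) PARTITIONS 120
STORED AS KUDU
AS SELECT event_id, event_cd FROM src_db.my_events WHERE FALSE;

INSERT INTO kudu_db.my_events /* +noclustered,noshuffle */
SELECT event_id, event_cd FROM src_db.my_events;
```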


Hope that helps
-Todd

On Thu, Feb 22, 2018 at 11:02 AM, Hao Hao <ha...@cloudera.com> wrote:

> [...]

-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Impala Parquet to Kudu 1.5 - severe ingest performance degradation

Posted by Hao Hao <ha...@cloudera.com>.
Did you happen to check the health of the cluster after the upgrade with
'kudu cluster ksck'?

Best,
Hao
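A minimal sketch of the check Hao refers to (the master addresses below are
placeholders for your cluster's actual Kudu masters):

```shell
# Hypothetical master addresses; point this at your own Kudu masters.
kudu cluster ksck master-1.example.com,master-2.example.com,master-3.example.com
```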

On Thu, Feb 22, 2018 at 6:31 AM, Boris Tyukin <bo...@boristyukin.com> wrote:

> [...]