Posted to issues@kudu.apache.org by "Grant Henke (Jira)" <ji...@apache.org> on 2020/06/02 18:16:00 UTC

[jira] [Resolved] (KUDU-2025) Upsert throughput is 10~20% slower than insert

     [ https://issues.apache.org/jira/browse/KUDU-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke resolved KUDU-2025.
-------------------------------
    Fix Version/s: NA
       Resolution: Not A Problem

> Upsert throughput is 10~20% slower than insert
> ----------------------------------------------
>
>                 Key: KUDU-2025
>                 URL: https://issues.apache.org/jira/browse/KUDU-2025
>             Project: Kudu
>          Issue Type: Bug
>            Reporter: Juan Yu
>            Priority: Major
>             Fix For: NA
>
>
> According to Kudu's design, upsert should be faster than insert.
> I ran some tests to compare upsert and insert performance.
> I picked a few of the larger TPC-DS tables (such as store_sales and catalog_sales); each table is hash partitioned by its first 3 columns. The data is generated (so it shouldn't contain duplicate keys) and ranges from 100 GB to 1 TB. Each time, the data is ingested into a newly created table.
> In general, upsert throughput is 10~20% slower than insert throughput according to CM metrics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)