Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2017/03/11 01:31:04 UTC
[jira] [Commented] (KUDU-1930) Improve performance of dictionary builder

    [ https://issues.apache.org/jira/browse/KUDU-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905984#comment-15905984 ] 

Todd Lipcon commented on KUDU-1930:
-----------------------------------

Tested with commands like:
{code}
./build/latest/bin/tpch_real_world -tpch_path_to_ts_flags_file ./tsflags  -tpch_scaling_factor 100 -tpch_num_inserters 8 -notpch_run_queries -tpch_path_to_dbgen_dir /data/2/mpercy/tpch_2_17_0/dbgen -tpch_partition_strategy hash
{code}
(the new 'hash' partition strategy is from a simple local patch)

Results:
{code}
with 1 MM thread:
I0310 13:53:38.925390  1568 tpch_real_world.cc:278] Time spent by thread 2 to load generated data into the database: real 1140.187s	user 398.411s	sys 5.315s

with 4 MM threads:
I0310 14:06:35.509299  8524 tpch_real_world.cc:278] Time spent by thread 4 to load generated data into the database: real 618.348s	user 413.431s	sys 5.437s

with 4 MM threads, hash partition:
I0310 15:32:52.386118 27623 tpch_real_world.cc:289] Time spent by thread 2 to load generated data into the database: real 1233.386s	user 462.671s	sys 6.084s

with 4 MM threads, using dense_hash_map instead of std::unordered_map for dictionary builder:
I0310 17:26:00.682138 32076 tpch_real_world.cc:289] Time spent by thread 0 to load generated data into the database: real 1147.478s	user 464.147s	sys 6.200s
{code}

The "user" times here are measured on the client side, so they are not very relevant; "real" is the total wall-clock time. Comparing the last two runs (1233s vs. 1147s), dense_hash_map looks like an easy 7% speedup over std::unordered_map. As we've long known, inserting in sorted order (range partitioned) is about 2x faster than inserting in non-sorted order (618s vs. 1233s with 4 MM threads), and the longer the benchmark runs, the more the difference grows.
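
For anyone who wants to recheck the arithmetic, here is a rough sketch that recomputes both ratios from the "real" times in the log lines above. The helper names are made up for illustration and are not part of tpch_real_world or Kudu.

```python
# Wall-clock ("real") seconds copied from the log lines above.
REAL_4MM_RANGE = 618.348        # "with 4 MM threads"
REAL_4MM_HASH_STL = 1233.386    # "with 4 MM threads, hash partition"
REAL_4MM_HASH_DENSE = 1147.478  # "... using dense_hash_map ..."


def time_reduction_pct(baseline_s, improved_s):
    """Percentage of wall time saved relative to the baseline run."""
    return 100.0 * (baseline_s - improved_s) / baseline_s


def slowdown_factor(slow_s, fast_s):
    """How many times longer the slow run took than the fast run."""
    return slow_s / fast_s


if __name__ == "__main__":
    pct = time_reduction_pct(REAL_4MM_HASH_STL, REAL_4MM_HASH_DENSE)
    factor = slowdown_factor(REAL_4MM_HASH_STL, REAL_4MM_RANGE)
    print(f"dense_hash_map vs std::unordered_map: {pct:.1f}% less wall time")
    print(f"hash vs range partitioned load: {factor:.2f}x longer")
```

These are single runs, so the numbers should be read as rough indications rather than precise measurements.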

> Improve performance of dictionary builder
> -----------------------------------------
>
>                 Key: KUDU-1930
>                 URL: https://issues.apache.org/jira/browse/KUDU-1930
>             Project: Kudu
>          Issue Type: Bug
>          Components: cfile, perf
>    Affects Versions: 1.3.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> I locally tweaked tpch_real_world to use hash partitioning instead of range partitioning, so that the different threads overlapped on the same tablets, simulating a more realistic parallel load scenario. I noticed that the MM threads were CPU bound, with a high percentage of CPU in AddCodeWords(). Initial prototypes indicate that optimizing the hashmap used here would be an easy win.


