Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2017/03/11 01:31:04 UTC
[jira] [Commented] (KUDU-1930) Improve performance of dictionary builder
[ https://issues.apache.org/jira/browse/KUDU-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905984#comment-15905984 ]
Todd Lipcon commented on KUDU-1930:
-----------------------------------
Tested with commands like:
{code}
./build/latest/bin/tpch_real_world -tpch_path_to_ts_flags_file ./tsflags -tpch_scaling_factor 100 -tpch_num_inserters 8 -notpch_run_queries -tpch_path_to_dbgen_dir /data/2/mpercy/tpch_2_17_0/dbgen -tpch_partition_strategy hash
{code}
(the new 'hash' partition strategy is from a simple local patch)
Results:
{code}
with 1 MM thread:
I0310 13:53:38.925390 1568 tpch_real_world.cc:278] Time spent by thread 2 to load generated data into the database: real 1140.187s user 398.411s sys 5.315s
with 4 MM threads:
I0310 14:06:35.509299 8524 tpch_real_world.cc:278] Time spent by thread 4 to load generated data into the database: real 618.348s user 413.431s sys 5.437s
with 4 MM threads, hash partition:
I0310 15:32:52.386118 27623 tpch_real_world.cc:289] Time spent by thread 2 to load generated data into the database: real 1233.386s user 462.671s sys 6.084s
with 4 MM threads, using dense_hash_map instead of std::unordered_map for dictionary builder:
I0310 17:26:00.682138 32076 tpch_real_world.cc:289] Time spent by thread 0 to load generated data into the database: real 1147.478s user 464.147s sys 6.200s
{code}
The "user" times here are measured on the client side, so they are not that relevant, whereas "real" is the total wall time taken. It seems like dense_hash_map is an easy 7% speedup relative to the STL map (1147s vs 1233s). As we've long known, inserting in sorted order (range partitioned) is about 2x faster than non-sorted order (618s vs 1233s with 4 MM threads), and the longer the benchmark runs, the more the difference magnifies.
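For readers unfamiliar with why the hash map matters here, the following is a minimal sketch of a dictionary builder, not Kudu's actual cfile code. The class name DictBuilderSketch and its methods are hypothetical, chosen only for illustration. Every cell costs one hash lookup, so the choice of map sits on the hot path. The map type is a template parameter so that a different table could be swapped in. As far as I know, google::dense_hash_map also needs set_empty_key() called before the first insert, which this sketch does not do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch of a dictionary builder; not the Kudu implementation.
// Map is a template parameter so the hash table can be replaced.
template <typename Map = std::unordered_map<std::string, uint32_t>>
class DictBuilderSketch {
 public:
  // Returns the codeword for `value`, assigning the next free code
  // the first time the value is seen.
  uint32_t AddCodeWord(const std::string& value) {
    auto it = codes_.find(value);
    if (it != codes_.end()) return it->second;  // hot path: already seen
    uint32_t code = static_cast<uint32_t>(words_.size());
    codes_.insert(std::make_pair(value, code));
    words_.push_back(value);
    return code;
  }

  // Encodes a whole column of values: one hash lookup per cell.
  std::vector<uint32_t> AddCodeWords(const std::vector<std::string>& column) {
    std::vector<uint32_t> out;
    out.reserve(column.size());
    for (const auto& v : column) out.push_back(AddCodeWord(v));
    return out;
  }

  // Reverse mapping from codeword back to the original string.
  const std::string& Word(uint32_t code) const {
    assert(code < words_.size());
    return words_[code];
  }

  std::size_t size() const { return words_.size(); }

 private:
  Map codes_;                       // string -> codeword
  std::vector<std::string> words_;  // codeword -> string
};
```

Encoding a low-cardinality column with this sketch performs one lookup per row but only one insert per distinct value, so lookup cost dominates.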
> Improve performance of dictionary builder
> -----------------------------------------
>
> Key: KUDU-1930
> URL: https://issues.apache.org/jira/browse/KUDU-1930
> Project: Kudu
> Issue Type: Bug
> Components: cfile, perf
> Affects Versions: 1.3.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
>
> I locally tweaked tpch_real_world to use hash partitioning instead of range partitioning, so that the different threads overlapped on the same tablets, simulating a more realistic parallel load scenario. I noticed that the MM threads were CPU bound, with a high percentage of CPU in AddCodeWords(). Initial prototypes indicate that optimizing the hashmap used here would be an easy win.