Posted to common-dev@hadoop.apache.org by "stack (JIRA)" <ji...@apache.org> on 2007/12/24 19:52:43 UTC

[jira] Created: (HADOOP-2488) Random reads in mapfile are 3 times slower in TRUNK than they are in 0.15.x

Random reads in mapfile are 3 times slower in TRUNK than they are in 0.15.x
---------------------------------------------------------------------------

                 Key: HADOOP-2488
                 URL: https://issues.apache.org/jira/browse/HADOOP-2488
             Project: Hadoop
          Issue Type: Bug
            Reporter: stack


Opening a mapfile on hdfs and then doing 100k random reads inside the open file takes 3 times longer to complete in TRUNK than the same test run against 0.15.x.

Random read performance is important to hbase.

Serial reads and writes perform about the same in TRUNK and 0.15.x.

Below are 3 runs of the mapfile test found in hbase done against the 0.15 branch, followed by 2 runs done against TRUNK:

0.15 branch
{code}
[stack@aa0-000-12 hbase]$ for i in 1 2 3; do ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1; done
07/12/24 18:34:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/24 18:34:50 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX:9123/user/?/performanceevaluation.mapfile
07/12/24 18:35:13 INFO hbase.PerformanceEvaluation: Writing 100000 records took 23644ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/24 18:35:13 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
....
07/12/24 18:37:23 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 129879ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/24 18:37:23 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/24 18:37:25 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1832ms
07/12/24 18:37:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/24 18:37:26 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXXX:9123/user/?/performanceevaluation.mapfile
07/12/24 18:37:50 INFO hbase.PerformanceEvaluation: Writing 100000 records took 24188ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/24 18:37:50 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
...
07/12/24 18:39:58 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 127879ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/24 18:39:58 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/24 18:40:00 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1787ms
07/12/24 18:40:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/24 18:40:01 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX:9123/user/?/performanceevaluation.mapfile
07/12/24 18:40:24 INFO hbase.PerformanceEvaluation: Writing 100000 records took 23832ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/24 18:40:24 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
..
07/12/24 18:42:31 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 126954ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/24 18:42:31 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/24 18:42:33 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1766ms
07/12/24 17:24:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/24 17:24:25 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXXX:9123/user/?/performanceevaluation.mapfile
07/12/24 17:24:49 INFO hbase.PerformanceEvaluation: Writing 100000 records took 24181ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/24 17:24:49 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
...
07/12/24 17:26:59 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 129564ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/24 17:26:59 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/24 17:27:00 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1793ms 
{code}

TRUNK
{code}
[stack@aa0-000-12 hbase]$ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
07/12/24 18:02:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/24 18:02:26 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX3/user/?/performanceevaluation.mapfile
07/12/24 18:02:51 INFO hbase.PerformanceEvaluation: Writing 100000 records took 25500ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/24 18:02:51 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
....
07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 500306ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/24 18:11:13 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1940ms

[stack@aa0-000-12 hbase]$ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
07/12/24 18:12:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/24 18:12:46 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX/user/?/performanceevaluation.mapfile
07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Writing 100000 records took 25593ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
...
07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 543992ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/24 18:22:18 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1786ms
{code}

The above was run on a small HDFS cluster of 4 machines, each running a dfs node.
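The slowdown shows up only for random reads, which suggests the cost is in how the reader repositions within the stream rather than in the reads themselves. As a rough, self-contained illustration in plain JDK (not Hadoop code): skipping forward through a buffered stream may, depending on the implementation, read and discard bytes or throw away its buffer, while a true seek repositions without touching intermediate data — both paths must still land on the same byte.

```java
import java.io.*;
import java.nio.file.*;

public class SkipVsSeek {
    // Read the byte at `target` two ways: via skip() on a buffered stream
    // and via a true seek(). Returns {byteViaSkip, byteViaSeek}.
    static int[] demo() throws IOException {
        Path p = Files.createTempFile("skipvsseek", ".dat");
        try {
            // Test file: byte i has value (byte) i.
            byte[] data = new byte[64 * 1024];
            for (int i = 0; i < data.length; i++) data[i] = (byte) i;
            Files.write(p, data);

            int target = 40_000;  // position of the "random read"

            // Buffered path: skip() may discard buffered bytes or, in some
            // implementations, read-and-discard rather than seek.
            int viaSkip;
            try (InputStream in = new BufferedInputStream(
                    new FileInputStream(p.toFile()), 4096)) {
                long skipped = 0;
                while (skipped < target) skipped += in.skip(target - skipped);
                viaSkip = in.read();
            }

            // True random access: seek() repositions the file pointer
            // directly; no intermediate bytes are consumed.
            int viaSeek;
            try (RandomAccessFile raf = new RandomAccessFile(p.toFile(), "r")) {
                raf.seek(target);
                viaSeek = raf.read();
            }
            return new int[] { viaSkip, viaSeek };
        } finally {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1]);  // both read the same byte
    }
}
```

The functional result is identical either way; the difference the benchmark exposes is purely in how much extra I/O each repositioning strategy performs.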


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-2488) Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2488:
--------------------------

    Summary: Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x  (was: Random reads in mapfile are 3 times slower in TRUNK than they are in 0.15.x)




[jira] Assigned: (HADOOP-2488) Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi reassigned HADOOP-2488:
------------------------------------

    Assignee: stack

I will commit this today. Moving to a true random read basically avoids the extra read of 'BUFFER_SIZE' bytes at the beginning of each access. We can fix that later.
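The "true random read" Raghu describes can be illustrated with a plain-JDK analogy (this is not the Hadoop code path itself): a positional read fetches bytes at an absolute offset without disturbing the stream's current position, so no buffer's worth of leading data has to be read and discarded per access — analogous in spirit to HDFS's positioned-read interface.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

public class PositionalRead {
    // Positional read: fetch bytes at an absolute offset without moving
    // the channel's current position. Returns the 4 bytes at offset 5000.
    static byte[] preadDemo() throws IOException {
        Path p = Files.createTempFile("pread", ".dat");
        try {
            // Test file: byte i has value (byte) i.
            byte[] data = new byte[8192];
            for (int i = 0; i < data.length; i++) data[i] = (byte) i;
            Files.write(p, data);

            try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
                ByteBuffer buf = ByteBuffer.allocate(4);
                ch.read(buf, 5000);              // read at offset 5000 directly
                if (ch.position() != 0)          // stateful position is untouched
                    throw new IllegalStateException("position moved");
                return buf.array();
            }
        } finally {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        for (byte b : preadDemo()) System.out.print((b & 0xff) + " ");
        System.out.println();
    }
}
```

Because no shared stream position is involved, positional reads also avoid the seek-then-read dance entirely, which is why they suit random-access workloads like hbase's.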

> Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-2488
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2488
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2488.patch
>



[jira] Issue Comment Edited: (HADOOP-2488) Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554506 ] 

rangadi edited comment on HADOOP-2488 at 12/26/07 6:11 PM:
----------------------------------------------------------------

This is interesting. I haven't yet tried running the hbase command myself. 
This implies two things: 
# FileInputStream.skip() probably does seek() internally. Good.
# seek() instead of skip() is important even for crc files.

Could you make MetadataInputStream extend FilterInputStream so that it is less prone to such errors? We could then remove all the implementations of read(), skip(), etc. from it.

Could you give the approximate size of each random read and what the block size is?
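A minimal sketch of that suggestion, using plain JDK classes (the class and method names here are illustrative, not the actual Hadoop `MetadataInputStream`): by extending `FilterInputStream`, every method the wrapper does not override — `skip()`, `available()`, `close()`, and so on — delegates straight to the wrapped stream, so a forgotten override can no longer silently degrade `skip()` into read-and-discard.

```java
import java.io.*;

public class MetadataStreamSketch {
    // Hypothetical wrapper: counts bytes actually read, while inheriting
    // all other behavior (notably skip()) from the wrapped stream.
    static class CountingStream extends FilterInputStream {
        private long bytesRead = 0;

        CountingStream(InputStream in) { super(in); }

        @Override public int read() throws IOException {
            int b = super.read();
            if (b >= 0) bytesRead++;
            return b;
        }

        @Override public int read(byte[] b, int off, int len) throws IOException {
            int n = super.read(b, off, len);
            if (n > 0) bytesRead += n;
            return n;
        }

        // skip() deliberately NOT overridden: FilterInputStream forwards it
        // to the underlying stream, which may implement it as a cheap seek.

        long bytesRead() { return bytesRead; }
    }

    public static void main(String[] args) throws IOException {
        try (CountingStream s = new CountingStream(
                new ByteArrayInputStream(new byte[] {10, 20, 30, 40, 50}))) {
            s.skip(2);                          // delegated; not counted as a read
            System.out.println(s.read());       // 30
            System.out.println(s.bytesRead());  // 1
        }
    }
}
```

The design point is that the wrapper only pays attention to the methods it genuinely needs to intercept; everything else inherits the underlying stream's (possibly seek-based) implementation for free.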



      was (Author: rangadi):
    This is interesting. I haven't tried more to run the hbase command yet. 
This implies two things: 
#This implies that FileInpuStream.skip() probably does seek() internally, good.
#seek() instead of skip is important even for crc files.

Could you make MetadataInputStream extend FilterInputStream so that it is less prone to such errors?
we can remove all the implementations of read(), skip() etc from it.

Could you give approximate size of each random read?


  



[jira] Updated: (HADOOP-2488) Random reads in mapfile are 3 times slower in TRUNK than they are in 0.15.x

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2488:
--------------------------

    Description: 
07/12/24 18:40:01 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX:9123/user/?/performanceevaluation.mapfile
07/12/24 18:40:24 INFO hbase.PerformanceEvaluation: Writing 100000 records took 23832ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/24 18:40:24 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
..
07/12/24 18:42:31 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 126954ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/24 18:42:31 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/24 18:42:33 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1766ms
07/12/24 17:24:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/24 17:24:25 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXXX:9123/user/?/performanceevaluation.mapfile
07/12/24 17:24:49 INFO hbase.PerformanceEvaluation: Writing 100000 records took 24181ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/24 17:24:49 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
...
07/12/24 17:26:59 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 129564ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/24 17:26:59 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/24 17:27:00 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1793ms 
{code}

TRUNK
{code}
[stack@aa0-000-12 hbase]$ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
07/12/24 18:02:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/24 18:02:26 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX3/user/?/performanceevaluation.mapfile
07/12/24 18:02:51 INFO hbase.PerformanceEvaluation: Writing 100000 records took 25500ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/24 18:02:51 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
....
07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 500306ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/24 18:11:13 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1940ms

[stack@aa0-000-12 hbase]$ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
07/12/24 18:12:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/24 18:12:46 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX/user/?/performanceevaluation.mapfile
07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Writing 100000 records took 25593ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
...
07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 543992ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/24 18:22:18 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1786ms
{code}

Above was run on a small HDFS cluster of 4 machines, each also running a DFS node.
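The shape of the test above (bulk sequential write, timed random point reads, then a timed serial scan) can be sketched as follows. This is an illustrative stand-in, not the actual hbase PerformanceEvaluation: it runs against an in-memory java.util.TreeMap instead of a MapFile on HDFS, and the class name and value size are assumptions.

```java
import java.util.Random;
import java.util.TreeMap;

// Sketch of the benchmark's three phases: write N records, read N random
// keys, then scan all records serially. A TreeMap stands in for the
// on-HDFS MapFile so the sketch is self-contained.
public class MapfileShapedBench {
    static final int ROWS = 100000;

    public static void main(String[] args) {
        TreeMap<Integer, byte[]> map = new TreeMap<Integer, byte[]>();
        byte[] value = new byte[1000];

        long t0 = System.currentTimeMillis();
        for (int i = 0; i < ROWS; i++) {
            map.put(i, value);                              // phase 1: sequential writes
        }
        long writeMs = System.currentTimeMillis() - t0;

        Random rnd = new Random(42);
        t0 = System.currentTimeMillis();
        long checksum = 0;
        for (int i = 0; i < ROWS; i++) {
            checksum += map.get(rnd.nextInt(ROWS)).length;  // phase 2: random reads
        }
        long randomMs = System.currentTimeMillis() - t0;

        t0 = System.currentTimeMillis();
        long scanned = 0;
        for (byte[] v : map.values()) {                     // phase 3: serial scan
            scanned += v.length;
        }
        long serialMs = System.currentTimeMillis() - t0;

        System.out.println("write " + writeMs + "ms, random " + randomMs
            + "ms, serial " + serialMs + "ms (" + checksum + "/" + scanned + " bytes)");
    }
}
```

In the real test the random-read phase dominates because each miss in the reader's index forces a seek plus skip inside the underlying data file, which is where TRUNK and 0.15 diverge.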


  was:
Opening a mapfile on hdfs and then doing 100k random reads inside the open file takes 3 times longer to complete in TRUNK than same test run on 0.15.x.

Random read performance is important to hbase.

Serial reads and writes perform about the same in TRUNK and 0.15.x.

Below are 3 runs done against 0.15 of the mapfile test that can be found in hbase followed by 2 runs done against TRUNK:

0.15 branch
{code}
[stack@aa0-000-12 hbase]$ for i in 1 2 3; do ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1; done
07/12/24 18:34:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/24 18:34:50 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX:9123/user/?/performanceevaluation.mapfile
07/12/24 18:35:13 INFO hbase.PerformanceEvaluation: Writing 100000 records took 23644ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/24 18:35:13 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
....
07/12/24 18:37:23 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 129879ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/24 18:37:23 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/24 18:37:25 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1832ms
07/12/24 18:37:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/24 18:37:26 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXXX:9123/user/?/performanceevaluation.mapfile
07/12/24 18:37:50 INFO hbase.PerformanceEvaluation: Writing 100000 records took 24188ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/24 18:37:50 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
...
07/12/24 18:39:58 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 127879ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/24 18:39:58 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/24 18:40:00 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1787ms
07/12/24 18:40:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/24 18:40:01 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX:9123/user/?/performanceevaluation.mapfile
07/12/24 18:40:24 INFO hbase.PerformanceEvaluation: Writing 100000 records took 23832ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/24 18:40:24 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
..
07/12/24 18:42:31 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 126954ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/24 18:42:31 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/24 18:42:33 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1766ms
07/12/24 17:24:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/24 17:24:25 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXXX:9123/user/?/performanceevaluation.mapfile
07/12/24 17:24:49 INFO hbase.PerformanceEvaluation: Writing 100000 records took 24181ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/24 17:24:49 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
...
07/12/24 17:26:59 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 129564ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/24 17:26:59 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/24 17:27:00 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1793ms 
{code}

TRUNK
{code]
[stack@aa0-000-12 hbase]$ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
07/12/24 18:02:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/24 18:02:26 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX3/user/?/performanceevaluation.mapfile
07/12/24 18:02:51 INFO hbase.PerformanceEvaluation: Writing 100000 records took 25500ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/24 18:02:51 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
....
07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 500306ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/24 18:11:13 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1940ms

[stack@aa0-000-12 hbase]$ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
07/12/24 18:12:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/24 18:12:46 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX/user/?/performanceevaluation.mapfile
07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Writing 100000 records took 25593ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
...
07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 543992ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/24 18:22:18 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1786ms
{code}

Above was done on a small hdfs cluster of 4 machines with each a dfs node.



> Random reads in mapfile are 3times slower in TRUNK than they are in 0.15.x
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-2488
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2488
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: stack
>



[jira] Updated: (HADOOP-2488) Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-2488:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. Thanks Stack!

> Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-2488
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2488
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2488.patch
>
>



[jira] Commented: (HADOOP-2488) Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554490 ] 

stack commented on HADOOP-2488:
-------------------------------

The patch below, which delegates skip() and available() to the wrapped BufferedInputStream rather than inheriting the default InputStream implementations, brings TRUNK close to 0.15 speeds: with it applied, TRUNK is just 5% slower.

{code}
Index: src/java/org/apache/hadoop/dfs/FSDatasetInterface.java
===================================================================
--- src/java/org/apache/hadoop/dfs/FSDatasetInterface.java      (revision 606938)
+++ src/java/org/apache/hadoop/dfs/FSDatasetInterface.java      (working copy)
@@ -68,6 +68,16 @@
       return inStream.read(b, off, len);
     } 

+    @Override
+    public int available() throws IOException {
+      return this.inStream.available();
+    }
+
+    @Override
+    public long skip(long n) throws IOException {
+      return this.inStream.skip(n);
+    }
+
     public long getLength() {
       return length;
     }
{code}
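Why the override matters: InputStream's default skip() is implemented by reading bytes into a throwaway buffer, so every skip during a random seek turns into real reads, while a delegated skip lets the BufferedInputStream satisfy it from its own buffer or pass it straight down. A minimal self-contained illustration; the NaiveWrapper and DelegatingWrapper classes here are hypothetical stand-ins for the FSDatasetInterface stream, not Hadoop code.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Like the pre-patch Hadoop stream: forwards read() but inherits
// InputStream's default skip(), which reads and discards bytes.
class NaiveWrapper extends InputStream {
    final InputStream in;
    long bytesReadThrough = 0;  // counts bytes copied through read()
    NaiveWrapper(InputStream in) { this.in = in; }
    @Override public int read() throws IOException {
        int b = in.read();
        if (b >= 0) bytesReadThrough++;
        return b;
    }
    @Override public int read(byte[] b, int off, int len) throws IOException {
        int n = in.read(b, off, len);
        if (n > 0) bytesReadThrough += n;
        return n;
    }
}

// Patched-style wrapper: delegates skip() and available() to the
// underlying stream, so no bytes are copied through the wrapper.
class DelegatingWrapper extends NaiveWrapper {
    DelegatingWrapper(InputStream in) { super(in); }
    @Override public long skip(long n) throws IOException { return in.skip(n); }
    @Override public int available() throws IOException { return in.available(); }
}

public class SkipDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1 << 20];  // 1 MiB

        NaiveWrapper naive =
            new NaiveWrapper(new BufferedInputStream(new ByteArrayInputStream(data)));
        naive.skip(data.length);  // default skip: every byte is read and discarded
        System.out.println("naive bytes copied: " + naive.bytesReadThrough);

        DelegatingWrapper fast =
            new DelegatingWrapper(new BufferedInputStream(new ByteArrayInputStream(data)));
        fast.skip(data.length);   // delegated skip: nothing copied through the wrapper
        System.out.println("delegating bytes copied: " + fast.bytesReadThrough);
        // prints: naive bytes copied: 1048576
        //         delegating bytes copied: 0
    }
}
```

Against a DataNode the difference is larger still, since the discarded bytes in the naive case cross the network instead of a local buffer.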

> Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-2488
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2488
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: stack
>
> 07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
> 07/12/24 18:11:13 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1940ms
> [stack@aa0-000-12 hbase]$ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
> 07/12/24 18:12:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 07/12/24 18:12:46 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX/user/?/performanceevaluation.mapfile
> 07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Writing 100000 records took 25593ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ...
> 07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 543992ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
> 07/12/24 18:22:18 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1786ms
> {code}
> The above was run on a small hdfs cluster of 4 machines, each running a dfs node.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-2488) Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2488:
--------------------------

    Fix Version/s: 0.16.0
           Status: Patch Available  (was: Open)

Submitting to get a +1 from hudson so this can be committed.

> Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-2488
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2488
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: stack
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2488.patch
>
>



[jira] Commented: (HADOOP-2488) Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12555047 ] 

stack commented on HADOOP-2488:
-------------------------------

The failures were unrelated (they were in hbase) and should not get in the way of this patch getting committed.  Thanks.

> Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-2488
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2488
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: stack
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2488.patch
>
>



[jira] Commented: (HADOOP-2488) Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554526 ] 

stack commented on HADOOP-2488:
-------------------------------

I tried the patch.  It has the same general effect, moving us to within ~5-7% of the 0.15.x timings:

{code}
[stack@aa0-000-12 hbase]$ for i in 1 2 3 ; do ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1; done
07/12/27 05:29:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/27 05:29:17 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://X.X.X:9123/user/?/performanceevaluation.mapfile
07/12/27 05:29:42 INFO hbase.PerformanceEvaluation: Writing 100000 records took 24882ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/27 05:29:42 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
...
07/12/27 05:32:01 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 138881ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/27 05:32:01 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/27 05:32:03 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1831ms
07/12/27 05:32:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/27 05:32:04 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://X.X.X:9123/user/?/performanceevaluation.mapfile
07/12/27 05:32:28 INFO hbase.PerformanceEvaluation: Writing 100000 records took 24043ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/27 05:32:28 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
...
07/12/27 05:34:45 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 136419ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/27 05:34:45 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/27 05:34:47 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1810ms
07/12/27 05:34:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/27 05:34:48 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://X,X,X:9123/user/?/performanceevaluation.mapfile
07/12/27 05:35:12 INFO hbase.PerformanceEvaluation: Writing 100000 records took 23832ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/27 05:35:12 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
....
07/12/27 05:37:33 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 140445ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/27 05:37:33 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/27 05:37:35 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1852ms
{code}

> Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-2488
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2488
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: stack
>         Attachments: HADOOP-2488.patch
>
>



[jira] Commented: (HADOOP-2488) Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12555148 ] 

Raghu Angadi commented on HADOOP-2488:
--------------------------------------

Thanks for running the patch.

> E.g. in BlockSender, it calls IOUtils.skipFully to position in the CRC file. In the profiler, it looks like IOUtils.skipFully is
> dumbing down the DIS into an IS calling default InputStream.skip instead of the BufferedInputStream
> version which runs a bit faster - though in the overall scheme of things, we're talking a pittance... probably < 1%.

I am not sure what the difference is. IOUtils.skipFully() just invokes in.skip(); that does not imply it invokes 
InputStream.skip(), since which skip() runs depends on the actual runtime class of 'in'. I am not sure I 
understood the actual concern.
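As an illustration of that dispatch, here is a minimal standalone sketch. The skipFully below only paraphrases the IOUtils.skipFully loop, and the ByteArrayInputStream plus the 1024/512 sizes are made up for the example: the same skipFully call runs the raw stream's skip() in one case and BufferedInputStream.skip() in the other.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SkipDispatch {
    // Paraphrase of the IOUtils.skipFully loop: it calls in.skip(),
    // which dispatches to the runtime class of 'in'.
    static void skipFully(InputStream in, long len) throws IOException {
        while (len > 0) {
            long skipped = in.skip(len);
            if (skipped <= 0) {
                throw new IOException("Premature EOF while skipping " + len + " bytes");
            }
            len -= skipped;
        }
    }

    public static void main(String[] args) throws IOException {
        // Raw stream: skip() runs ByteArrayInputStream.skip().
        InputStream raw = new ByteArrayInputStream(new byte[1024]);
        skipFully(raw, 512);
        // Buffered wrapper: the identical call now runs BufferedInputStream.skip().
        InputStream buffered =
            new BufferedInputStream(new ByteArrayInputStream(new byte[1024]));
        skipFully(buffered, 512);
        System.out.println(raw.available() + " " + buffered.available()); // prints "512 512"
    }
}
```

Either way skipFully itself is unchanged; only the runtime type of the stream decides which skip() implementation does the work.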

> I took a look at adding seek to the reading of CRC files. 
FileInputStream probably results in a seek internally; otherwise I am not sure what accounts for 
this bug and the fix. We could make it a buffered input stream after reading the header, which might make it a bit faster than 0.15. 
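The buffered-after-header idea could look roughly like the sketch below. A plain local temp file stands in for the hdfs checksum file, and HEADER_LEN and the method name are made up for the example; this shows only the shape of the change, not the actual checksum-reading code.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

public class BufferedAfterHeader {
    static final int HEADER_LEN = 7; // hypothetical fixed header length

    // Read the fixed-size header unbuffered, then hand back a
    // BufferedInputStream positioned at the body so later small
    // reads are served from the buffer.
    static InputStream openWithBufferedBody(File f) throws IOException {
        FileInputStream fis = new FileInputStream(f);
        byte[] header = new byte[HEADER_LEN];
        new DataInputStream(fis).readFully(header);
        return new BufferedInputStream(fis);
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("crc", ".tmp");
        f.deleteOnExit();
        byte[] bytes = new byte[HEADER_LEN + 100];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) i;
        }
        Files.write(f.toPath(), bytes);
        InputStream in = openWithBufferedBody(f);
        System.out.println(in.read()); // first body byte, prints "7"
    }
}
```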



> Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-2488
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2488
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: stack
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2488.patch
>
>
> ..
> 07/12/24 18:42:31 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 126954ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 18:42:31 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
> 07/12/24 18:42:33 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1766ms
> 07/12/24 17:24:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 07/12/24 17:24:25 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXXX:9123/user/?/performanceevaluation.mapfile
> 07/12/24 17:24:49 INFO hbase.PerformanceEvaluation: Writing 100000 records took 24181ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 17:24:49 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ...
> 07/12/24 17:26:59 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 129564ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 17:26:59 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
> 07/12/24 17:27:00 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1793ms 
> {code}
> TRUNK
> {code}
> [stack@aa0-000-12 hbase]$ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
> 07/12/24 18:02:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 07/12/24 18:02:26 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX3/user/?/performanceevaluation.mapfile
> 07/12/24 18:02:51 INFO hbase.PerformanceEvaluation: Writing 100000 records took 25500ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 18:02:51 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ....
> 07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 500306ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
> 07/12/24 18:11:13 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1940ms
> [stack@aa0-000-12 hbase]$ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
> 07/12/24 18:12:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 07/12/24 18:12:46 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX/user/?/performanceevaluation.mapfile
> 07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Writing 100000 records took 25593ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ...
> 07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 543992ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
> 07/12/24 18:22:18 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1786ms
> {code}
> Above was done on a small hdfs cluster of 4 machines with each a dfs node.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-2488) Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-2488:
---------------------------------

    Attachment: HADOOP-2488.patch

> make MetadataInputStream extend FilterInputStream so that it is less prone to such errors?
> we can remove all the implementations of read(), skip() etc from it.
Patch attached.
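The idea behind that change can be sketched as follows (illustrative names only, not the actual Hadoop classes in the patch): FilterInputStream already forwards read(), skip(), available(), etc. to the wrapped stream, so a subclass need not re-implement them and cannot get the delegation wrong.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of the approach: inherit the delegation from FilterInputStream
// instead of hand-writing read()/skip() forwarders that can go stale.
public class MetadataStreamSketch extends FilterInputStream {
    MetadataStreamSketch(InputStream in) {
        super(in);
    }
    // Only behavior that genuinely differs would be overridden here.

    public static void main(String[] args) throws IOException {
        byte[] data = {10, 20, 30, 40};
        MetadataStreamSketch s =
            new MetadataStreamSketch(new ByteArrayInputStream(data));
        s.skip(2);                    // forwarded to ByteArrayInputStream.skip()
        System.out.println(s.read()); // 30
    }
}
```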

> Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-2488
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2488
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: stack
>         Attachments: HADOOP-2488.patch
>



[jira] Commented: (HADOOP-2488) Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12555152 ] 

stack commented on HADOOP-2488:
-------------------------------

>> I took a look at adding seek to the reading of CRC files.
> FileInputStream is probably results in seek internally...We could make it a buffered input stream after reading the header, it might make it a bit faster than 15.

It already is buffered? (See line 1287 of DataNode.java.) Or do you mean something else?

> Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-2488
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2488
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: stack
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2488.patch
>



[jira] Commented: (HADOOP-2488) Random reads in mapfile are 3times slower in TRUNK than they are in 0.15.x

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554317 ] 

Raghu Angadi commented on HADOOP-2488:
--------------------------------------

1989 was my first suspect as well. I am going to look at the patch and the code. 

> Random reads in mapfile are 3times slower in TRUNK than they are in 0.15.x
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-2488
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2488
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: stack
>



[jira] Commented: (HADOOP-2488) Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12555044 ] 

Hadoop QA commented on HADOOP-2488:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12372241/HADOOP-2488.patch
against trunk revision r607330.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests -1.  The patch failed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1438/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1438/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1438/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1438/console

This message is automatically generated.

> Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-2488
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2488
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: stack
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2488.patch
>
> 07/12/24 18:42:33 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1766ms
> 07/12/24 17:24:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 07/12/24 17:24:25 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXXX:9123/user/?/performanceevaluation.mapfile
> 07/12/24 17:24:49 INFO hbase.PerformanceEvaluation: Writing 100000 records took 24181ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 17:24:49 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ...
> 07/12/24 17:26:59 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 129564ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 17:26:59 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
> 07/12/24 17:27:00 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1793ms 
> {code}
> TRUNK
> {code}
> [stack@aa0-000-12 hbase]$ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
> 07/12/24 18:02:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 07/12/24 18:02:26 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX3/user/?/performanceevaluation.mapfile
> 07/12/24 18:02:51 INFO hbase.PerformanceEvaluation: Writing 100000 records took 25500ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 18:02:51 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ....
> 07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 500306ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
> 07/12/24 18:11:13 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1940ms
> [stack@aa0-000-12 hbase]$ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
> 07/12/24 18:12:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 07/12/24 18:12:46 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX/user/?/performanceevaluation.mapfile
> 07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Writing 100000 records took 25593ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ...
> 07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 543992ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
> 07/12/24 18:22:18 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1786ms
> {code}
> Above was done on a small hdfs cluster of 4 machines with each a dfs node.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-2488) Random reads in mapfile are 3 times slower in TRUNK than they are in 0.15.x

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554314 ] 

Raghu Angadi commented on HADOOP-2488:
--------------------------------------

How do I set up the environment? I didn't have bin/hbase, so I tried
'build/contrib/hbase/bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1' from the top-level dir and got:
{noformat}Unrecognized VM option '+HeapDumpOnOutOfMemoryError'
Could not create the Java virtual machine.{noformat}




[jira] Commented: (HADOOP-2488) Random reads in mapfile are 3 times slower in TRUNK than they are in 0.15.x

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554450 ] 

stack commented on HADOOP-2488:
-------------------------------

I ran tests with the revision that added 1989. It slows things down by about 50%, so it's not the only culprit, it seems.

Looking in the profiler, I see that ~10k invocations of readBlock turn into ~2M invocations of sendChunk.  Chunks seem to be 512 bytes.  Is that right?  Seems small.
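Those profiler counts can be sanity-checked with a back-of-envelope sketch (the bytes-streamed-per-read figure below is an assumption for illustration, not a measurement):

```python
# Rough fan-out estimate: if each of ~10k readBlock calls streams ~100 KB
# through 512-byte checksum chunks, sendChunk is invoked about 2M times.
CHUNK_BYTES = 512                 # chunk size seen in the profiler
BYTES_PER_READ = 100 * 1024       # assumed data streamed per readBlock call
BLOCK_READS = 10_000

send_chunk_calls = BLOCK_READS * BYTES_PER_READ // CHUNK_BYTES
print(send_chunk_calls)  # 2000000
```

So ~2M sendChunk calls is about what one would expect from ~10k block reads at that chunk size.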



[jira] Commented: (HADOOP-2488) Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554535 ] 

stack commented on HADOOP-2488:
-------------------------------

On the character of the data this little mapfile test uses: the key is a null-padded integer and the value is a random string of 1000 bytes. It's the data format described in the Performance Evaluation section of the Bigtable paper. See the code if you want to know more.

On seeking rather than skipping, even in CRC files: yes, I think that would help. E.g., BlockSender calls IOUtils.skipFully to position in the CRC file. In the profiler, it looks like IOUtils.skipFully treats the DataInputStream as a plain InputStream, so the default InputStream.skip runs instead of the faster BufferedInputStream version -- though in the overall scheme of things we're talking a pittance... probably < 1%.
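The skip-versus-seek point can be sketched with a toy model (purely illustrative Python, not Hadoop code: CountingStream and its 512-bytes-per-call skip limit are assumptions standing in for an unbuffered stream):

```python
class CountingStream:
    """Toy input stream whose skip() advances at most max_skip bytes per call."""
    def __init__(self, max_skip):
        self.pos = 0
        self.calls = 0
        self.max_skip = max_skip

    def skip(self, n):
        # Like InputStream.skip, may advance fewer bytes than requested.
        self.calls += 1
        step = min(n, self.max_skip)
        self.pos += step
        return step

def skip_fully(stream, n):
    # Mirrors the shape of IOUtils.skipFully: loop until n bytes are skipped.
    while n > 0:
        n -= stream.skip(n)

s = CountingStream(max_skip=512)   # unbuffered: small advance per skip() call
skip_fully(s, 1_000_000)
print(s.calls)  # 1954 skip() calls to move 1 MB; a seek would be one call
```

A buffered stream satisfies most of the distance from its buffer in far fewer calls, which is why the difference here is small in practice, per the estimate above.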

Of interest, though, is that a method like sendChunk can be called millions of times when randomly accessing a file of any decent size. I tried assigning instance variables to locals to see if it would make a difference, but detected nothing noticeable.



[jira] Commented: (HADOOP-2488) Random reads in mapfile are 3 times slower in TRUNK than they are in 0.15.x

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554455 ] 

stack commented on HADOOP-2488:
-------------------------------

Hmm... seems like the checksum size defaults to 512 in the 0.15 branch too, so that ain't it.

I did a clean checkout of trunk and spent more time on timings; my original claim that TRUNK is 3 times slower was incorrect: it's *just* ~40% slower.
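Averaging the random-read times from the originally posted runs (values copied from the issue description) shows where the first estimate came from; per this comment, those TRUNK runs were apparently skewed by the unclean checkout:

```python
# Random-read times (ms) copied from the original issue description.
trunk_ms = [500306, 543992]                       # TRUNK, before clean checkout
branch_015_ms = [129879, 127879, 126954, 129564]  # 0.15 branch

ratio = (sum(trunk_ms) / len(trunk_ms)) / (sum(branch_015_ms) / len(branch_015_ms))
print(round(ratio, 2))  # 4.06 -- roughly the "3 times slower" of the title
```

The revised ~40% figure comes from the clean-checkout re-runs, whose 0.15 timings follow.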

0.15 branch
{code}
[stack@aa0-000-12 hbase]$ for i in 1 2 3 ; do ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1; done
07/12/26 19:08:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/26 19:08:11 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://X.X.X:9123/user/?/performanceevaluation.mapfile
07/12/26 19:08:36 INFO hbase.PerformanceEvaluation: Writing 100000 records took 24159ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/26 19:08:36 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
07/12/26 19:08:52 INFO hbase.PerformanceEvaluation: Read 10000
07/12/26 19:09:05 INFO hbase.PerformanceEvaluation: Read 20000
07/12/26 19:09:17 INFO hbase.PerformanceEvaluation: Read 30000
07/12/26 19:09:30 INFO hbase.PerformanceEvaluation: Read 40000
07/12/26 19:09:42 INFO hbase.PerformanceEvaluation: Read 50000
07/12/26 19:09:55 INFO hbase.PerformanceEvaluation: Read 60000
07/12/26 19:10:08 INFO hbase.PerformanceEvaluation: Read 70000
07/12/26 19:10:20 INFO hbase.PerformanceEvaluation: Read 80000
07/12/26 19:10:33 INFO hbase.PerformanceEvaluation: Read 90000
07/12/26 19:10:45 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 129836ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/26 19:10:45 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/26 19:10:47 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1717ms
07/12/26 19:10:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/26 19:10:48 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://X.X.X:9123/user/?/performanceevaluation.mapfile
07/12/26 19:11:12 INFO hbase.PerformanceEvaluation: Writing 100000 records took 23859ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/26 19:11:12 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
07/12/26 19:11:25 INFO hbase.PerformanceEvaluation: Read 10000
07/12/26 19:11:39 INFO hbase.PerformanceEvaluation: Read 20000
07/12/26 19:11:51 INFO hbase.PerformanceEvaluation: Read 30000
07/12/26 19:12:03 INFO hbase.PerformanceEvaluation: Read 40000
07/12/26 19:12:16 INFO hbase.PerformanceEvaluation: Read 50000
07/12/26 19:12:29 INFO hbase.PerformanceEvaluation: Read 60000
07/12/26 19:12:41 INFO hbase.PerformanceEvaluation: Read 70000
07/12/26 19:12:54 INFO hbase.PerformanceEvaluation: Read 80000
07/12/26 19:13:06 INFO hbase.PerformanceEvaluation: Read 90000
07/12/26 19:13:19 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 126779ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/26 19:13:19 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/26 19:13:20 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1769ms
07/12/26 19:13:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/26 19:13:21 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://X.X.X:9123/user/?/performanceevaluation.mapfile
07/12/26 19:13:45 INFO hbase.PerformanceEvaluation: Writing 100000 records took 23679ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/26 19:13:45 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
07/12/26 19:13:58 INFO hbase.PerformanceEvaluation: Read 10000
07/12/26 19:14:11 INFO hbase.PerformanceEvaluation: Read 20000
07/12/26 19:14:23 INFO hbase.PerformanceEvaluation: Read 30000
07/12/26 19:14:36 INFO hbase.PerformanceEvaluation: Read 40000
07/12/26 19:14:48 INFO hbase.PerformanceEvaluation: Read 50000
07/12/26 19:15:01 INFO hbase.PerformanceEvaluation: Read 60000
07/12/26 19:15:13 INFO hbase.PerformanceEvaluation: Read 70000
07/12/26 19:15:26 INFO hbase.PerformanceEvaluation: Read 80000
07/12/26 19:15:38 INFO hbase.PerformanceEvaluation: Read 90000
07/12/26 19:15:51 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 125860ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/26 19:15:51 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/26 19:15:52 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1836ms
{code}

TRUNK
{code}
[stack@aa0-000-12 hbase]$ for i in 1 2 3 ; do ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1; done
07/12/26 18:50:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/26 18:50:17 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://X.X.X:9123/user/?/performanceevaluation.mapfile
07/12/26 18:50:41 INFO hbase.PerformanceEvaluation: Writing 100000 records took 24191ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/26 18:50:41 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
07/12/26 18:51:02 INFO hbase.PerformanceEvaluation: Read 10000
07/12/26 18:51:20 INFO hbase.PerformanceEvaluation: Read 20000
07/12/26 18:51:38 INFO hbase.PerformanceEvaluation: Read 30000
07/12/26 18:51:55 INFO hbase.PerformanceEvaluation: Read 40000
07/12/26 18:52:13 INFO hbase.PerformanceEvaluation: Read 50000
07/12/26 18:52:29 INFO hbase.PerformanceEvaluation: Read 60000
07/12/26 18:52:47 INFO hbase.PerformanceEvaluation: Read 70000
07/12/26 18:53:04 INFO hbase.PerformanceEvaluation: Read 80000
07/12/26 18:53:21 INFO hbase.PerformanceEvaluation: Read 90000
07/12/26 18:53:39 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 177692ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/26 18:53:39 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/26 18:53:41 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1749ms
07/12/26 18:53:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/26 18:53:42 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://X.X.X:9123/user/?/performanceevaluation.mapfile
07/12/26 18:54:06 INFO hbase.PerformanceEvaluation: Writing 100000 records took 23873ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/26 18:54:06 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
07/12/26 18:54:25 INFO hbase.PerformanceEvaluation: Read 10000
07/12/26 18:54:44 INFO hbase.PerformanceEvaluation: Read 20000
07/12/26 18:55:01 INFO hbase.PerformanceEvaluation: Read 30000
07/12/26 18:55:17 INFO hbase.PerformanceEvaluation: Read 40000
07/12/26 18:55:33 INFO hbase.PerformanceEvaluation: Read 50000
07/12/26 18:55:51 INFO hbase.PerformanceEvaluation: Read 60000
07/12/26 18:56:08 INFO hbase.PerformanceEvaluation: Read 70000
07/12/26 18:56:25 INFO hbase.PerformanceEvaluation: Read 80000
07/12/26 18:56:41 INFO hbase.PerformanceEvaluation: Read 90000
07/12/26 18:56:57 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 171167ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/26 18:56:57 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/26 18:56:59 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1789ms
07/12/26 18:57:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
07/12/26 18:57:00 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://X.X.X:9123/user/?/performanceevaluation.mapfile
07/12/26 18:57:25 INFO hbase.PerformanceEvaluation: Writing 100000 records took 24076ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
07/12/26 18:57:25 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
07/12/26 18:57:43 INFO hbase.PerformanceEvaluation: Read 10000
07/12/26 18:58:01 INFO hbase.PerformanceEvaluation: Read 20000
07/12/26 18:58:18 INFO hbase.PerformanceEvaluation: Read 30000
07/12/26 18:58:35 INFO hbase.PerformanceEvaluation: Read 40000
07/12/26 18:58:52 INFO hbase.PerformanceEvaluation: Read 50000
07/12/26 18:59:09 INFO hbase.PerformanceEvaluation: Read 60000
07/12/26 18:59:26 INFO hbase.PerformanceEvaluation: Read 70000
07/12/26 18:59:42 INFO hbase.PerformanceEvaluation: Read 80000
07/12/26 19:00:00 INFO hbase.PerformanceEvaluation: Read 90000
07/12/26 19:00:16 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 171260ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
07/12/26 19:00:16 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
07/12/26 19:00:18 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1764ms
{code}

> Random reads in mapfile are 3times slower in TRUNK than they are in 0.15.x
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-2488
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2488
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: stack
>
> Opening a mapfile on hdfs and then doing 100k random reads inside the open file takes 3 times longer to complete in TRUNK than same test run on 0.15.x.
> Random read performance is important to hbase.
> Serial reads and writes perform about the same in TRUNK and 0.15.x.
> Below are 3 runs done against 0.15 of the mapfile test that can be found in hbase followed by 2 runs done against TRUNK:
> 0.15 branch
> {code}
> [stack@aa0-000-12 hbase]$ for i in 1 2 3; do ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1; done
> 07/12/24 18:34:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 07/12/24 18:34:50 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX:9123/user/?/performanceevaluation.mapfile
> 07/12/24 18:35:13 INFO hbase.PerformanceEvaluation: Writing 100000 records took 23644ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 18:35:13 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ....
> 07/12/24 18:37:23 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 129879ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 18:37:23 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
> 07/12/24 18:37:25 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1832ms
> 07/12/24 18:37:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 07/12/24 18:37:26 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXXX:9123/user/?/performanceevaluation.mapfile
> 07/12/24 18:37:50 INFO hbase.PerformanceEvaluation: Writing 100000 records took 24188ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 18:37:50 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ...
> 07/12/24 18:39:58 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 127879ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 18:39:58 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
> 07/12/24 18:40:00 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1787ms
> 07/12/24 18:40:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 07/12/24 18:40:01 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX:9123/user/?/performanceevaluation.mapfile
> 07/12/24 18:40:24 INFO hbase.PerformanceEvaluation: Writing 100000 records took 23832ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 18:40:24 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ..
> 07/12/24 18:42:31 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 126954ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 18:42:31 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
> 07/12/24 18:42:33 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1766ms
> 07/12/24 17:24:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 07/12/24 17:24:25 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXXX:9123/user/?/performanceevaluation.mapfile
> 07/12/24 17:24:49 INFO hbase.PerformanceEvaluation: Writing 100000 records took 24181ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 17:24:49 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ...
> 07/12/24 17:26:59 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 129564ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 17:26:59 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
> 07/12/24 17:27:00 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1793ms 
> {code}
> TRUNK
> {code}
> [stack@aa0-000-12 hbase]$ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
> 07/12/24 18:02:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 07/12/24 18:02:26 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX3/user/?/performanceevaluation.mapfile
> 07/12/24 18:02:51 INFO hbase.PerformanceEvaluation: Writing 100000 records took 25500ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 18:02:51 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ....
> 07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 500306ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
> 07/12/24 18:11:13 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1940ms
> [stack@aa0-000-12 hbase]$ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
> 07/12/24 18:12:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 07/12/24 18:12:46 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX/user/?/performanceevaluation.mapfile
> 07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Writing 100000 records took 25593ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ...
> 07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 543992ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
> 07/12/24 18:22:18 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1786ms
> {code}
> Above was done on a small hdfs cluster of 4 machines with each a dfs node.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-2488) Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554506 ] 

Raghu Angadi commented on HADOOP-2488:
--------------------------------------

This is interesting. I haven't yet tried to run the hbase command. 
This implies two things: 
# FileInputStream.skip() probably does a seek() internally, which is good.
# Using seek() instead of skip() matters even for crc files.

Could you make MetadataInputStream extend FilterInputStream so that it is less prone to such errors? We could then remove all the implementations of read(), skip(), etc. from it.
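A minimal sketch of that suggestion (the class name MetadataStreamSketch is illustrative, not the actual Hadoop class): a wrapper that extends FilterInputStream inherits delegation of read(), skip(), and close() to the wrapped stream, so it cannot accidentally shadow an underlying stream's optimized skip().

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: by extending FilterInputStream, every method not
// overridden here delegates straight to the wrapped stream, so the wrapper
// automatically picks up any seek()-backed skip() the inner stream provides.
class MetadataStreamSketch extends FilterInputStream {
    MetadataStreamSketch(InputStream in) {
        super(in); // read()/skip()/close() all forward to 'in'
    }
}

public class Demo {
    public static void main(String[] args) throws IOException {
        byte[] data = {10, 20, 30, 40, 50};
        MetadataStreamSketch s =
            new MetadataStreamSketch(new ByteArrayInputStream(data));
        long skipped = s.skip(2); // delegates to the wrapped stream
        int next = s.read();      // reads the third byte
        System.out.println(skipped + " " + next); // prints: 2 30
    }
}
```

The point is that the wrapper body stays empty: there are no hand-written read()/skip() copies to drift out of sync with the wrapped stream.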

Could you give the approximate size of each random read?






[jira] Commented: (HADOOP-2488) Random reads in mapfile are 3times slower in TRUNK than they are in 0.15.x

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554315 ] 

stack commented on HADOOP-2488:
-------------------------------

Thanks for taking a look at this, Raghu.

The problem seems to be the JVM you are using.  It doesn't seem to like the 'HBASE_OPTS="$HBASE_OPTS -XX:+HeapDumpOnOutOfMemoryError"' we add in hbase.  Try editing it out of your bin/hbase.

More info on this bug: it seems to be another symptom of HADOOP-1989.  I just backed that change out and random read times come in close to 0.15.x again.

The only thing I could see while profiling is that we seem to be making too many invocations of DataXceiver.readBlock (that's where the bulk of CPU time is spent); ten or a hundred times more than I'd expect.






[jira] Commented: (HADOOP-2488) Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12555048 ] 

stack commented on HADOOP-2488:
-------------------------------

Oh, I took a look at adding seek to the reading of CRC files.  It would take a bit of work given that the files have a header of info that has to be read first before one can start seeking around.  Should I open an issue to add seeking?  It wouldn't save us much, but the way we get to the correct offset in the CRC file does look suboptimal when random reading.
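For reference, once the header has been consumed the offset arithmetic such a seek would need is straightforward. This is only a sketch: the constant names and values below (HEADER_SIZE, BYTES_PER_SUM, CHECKSUM_SIZE) are illustrative assumptions, not the actual Hadoop constants.

```java
// Hypothetical sketch of mapping a data-file offset to the offset of its
// checksum entry in the companion .crc file. Real header size, chunk size,
// and checksum width would come from the crc file's own header.
public class CrcSeekSketch {
    static final long HEADER_SIZE = 8;     // assumed: magic + bytesPerSum, read once at open
    static final long BYTES_PER_SUM = 512; // assumed: data bytes covered per checksum
    static final long CHECKSUM_SIZE = 4;   // assumed: a CRC32 is 4 bytes

    // Which checksum chunk the data offset falls in, then index past the header.
    static long checksumOffset(long dataOffset) {
        long chunk = dataOffset / BYTES_PER_SUM;
        return HEADER_SIZE + chunk * CHECKSUM_SIZE;
    }

    public static void main(String[] args) {
        // A read at data offset 1300 lands in chunk 2, so its checksum
        // entry sits at 8 + 2*4 = 16 in the .crc file.
        System.out.println(checksumOffset(1300)); // prints: 16
    }
}
```

With arithmetic like this, a random read could seek the crc stream directly to the matching entry instead of skipping forward through intervening checksums.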

> Random reads in mapfile are ~40% slower in TRUNK than they are in 0.15.x
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-2488
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2488
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: stack
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2488.patch
>
>
> 07/12/24 18:40:01 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX:9123/user/?/performanceevaluation.mapfile
> 07/12/24 18:40:24 INFO hbase.PerformanceEvaluation: Writing 100000 records took 23832ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 18:40:24 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ..
> 07/12/24 18:42:31 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 126954ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 18:42:31 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
> 07/12/24 18:42:33 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1766ms
> 07/12/24 17:24:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 07/12/24 17:24:25 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXXX:9123/user/?/performanceevaluation.mapfile
> 07/12/24 17:24:49 INFO hbase.PerformanceEvaluation: Writing 100000 records took 24181ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 17:24:49 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ...
> 07/12/24 17:26:59 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 129564ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 17:26:59 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
> 07/12/24 17:27:00 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1793ms 
> {code}
> TRUNK
> {code}
> [stack@aa0-000-12 hbase]$ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
> 07/12/24 18:02:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 07/12/24 18:02:26 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX3/user/?/performanceevaluation.mapfile
> 07/12/24 18:02:51 INFO hbase.PerformanceEvaluation: Writing 100000 records took 25500ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 18:02:51 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ....
> 07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 500306ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
> 07/12/24 18:11:13 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1940ms
> [stack@aa0-000-12 hbase]$ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
> 07/12/24 18:12:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 07/12/24 18:12:46 INFO hbase.PerformanceEvaluation: Writing 100000 rows to hdfs://XXXX/user/?/performanceevaluation.mapfile
> 07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Writing 100000 records took 25593ms (Note: generation of keys and values is done inline and has been seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ...
> 07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 random records took 543992ms (Note: generation of random key is done in line and takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 rows sequentially
> 07/12/24 18:22:18 INFO hbase.PerformanceEvaluation: Reading 100000 records serially took 1786ms
> {code}
> The above was done on a small hdfs cluster of 4 machines, each running a dfs node.
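For reference, the random-read phase of the benchmark boils down to a write-then-seek loop like the sketch below.  Plain java.io is used here in place of the actual MapFile.Reader API, and all names and sizes are illustrative, so this only shows the shape of the test, not the real PerformanceEvaluation code:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Random;

public class RandomReadSketch {
    // Write `records` longs, then do `reads` random fixed-width reads
    // against them (analogous to MapFile.Reader.get() on a random key),
    // returning the sum of the values read back.
    static long randomReads(File f, int records, int reads, long seed)
            throws IOException {
        long sum = 0;
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            for (int i = 0; i < records; i++) {
                raf.writeLong(i);                       // write phase
            }
            Random rnd = new Random(seed);
            for (int i = 0; i < reads; i++) {
                raf.seek((long) rnd.nextInt(records) * 8L); // random seek
                sum += raf.readLong();                      // read it back
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("perfeval", ".dat");
        f.deleteOnExit();
        long start = System.currentTimeMillis();
        randomReads(f, 1000, 1000, 0);
        System.out.println("1000 random reads took "
            + (System.currentTimeMillis() - start) + "ms");
    }
}
```

The serial-read phase is the same loop without the seek, which is why a regression in seek handling shows up only in the random-read numbers above.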
