Posted to common-dev@hadoop.apache.org by "stack (JIRA)" <ji...@apache.org> on 2007/12/22 05:57:44 UTC

[jira] Created: (HADOOP-2485) [hbase] Make mapfile index interval configurable

[hbase] Make mapfile index interval configurable
------------------------------------------------

                 Key: HADOOP-2485
                 URL: https://issues.apache.org/jira/browse/HADOOP-2485
             Project: Hadoop
          Issue Type: Improvement
          Components: contrib/hbase
            Reporter: stack
            Priority: Minor


The default mapfile index interval is one index entry every 128 entries.  In basic tests, the PerformanceEvaluation mapfile test takes 60-plus seconds to randomly read 100k records.  If the index interval is set to 1, so a read never has to scan forward with next to find its record, then 100k random reads take 7 seconds.  This is using the local filesystem.  If I set it to 16, then it takes 12 seconds.

Running PerformanceEvaluation random reads against hbase with the interval set to 16, we run 50% faster (hdfs is in the picture).
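
To illustrate why the interval matters, here is a minimal sketch of a sparse index over sorted keys.  It is not the MapFile code: the class SparseIndexSketch and its methods are hypothetical names, and the keys live in an in-memory array rather than a file.  A lookup binary-searches the index for the nearest preceding indexed entry, then reads forward one entry at a time, so the number of entries read per lookup grows with the interval.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a sparse index over sorted keys; not Hadoop's MapFile. */
final class SparseIndexSketch {
    private final long[] keys;                                // stands in for the sorted data file
    private final List<Integer> indexed = new ArrayList<>();  // positions that have an index entry

    SparseIndexSketch(long[] sortedKeys, int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        this.keys = sortedKeys;
        // Record one index entry every `interval` entries, starting with the first.
        for (int i = 0; i < sortedKeys.length; i += interval) {
            indexed.add(i);
        }
    }

    /** Returns how many data entries were read to find key, or -1 if key is absent. */
    int entriesScanned(long key) {
        // Binary search the index for the last indexed key <= target.
        int lo = 0, hi = indexed.size() - 1, start = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (keys[indexed.get(mid)] <= key) {
                start = indexed.get(mid);
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (start < 0) {
            return -1; // target sorts before the first key
        }
        // Read forward from the indexed position until we hit or pass the target.
        int scanned = 0;
        for (int i = start; i < keys.length; i++) {
            scanned++;
            if (keys[i] == key) return scanned;
            if (keys[i] > key) return -1;
        }
        return -1;
    }

    /** Average entries read per lookup when every one of n keys is looked up once. */
    static double averageEntriesScanned(int n, int interval) {
        long[] sorted = new long[n];
        for (int i = 0; i < n; i++) sorted[i] = i;
        SparseIndexSketch index = new SparseIndexSketch(sorted, interval);
        long total = 0;
        for (long k : sorted) total += index.entriesScanned(k);
        return total / (double) n;
    }

    public static void main(String[] args) {
        for (int interval : new int[] {1, 16, 128}) {
            System.out.println("interval " + interval + ": "
                + averageEntriesScanned(1024, interval) + " entries read per lookup");
        }
    }
}
```

The trade-off in this sketch is that the index holds roughly n / interval entries, so a smaller interval means fewer entries read per lookup but a larger index.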

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-2485) [hbase] Make mapfile index interval configurable

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2485:
--------------------------

    Status: Patch Available  (was: In Progress)



[jira] Updated: (HADOOP-2485) [hbase] Make mapfile index interval configurable

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2485:
--------------------------

    Status: In Progress  (was: Patch Available)



[jira] Updated: (HADOOP-2485) [hbase] Make mapfile index interval configurable

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2485:
--------------------------

    Status: Patch Available  (was: In Progress)



[jira] Updated: (HADOOP-2485) [hbase] Make mapfile index interval configurable

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2485:
--------------------------

    Status: In Progress  (was: Patch Available)



[jira] Commented: (HADOOP-2485) [hbase] Make mapfile index interval configurable

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554689 ] 

Hudson commented on HADOOP-2485:
--------------------------------

Integrated in Hadoop-Nightly #348 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/348/])



[jira] Updated: (HADOOP-2485) [hbase] Make mapfile index interval configurable

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2485:
--------------------------

    Attachment: 2485.patch

Hudson won't pick up my patch.  Trying again with a differently named patch to see if that makes a difference.



[jira] Updated: (HADOOP-2485) [hbase] Make mapfile index interval configurable

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2485:
--------------------------

    Status: In Progress  (was: Patch Available)

Retrying after Hudson restart.



[jira] Updated: (HADOOP-2485) [hbase] Make mapfile index interval configurable

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2485:
--------------------------

    Status: Patch Available  (was: In Progress)

Hudson is idle. Requeue.



[jira] Updated: (HADOOP-2485) [hbase] Make mapfile index interval configurable

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2485:
--------------------------

    Attachment: index.patch

Adds the ability to configure the mapfile index interval.  Leaves it at the default for now: hdfs in TRUNK is 50% slower doing PerformanceEvaluation, and I want to figure out why before changing the default from 128 to something like 16 or 32.

M src/contrib/hbase/conf/hbase-default.xml
    Add hbase.io.index.interval
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStoreFile.java
    Add an HbaseMapFile and move the hbase-isms into it.  Have the bloom
    filter etc. subclass it.  Read hbase.io.index.interval when writing the
    index file.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStore.java
    Count hstorefile entries.  Emit count when logging at DEBUG.
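
As a rough sketch of how a writer might pick up the new property (the patch's actual code is not shown in this thread): java.util.Properties stands in for the Hadoop Configuration object, and the class name, method name, and the check rejecting values below 1 are all assumptions for illustration.  Only the property name hbase.io.index.interval and the default of 128 come from the notes above.

```java
import java.util.Properties;

/** Hypothetical sketch; java.util.Properties stands in for Hadoop's Configuration. */
final class IndexIntervalConfigSketch {
    static final String KEY = "hbase.io.index.interval"; // property named in the patch notes
    static final int DEFAULT_INTERVAL = 128;              // default stated in the issue

    /** Returns the configured interval, or the default when the property is unset. */
    static int indexInterval(Properties conf) {
        String raw = conf.getProperty(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_INTERVAL;
        }
        int interval = Integer.parseInt(raw.trim());
        if (interval < 1) {
            // Assumed validation: an interval below 1 has no meaning for an index.
            throw new IllegalArgumentException(KEY + " must be >= 1, got " + interval);
        }
        return interval;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println("unset: " + indexInterval(conf));
        conf.setProperty(KEY, "16");
        System.out.println("set to 16: " + indexInterval(conf));
    }
}
```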



[jira] Updated: (HADOOP-2485) [hbase] Make mapfile index interval configurable

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2485:
--------------------------

    Fix Version/s: 0.16.0
           Status: Patch Available  (was: Open)

Builds locally.



[jira] Updated: (HADOOP-2485) [hbase] Make mapfile index interval configurable

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2485:
--------------------------

    Status: Patch Available  (was: In Progress)



[jira] Updated: (HADOOP-2485) [hbase] Make mapfile index interval configurable

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2485:
--------------------------

    Status: In Progress  (was: Patch Available)

Patch is not being built by hudson.  Retrying.



[jira] Updated: (HADOOP-2485) [hbase] Make mapfile index interval configurable

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2485:
--------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed. Resolving.



[jira] Assigned: (HADOOP-2485) [hbase] Make mapfile index interval configurable

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reassigned HADOOP-2485:
-----------------------------

    Assignee: stack
