Posted to issues@hbase.apache.org by "stack (Created) (JIRA)" <ji...@apache.org> on 2012/03/23 23:29:29 UTC

[jira] [Created] (HBASE-5626) Compactions simulator tool for proofing algorithms

Compactions simulator tool for proofing algorithms
--------------------------------------------------

                 Key: HBASE-5626
                 URL: https://issues.apache.org/jira/browse/HBASE-5626
             Project: HBase
          Issue Type: Task
            Reporter: stack
            Priority: Minor


A tool to run compaction simulations would be nice to have.  We could use it to see how well an algorithm performs under different circumstances: loaded with different value types, with different rates of flushes and splits, etc.  HBASE-2462 had one (see the patch).  Or we could try doing it using something like this: http://en.wikipedia.org/wiki/Discrete_event_simulation
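
To make the discrete-event idea concrete, here is a minimal sketch, not taken from HBASE-2462 or any HBase code. It keeps a time-ordered queue of events and processes them one at a time. The function name simulate_events, the fixed intervals, and the two event kinds are all assumptions made for illustration; a real simulator would attach flush, compaction, and split logic to each event.

```python
import heapq


def simulate_events(flush_interval, split_interval, end_time):
    """Process flush and split events in time order and count each kind.

    Hypothetical helper: the intervals are fixed, abstract time units.
    """
    # Each queue entry is (time, kind); heapq always pops the earliest time.
    queue = [(flush_interval, "flush"), (split_interval, "split")]
    heapq.heapify(queue)
    intervals = {"flush": flush_interval, "split": split_interval}
    counts = {"flush": 0, "split": 0}
    while queue:
        time, kind = heapq.heappop(queue)
        if time > end_time:
            break  # every remaining event is later than end_time
        counts[kind] += 1
        # Schedule the next occurrence of the same kind of event.
        heapq.heappush(queue, (time + intervals[kind], kind))
    return counts


print(simulate_events(flush_interval=5, split_interval=60, end_time=120))
```

Varying the intervals (or drawing them from a random distribution) is one way to model different rates of flushes and splits.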

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5626) Compactions simulator tool for proofing algorithms

Posted by "Nicolas Spiegelberg (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238838#comment-13238838 ] 

Nicolas Spiegelberg commented on HBASE-5626:
--------------------------------------------

How is this different from the compaction simulation python script?  The unit of measurement should be a flush, since we flush after a certain memstore memory size, regardless of flow rate or KV length.
                
> Compactions simulator tool for proofing algorithms
> --------------------------------------------------
>
>                 Key: HBASE-5626
>                 URL: https://issues.apache.org/jira/browse/HBASE-5626
>             Project: HBase
>          Issue Type: Task
>            Reporter: stack
>            Priority: Minor
>              Labels: noob
>
> A tool to run compaction simulations would be a nice to have.   We could use it to see how well an algo ran under different circumstances loaded w/ different value types with different rates of flushes and splits, etc. HBASE-2462 had one (see in patch).  Or we could try doing it using something like this: http://en.wikipedia.org/wiki/Discrete_event_simulation


        

[jira] [Commented] (HBASE-5626) Compactions simulator tool for proofing algorithms

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238853#comment-13238853 ] 

stack commented on HBASE-5626:
------------------------------

Where is the python simulation script?  Is it uploaded anywhere?  (Pardon me if I missed it)

Simulator needs to also factor in splitting.
                

        

[jira] [Updated] (HBASE-5626) Compactions simulator tool for proofing algorithms

Posted by "Nicolas Spiegelberg (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Spiegelberg updated HBASE-5626:
---------------------------------------

    Attachment: cf_compact.py

Attached the current python script that I use to emulate compactions given different params.
                

        

[jira] [Commented] (HBASE-5626) Compactions simulator tool for proofing algorithms

Posted by "Nicolas Spiegelberg (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238907#comment-13238907 ] 

Nicolas Spiegelberg commented on HBASE-5626:
--------------------------------------------

A little more explanation.  

Basic Concept:
We wish to model the amount of compaction IO and file dispersion.  The unit of measurement for compactions is a flush, because a flush is always 64MB (or whatever you configure) regardless of other properties of the CF/KV.  Column families might trigger flushes at different intervals, but they usually flush a consistent amount of data.  You can understand a compaction algorithm by how it behaves over X flushes.  Does this test make a lot of assumptions and simplifications?  Yes!

Inputs:
1. ratio = compaction.ratio between files.  (same as the HBase config)
2. min.files = minimum count of files that must be selected for a compaction to occur (same as HBase config)
3. duplication = fraction of KVs within a file that are mutations and will be deduped on compaction (0 <= DUPLICATION <= 1)
4. iterations = number of flushes to simulate

Output:
1. The StoreFile dispersion after every flush (and, possibly, compaction triggered by that flush)
2. The average storefile count over <iterations> flushes
3. The amount of IO consumed by compactions after those <iterations> flushes.
                

        

[jira] [Commented] (HBASE-5626) Compactions simulator tool for proofing algorithms

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238908#comment-13238908 ] 

stack commented on HBASE-5626:
------------------------------

Nice. Let me take a looksee...
                