Posted to issues@hbase.apache.org by "stack (Created) (JIRA)" <ji...@apache.org> on 2012/03/23 23:29:29 UTC
[jira] [Created] (HBASE-5626) Compactions simulator tool for proofing algorithms
Compactions simulator tool for proofing algorithms
--------------------------------------------------
Key: HBASE-5626
URL: https://issues.apache.org/jira/browse/HBASE-5626
Project: HBase
Issue Type: Task
Reporter: stack
Priority: Minor
A tool to run compaction simulations would be a nice to have. We could use it to see how well an algo ran under different circumstances loaded w/ different value types with different rates of flushes and splits, etc. HBASE-2462 had one (see in patch). Or we could try doing it using something like this: http://en.wikipedia.org/wiki/Discrete_event_simulation
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5626) Compactions simulator tool for proofing algorithms
Posted by "Nicolas Spiegelberg (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238838#comment-13238838 ]
Nicolas Spiegelberg commented on HBASE-5626:
--------------------------------------------
How is this different from the compaction simulation Python script? The unit of measurement should be a flush, since we flush once the memstore reaches a certain memory size, regardless of write rate or KV length.
> Compactions simulator tool for proofing algorithms
> --------------------------------------------------
>
> Key: HBASE-5626
> URL: https://issues.apache.org/jira/browse/HBASE-5626
> Project: HBase
> Issue Type: Task
> Reporter: stack
> Priority: Minor
> Labels: noob
>
> A tool to run compaction simulations would be a nice to have. We could use it to see how well an algo ran under different circumstances loaded w/ different value types with different rates of flushes and splits, etc. HBASE-2462 had one (see in patch). Or we could try doing it using something like this: http://en.wikipedia.org/wiki/Discrete_event_simulation
[jira] [Commented] (HBASE-5626) Compactions simulator tool for proofing algorithms
Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238853#comment-13238853 ]
stack commented on HBASE-5626:
------------------------------
Where is the Python simulation script? Is it uploaded anywhere? (Pardon me if I missed it.)
The simulator also needs to factor in splitting.
[jira] [Updated] (HBASE-5626) Compactions simulator tool for proofing algorithms
Posted by "Nicolas Spiegelberg (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicolas Spiegelberg updated HBASE-5626:
---------------------------------------
Attachment: cf_compact.py
Attached the current Python script that I use to emulate compactions under different parameters.
[jira] [Commented] (HBASE-5626) Compactions simulator tool for proofing algorithms
Posted by "Nicolas Spiegelberg (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238907#comment-13238907 ]
Nicolas Spiegelberg commented on HBASE-5626:
--------------------------------------------
A little more explanation.
Basic Concept:
We wish to model the amount of compaction IO and file dispersion. The unit of measurement for compactions is a flush, because a flush is always 64MB (or whatever you configure) regardless of other properties of the CF/KV. Column families might trigger flushes at different intervals, but they usually flush a consistent amount of data. You can understand a compaction algorithm from how it behaves over X flushes. Does this test make a lot of assumptions and simplifications? Yes!
Inputs:
1. ratio = compaction.ratio between files (same as the HBase config)
2. min.files = minimum number of files that must be selected for a compaction to occur (same as the HBase config)
3. duplication = fraction of KVs within a file that are mutations and will be deduped on compaction (0 <= duplication <= 1)
4. iterations = number of flushes to simulate
Output:
1. The StoreFile dispersion after every flush (and after any compaction triggered by that flush)
2. The average StoreFile count over <iterations> flushes
3. The amount of IO consumed by compactions over those <iterations> flushes
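
The inputs and outputs above can be sketched in a few lines of Python. This is not the attached cf_compact.py; it is a minimal illustration under stated assumptions. The names select_files and simulate are hypothetical. The selection rule is a simplified ratio check (a file qualifies when its size is at most ratio times the combined size of all newer files), compaction IO is counted as the size of the input files read, and duplication is assumed to shrink the whole merged output by that fraction. Sizes are in flush units, so every flush adds a file of size 1.

```python
def select_files(sizes, ratio, min_files):
    """Return the index of the oldest file to compact, or None.

    `sizes` is ordered oldest to newest. Assumed rule (a simplification):
    skip files larger than `ratio` times the total size of all newer
    files, then compact everything from the first file that passes,
    provided at least `min_files` files would be included.
    """
    for start in range(len(sizes)):
        newer_total = sum(sizes[start + 1:])
        if sizes[start] <= ratio * newer_total:
            break
    else:
        # No file passed the ratio check (also covers an empty store).
        return None
    if len(sizes) - start < min_files:
        return None
    return start


def simulate(ratio, min_files, duplication, iterations):
    """Simulate `iterations` flushes; all sizes are in flush units."""
    files = []            # store file sizes, oldest first
    history = []          # file dispersion after each flush
    compaction_io = 0.0   # assumed metric: total input size read
    total_count = 0

    for _ in range(iterations):
        files.append(1.0)  # one flush adds one file of size 1
        start = select_files(files, ratio, min_files)
        if start is not None:
            read = sum(files[start:])
            compaction_io += read
            # Assumption: deduplication shrinks the merged output.
            files[start:] = [read * (1.0 - duplication)]
        total_count += len(files)
        history.append(list(files))

    average_count = total_count / iterations if iterations else 0.0
    return history, average_count, compaction_io


if __name__ == "__main__":
    history, average_count, io = simulate(
        ratio=1.2, min_files=3, duplication=0.0, iterations=20)
    for flush_number, dispersion in enumerate(history, start=1):
        print(flush_number, dispersion)
    print("average file count:", average_count)
    print("compaction IO (flush units):", io)
```

Measuring everything in flush units keeps the model independent of the configured flush size. It ignores splits, maximum file counts per compaction, and major compactions, so it only approximates what a real region server would do.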
[jira] [Commented] (HBASE-5626) Compactions simulator tool for proofing algorithms
Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238908#comment-13238908 ]
stack commented on HBASE-5626:
------------------------------
Nice. Let me take a looksee...