Posted to commits@cassandra.apache.org by "Piotr Kołaczkowski (JIRA)" <ji...@apache.org> on 2012/11/22 10:08:58 UTC
[jira] [Created] (CASSANDRA-4983) Improve range wrap-around in CFIF: CFIF shouldn't produce input splits of very tiny size
Piotr Kołaczkowski created CASSANDRA-4983:
---------------------------------------------
Summary: Improve range wrap-around in CFIF: CFIF shouldn't produce input splits of very tiny size
Key: CASSANDRA-4983
URL: https://issues.apache.org/jira/browse/CASSANDRA-4983
Project: Cassandra
Issue Type: Improvement
Affects Versions: 1.1.6
Reporter: Piotr Kołaczkowski
Assignee: Piotr Kołaczkowski
Priority: Minor
Currently CFIF (ColumnFamilyInputFormat) splits the wrap-around split into two non-wrap-around splits. While this simplifies the CFRR (ColumnFamilyRecordReader) implementation, the approach has several minor downsides:
* One of the splits can be extremely small. One of our (picky) customers suspected a bug because one of his map tasks finished in 1 second while all the rest took minutes. A very small task also wastes resources: more go to launching the task than to doing any real work.
* The number of map tasks is always one more than (expected rows / cassandra.input.split.size), and never fewer than 2. This confuses customers.
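
To illustrate how cutting a wrap-around range at the ring boundary can leave a tiny piece, here is a minimal sketch. It is not the actual CFIF code: the class `WrapAroundSplitSketch`, the `Range` type, the `unwrap` method, and the 1000-token ring are all hypothetical names and values chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class WrapAroundSplitSketch {
    // Hypothetical ring with 1000 token positions; token 1000 is treated
    // as an alias of token 0. This is not Cassandra's real token type.
    public static final long RING_SIZE = 1000;

    /** A token range (start, end]; it wraps around the ring when start >= end. */
    public static final class Range {
        public final long start;
        public final long end;

        public Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        public boolean wraps() {
            return start >= end;
        }

        /** Number of tokens covered, counting across the ring boundary if the range wraps. */
        public long size() {
            return wraps() ? (RING_SIZE - start) + end : end - start;
        }
    }

    /** Cuts a wrap-around range at the ring boundary into non-wrapping parts. */
    public static List<Range> unwrap(Range r) {
        List<Range> parts = new ArrayList<>();
        if (!r.wraps()) {
            parts.add(r);
            return parts;
        }
        parts.add(new Range(r.start, RING_SIZE)); // part up to the ring boundary
        if (r.end > 0) {
            parts.add(new Range(0, r.end));       // part after the ring boundary
        }
        return parts;
    }

    public static void main(String[] args) {
        // Assumed example: a range that ends just past the ring boundary.
        for (Range part : unwrap(new Range(400, 2))) {
            System.out.println("(" + part.start + ", " + part.end + "] covers " + part.size() + " tokens");
        }
    }
}
```

In this assumed example the range (400, 2] becomes one part of 600 tokens and one of 2 tokens, so a single logical split turns into two map tasks, one of which has almost nothing to do.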
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-4983) Improve range wrap-around in CFIF: CFIF shouldn't produce input splits of very tiny size
Posted by "Piotr Kołaczkowski (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Piotr Kołaczkowski updated CASSANDRA-4983:
------------------------------------------
Description:
Currently CFIF splits the wrap-around split into two non-wrap-around splits. While this simplifies the CFRR implementation, the approach has several minor downsides:
* One of the splits can be extremely small. One of our (picky) customers suspected a bug because one of his map tasks finished in 1 second while all the rest took minutes. A very small task also wastes resources: more go to launching the task than to doing any real work.
* The number of map tasks is always one more than (expected rows / cassandra.input.split.size), and never fewer than 2. This confuses customers.
* Progress reporting for the two parts of a divided split is inaccurate: even when the parts are similar in size, the progress bar climbs to about 50% and then jumps straight to 100%, because their sizes cannot be estimated properly (size estimation happens before the wrap-around is removed).
was:
Currently CFIF splits the wrap-around split into two non-wrap-around splits. While this simplifies the CFRR implementation, the approach has several minor downsides:
* One of the splits can be extremely small. One of our (picky) customers suspected a bug because one of his map tasks finished in 1 second while all the rest took minutes. A very small task also wastes resources: more go to launching the task than to doing any real work.
* The number of map tasks is always one more than (expected rows / cassandra.input.split.size), and never fewer than 2. This confuses customers.
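
The progress symptom in the updated description can be reproduced with a small sketch. It assumes, as a guess rather than a description of the real CFRR code, that progress is computed as rows read divided by an estimated row count, and that each part of a divided split keeps the estimate made for the whole wrap-around range. The class `SplitProgressSketch`, the `progress` method, and the row counts are hypothetical.

```java
public class SplitProgressSketch {
    /** Fraction of work done: rows read over the estimated row count, capped at 1. */
    public static float progress(long rowsRead, long estimatedRows) {
        if (estimatedRows <= 0) {
            return 0f; // no usable estimate
        }
        return Math.min(1.0f, (float) rowsRead / estimatedRows);
    }

    public static void main(String[] args) {
        long estimateForWholeRange = 1000; // assumed: estimated before the wrap-around was removed
        long rowsInThisPart = 500;         // assumed: each part holds about half of the rows

        // Progress reported when the part has read all of its rows,
        // measured against the stale estimate for the whole range.
        System.out.println(progress(rowsInThisPart, estimateForWholeRange));

        // Progress that would be reported with an estimate made for this part alone.
        System.out.println(progress(rowsInThisPart, rowsInThisPart));
    }
}
```

Under these assumptions a part that has read every one of its rows still reports only half done, and the task then completes, which would match a bar that stops near 50% and jumps to 100%.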
[jira] [Updated] (CASSANDRA-4983) Improve range wrap-around in CFIF: CFIF shouldn't produce input splits of very tiny size
Posted by "Piotr Kołaczkowski (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Piotr Kołaczkowski updated CASSANDRA-4983:
------------------------------------------
Attachment: 0001-CASSANDRA-4983-CFRR-able-to-iterate-over-more-than-o.patch