
Posted to commits@cassandra.apache.org by "Matt Kennedy (JIRA)" <ji...@apache.org> on 2011/02/25 01:37:38 UTC

[jira] Created: (CASSANDRA-2245) Enable map reduce to use indexes for ColumnFamilyInputFormat

Enable map reduce to use indexes for ColumnFamilyInputFormat
------------------------------------------------------------

                 Key: CASSANDRA-2245
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2245
             Project: Cassandra
          Issue Type: Improvement
          Components: Hadoop
    Affects Versions: 0.7.2
         Environment: Cassandra 0.7 or later and Hadoop 0.20.1 or later
            Reporter: Matt Kennedy
            Priority: Minor
             Fix For: 0.8


Add the ability to run a MapReduce job that takes the value of an indexed column as a parameter and uses it to select the data the job operates on.  Right now this does not appear to be possible, because org.apache.cassandra.hadoop.ColumnFamilyRecordReader only fetches data with get_range_slices, not get_indexed_slices.
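To make the difference concrete, here is a minimal sketch in plain Java. It uses no Cassandra or Hadoop classes: an in-memory map stands in for the rows of one input split, and the class name IndexVsScanSketch, its methods, and the "state" column are all hypothetical. With only a range scan, every row in the split reaches the mapper and the filter on the column value has to run there; an index lets the reader go straight to the matching rows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory model of one input split: row key -> (column -> value).
// No Cassandra or Hadoop classes are used; this only illustrates the data flow.
public class IndexVsScanSketch {

    // Range-scan path (the get_range_slices style of read): every row in the
    // split is returned, and the mapper must drop rows that do not match.
    static List<String> scanAndFilter(Map<String, Map<String, String>> rows,
                                      String column, String value) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : rows.entrySet()) {
            if (value.equals(e.getValue().get(column))) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    // Index path (the get_indexed_slices style of read): an inverted map from
    // column value to row keys. Cassandra maintains its index as data is
    // written; here it is built up front purely for illustration.
    static Map<String, List<String>> buildIndex(Map<String, Map<String, String>> rows,
                                                String column) {
        Map<String, List<String>> index = new HashMap<>();
        for (Map.Entry<String, Map<String, String>> e : rows.entrySet()) {
            String v = e.getValue().get(column);
            if (v != null) {
                index.computeIfAbsent(v, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        return index;
    }

    // A lookup touches only the rows whose column equals the requested value.
    static List<String> indexLookup(Map<String, List<String>> index, String value) {
        return index.getOrDefault(value, List.of());
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> split = new TreeMap<>();
        split.put("k1", Map.of("state", "TX"));
        split.put("k2", Map.of("state", "CA"));
        split.put("k3", Map.of("state", "TX"));

        // Same answer either way; the scan has to visit all three rows to find it.
        System.out.println(scanAndFilter(split, "state", "TX"));            // [k1, k3]
        System.out.println(indexLookup(buildIndex(split, "state"), "TX"));  // [k1, k3]
    }
}
```

The results match; what changes is how much data has to be read and shipped to the mapper, which is the point of letting the record reader use the index.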

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (CASSANDRA-2245) Enable map reduce to use indexes for ColumnFamilyInputFormat

Posted by "Matt Kennedy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002930#comment-13002930 ] 

Matt Kennedy commented on CASSANDRA-2245:
-----------------------------------------

I've taken a crack at coding this up, but I'm not thrilled with the results. I agree with Brandon that CASSANDRA-1600 is the best way to deal with this issue.  The get_indexed_slices method doesn't accept a key_range parameter, and without one it isn't useful for a map reduce job, where each task should read only the keys in its own split.  I'm reviewing that discussion now to see whether something like this functionality could be patched in before 0.8 without breaking the Thrift API.
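Why the missing key range matters can be modeled in a few lines. A Hadoop job over a column family is divided into input splits, each covering a range of tokens and each handled by one mapper. If an index query cannot be limited to a split's range, every mapper receives every matching row. The sketch below is hypothetical plain Java: tokens are plain longs, Split, rowsForSplit and totalRowsProcessed are illustrative names rather than Cassandra classes, ranges are treated as start-exclusive and end-inclusive, and wraparound ranges are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SplitRangeSketch {

    // Hypothetical split: a token range, start-exclusive and end-inclusive.
    // Wraparound ranges are ignored to keep the sketch short.
    static final class Split {
        final long start;
        final long end;
        Split(long start, long end) { this.start = start; this.end = end; }
        boolean contains(long token) { return token > start && token <= end; }
    }

    // Index matches restricted to one split: what a key_range parameter on
    // the index query would make possible.
    static List<String> rowsForSplit(Map<Long, String> indexMatches, Split split) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<Long, String> e : indexMatches.entrySet()) {
            if (split.contains(e.getKey())) {
                rows.add(e.getValue());
            }
        }
        return rows;
    }

    // Total rows handed to mappers. Without a range restriction each mapper
    // gets every match, so the work is multiplied by the number of splits.
    static int totalRowsProcessed(Map<Long, String> indexMatches, List<Split> splits,
                                  boolean restrictToSplit) {
        int total = 0;
        for (Split s : splits) {
            total += restrictToSplit ? rowsForSplit(indexMatches, s).size()
                                     : indexMatches.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Index matches keyed by token (made-up values for illustration).
        Map<Long, String> matches = new TreeMap<>();
        matches.put(10L, "row-a");
        matches.put(55L, "row-b");
        matches.put(90L, "row-c");

        List<Split> splits = List.of(new Split(0, 50), new Split(50, 100));
        System.out.println(totalRowsProcessed(matches, splits, true));   // 3
        System.out.println(totalRowsProcessed(matches, splits, false));  // 6
    }
}
```

With the range restriction each matching row is processed exactly once; without it, two splits process six rows for three matches.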

> Enable map reduce to use indexes for ColumnFamilyInputFormat
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-2245
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2245
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>    Affects Versions: 0.7.2
>         Environment: Cassandra 0.7 or later and Hadoop 0.20.1 or later
>            Reporter: Matt Kennedy
>            Priority: Minor
>              Labels: hadoop
>             Fix For: 0.8
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Enable the ability to run a MapReduce job that takes a value in an indexed column as a parameter, and use that to select the data that the MapReduce job operates on.  Right now, it looks like this isn't possible because org.apache.cassandra.hadoop.ColumnFamilyRecordReader will only fetch data with get_range_slices, not get_indexed_slices.

[jira] Commented: (CASSANDRA-2245) Enable map reduce to use indexes for ColumnFamilyInputFormat

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999169#comment-12999169 ] 

Brandon Williams commented on CASSANDRA-2245:
---------------------------------------------

An easy way to solve this is CASSANDRA-1600, but we held off on it for 0.7 because we didn't want to break the Thrift API.

[jira] Resolved: (CASSANDRA-2245) Enable map reduce to use indexes for ColumnFamilyInputFormat

Posted by "Mck SembWever (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mck SembWever resolved CASSANDRA-2245.
--------------------------------------

    Resolution: Duplicate

As far as I understand, CASSANDRA-1125 covers your needs.

[jira] Commented: (CASSANDRA-2245) Enable map reduce to use indexes for ColumnFamilyInputFormat

Posted by "Jesse Shieh (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001779#comment-13001779 ] 

Jesse Shieh commented on CASSANDRA-2245:
----------------------------------------

+1 for this feature.  We have an index on a date column and want to be able to run a MapReduce job on just the previous day's data.  This would be very helpful for that use case.
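For this use case, the parameter passed to the job would be the previous day's date. A minimal sketch using only java.time, assuming (hypothetically) that the indexed date column stores ISO-8601 strings such as 2011-03-01; the real encoding depends on the column's validator and the application. PreviousDayValue and previousDay are illustrative names.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class PreviousDayValue {

    // Hypothetical helper: the value an equality expression on the indexed
    // date column would compare against, for the day before 'today'.
    // Assumes dates are stored as ISO-8601 strings (yyyy-MM-dd).
    static String previousDay(LocalDate today) {
        return today.minusDays(1).format(DateTimeFormatter.ISO_LOCAL_DATE);
    }

    public static void main(String[] args) {
        // Use UTC so the result does not depend on the machine's time zone.
        LocalDate today = LocalDate.now(ZoneOffset.UTC);
        System.out.println(previousDay(today));
        System.out.println(previousDay(LocalDate.of(2011, 3, 1)));  // 2011-02-28
    }
}
```

Computing the value once, when the job is configured, means every mapper filters on the same day even if the job runs across midnight.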
