Posted to commits@cassandra.apache.org by "Matt Kennedy (JIRA)" <ji...@apache.org> on 2011/02/25 01:37:38 UTC
[jira] Created: (CASSANDRA-2245) Enable map reduce to use indexes for ColumnFamilyInputFormat
Enable map reduce to use indexes for ColumnFamilyInputFormat
------------------------------------------------------------
Key: CASSANDRA-2245
URL: https://issues.apache.org/jira/browse/CASSANDRA-2245
Project: Cassandra
Issue Type: Improvement
Components: Hadoop
Affects Versions: 0.7.2
Environment: Cassandra 0.7 or later and Hadoop 0.20.1 or later
Reporter: Matt Kennedy
Priority: Minor
Fix For: 0.8
Allow a MapReduce job to take a value in an indexed column as a parameter and use it to select the data that the job operates on. Right now this does not appear to be possible, because org.apache.cassandra.hadoop.ColumnFamilyRecordReader fetches data only with get_range_slices, not with get_indexed_slices.
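To make the request concrete, here is a minimal sketch of the two access patterns. It is an in-memory model only: the class, its methods, and the "state" column are hypothetical names chosen for illustration, and none of them are Cassandra or Thrift APIs. rangeScan stands in for the kind of read the record reader performs today, and indexedLookup for the kind of read this issue asks for.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model; none of these names are Cassandra APIs.
class IndexedSelectionSketch {
    // Rows kept in key order: row key -> (column name -> value).
    final TreeMap<String, Map<String, String>> rows = new TreeMap<>();
    // Secondary index on the "state" column: value -> row keys holding that value.
    final Map<String, List<String>> stateIndex = new HashMap<>();

    void put(String key, String state) {
        Map<String, String> columns = new HashMap<>();
        columns.put("state", state);
        rows.put(key, columns);
        stateIndex.computeIfAbsent(state, s -> new ArrayList<>()).add(key);
    }

    // Analogue of a range read: every row key in [start, end], matching or not.
    List<String> rangeScan(String start, String end) {
        return new ArrayList<>(rows.subMap(start, true, end, true).keySet());
    }

    // Analogue of an indexed read: only keys whose indexed column equals value.
    List<String> indexedLookup(String value) {
        return new ArrayList<>(stateIndex.getOrDefault(value, new ArrayList<>()));
    }

    public static void main(String[] args) {
        IndexedSelectionSketch s = new IndexedSelectionSketch();
        s.put("a", "TX");
        s.put("b", "UT");
        s.put("c", "TX");
        s.put("d", "CA");
        // Range read: the job receives all rows and has to filter in the mapper.
        System.out.println(s.rangeScan("a", "d"));
        // Indexed read: the job receives only the rows that match the value.
        System.out.println(s.indexedLookup("TX"));
    }
}
```

In this model the range read hands the job every row, so any selection by column value has to happen inside the mapper, while the indexed read narrows the input before the job sees it.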
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (CASSANDRA-2245) Enable map reduce to use indexes for ColumnFamilyInputFormat
Posted by "Matt Kennedy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002930#comment-13002930 ]
Matt Kennedy commented on CASSANDRA-2245:
-----------------------------------------
I've taken a crack at coding this up, but I'm not thrilled with the results. I agree with Brandon that CASSANDRA-1600 is the best way to deal with this issue. The get_indexed_slices method doesn't take a key_range parameter, which is what would make it useful for a MapReduce job. I'm reviewing that discussion now to see whether a patch for something like this functionality could go out before 0.8 without breaking the Thrift API.
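One way to read the key_range limitation is sketched below, under stated assumptions. The model is hypothetical throughout: it uses ordered string keys in place of token ranges, and matchAll and matchInRange are illustrative names, not Thrift methods. It shows that an index query with no range argument returns the same matches to every input split, while a range-limited query gives each split only its own share.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of per-split index queries; not the Thrift API.
class SplitIndexSketch {
    // Row key -> value of the indexed column, kept in key order.
    final TreeMap<String, String> indexedColumn = new TreeMap<>();

    // Index query with no range argument: every caller gets all matching keys.
    List<String> matchAll(String value) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String> e : indexedColumn.entrySet()) {
            if (e.getValue().equals(value)) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    // Index query limited to keys in [start, end): what each input split would need.
    List<String> matchInRange(String value, String start, String end) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String> e : indexedColumn.subMap(start, end).entrySet()) {
            if (e.getValue().equals(value)) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        SplitIndexSketch s = new SplitIndexSketch();
        s.indexedColumn.put("a", "TX");
        s.indexedColumn.put("c", "UT");
        s.indexedColumn.put("m", "TX");
        s.indexedColumn.put("x", "TX");
        // Two assumed splits: [a, m) and [m, z).
        System.out.println(s.matchAll("TX"));               // identical for both splits
        System.out.println(s.matchInRange("TX", "a", "m")); // first split only
        System.out.println(s.matchInRange("TX", "m", "z")); // second split only
    }
}
```

With the range-limited form, the per-split results are disjoint and together cover every match, which is the property a job with multiple input splits relies on to avoid processing rows more than once.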
> Enable map reduce to use indexes for ColumnFamilyInputFormat
> ------------------------------------------------------------
>
> Key: CASSANDRA-2245
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2245
> Project: Cassandra
> Issue Type: Improvement
> Components: Hadoop
> Affects Versions: 0.7.2
> Environment: Cassandra 0.7 or later and Hadoop 0.20.1 or later
> Reporter: Matt Kennedy
> Priority: Minor
> Labels: hadoop
> Fix For: 0.8
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> Allow a MapReduce job to take a value in an indexed column as a parameter and use it to select the data that the job operates on. Right now this does not appear to be possible, because org.apache.cassandra.hadoop.ColumnFamilyRecordReader fetches data only with get_range_slices, not with get_indexed_slices.
[jira] Commented: (CASSANDRA-2245) Enable map reduce to use indexes for ColumnFamilyInputFormat
Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999169#comment-12999169 ]
Brandon Williams commented on CASSANDRA-2245:
---------------------------------------------
An easy way to solve this is CASSANDRA-1600, but we held off on it for 0.7 because we didn't want to break the Thrift API.
[jira] Resolved: (CASSANDRA-2245) Enable map reduce to use indexes for ColumnFamilyInputFormat
Posted by "Mck SembWever (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mck SembWever resolved CASSANDRA-2245.
--------------------------------------
Resolution: Duplicate
As far as I understand, CASSANDRA-1125 covers your needs.
[jira] Commented: (CASSANDRA-2245) Enable map reduce to use indexes for ColumnFamilyInputFormat
Posted by "Jesse Shieh (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001779#comment-13001779 ]
Jesse Shieh commented on CASSANDRA-2245:
----------------------------------------
+1 for this feature. We have an index on a date column and want to be able to run a MapReduce job on just the previous day's data. This would be very helpful for that use case.