Posted to commits@cassandra.apache.org by "Vijay (JIRA)" <ji...@apache.org> on 2010/01/29 20:06:34 UTC

[jira] Created: (CASSANDRA-747) Need a additional method for Hadoop Range Queries.

Need a additional method for Hadoop Range Queries.
--------------------------------------------------

                 Key: CASSANDRA-747
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-747
             Project: Cassandra
          Issue Type: New Feature
          Components: Core
    Affects Versions: 0.6
         Environment: Need to add additional method for Range queries from Hadoop. 
            Reporter: Vijay
            Priority: Minor
             Fix For: 0.6


Hadoop integration might need the following:

1) An API to return the list of splits, given the number of splits.
Using these tokens we can spawn an equal number of MR jobs. (A configuration setting in the MR job, chosen according to the complexity of the processing, will say how many map tasks to run per partition, and those processes will be spawned accordingly.)

2) An API to stream starting from a token.
Input will be Range(String startKey, Token start, Token finish, int limit).... return will be 
    If startKey is empty, we will use the start token as the starting point for the stream; otherwise we will use startKey to specify the key to start with. Does that make sense?

So each MR job will get its range of data from Cassandra and process it; it can also stream the data and doesn't need to fetch all of it at once.
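
To make the two proposed calls concrete, here is a rough sketch in Java against an in-memory map. Everything in it is an assumption made for illustration: the class and method names (RangeQuerySketch, TokenRange, splits, rangeRead, readAll) are hypothetical and are not Cassandra's API, tokens are modeled as BigInteger, the partitioner is a toy one, ranges are taken as (start, finish] with no wrap-around on the ring, and startKey is treated as inclusive.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch only; none of these names are Cassandra's real API.
class RangeQuerySketch {

    /** Token range (start, end]; tokens are modeled as BigInteger for illustration. */
    static final class TokenRange {
        final BigInteger start;
        final BigInteger end;
        TokenRange(BigInteger start, BigInteger end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Toy partitioner (an assumption): the key's UTF-8 bytes read as an unsigned integer. */
    static BigInteger tokenOf(String key) {
        return new BigInteger(1, key.getBytes(StandardCharsets.UTF_8));
    }

    /** Item 1: cut (start, end] into n contiguous sub-ranges of near-equal width. */
    static List<TokenRange> splits(BigInteger start, BigInteger end, int n) {
        if (n <= 0 || start.compareTo(end) >= 0) {
            throw new IllegalArgumentException("need n > 0 and start < end (no wrap-around)");
        }
        BigInteger width = end.subtract(start);
        List<TokenRange> out = new ArrayList<>();
        BigInteger prev = start;
        for (int i = 1; i <= n; i++) {
            // Boundary i is start + width * i / n, so the last boundary is exactly end.
            BigInteger next = start.add(width.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
            out.add(new TokenRange(prev, next)); // may be empty when width < n
            prev = next;
        }
        return out;
    }

    /** Item 2: up to limit keys from the range, starting at startKey if one is given. */
    static List<String> rangeRead(NavigableMap<BigInteger, String> store, String startKey,
                                  BigInteger start, BigInteger finish, int limit) {
        NavigableMap<BigInteger, String> view = startKey.isEmpty()
                ? store.subMap(start, false, finish, true)              // begin after the start token
                : store.subMap(tokenOf(startKey), true, finish, true);  // begin at startKey itself
        List<String> page = new ArrayList<>();
        for (String key : view.values()) {
            if (page.size() >= limit) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    /** Streams one range page by page, so a task never holds the whole range in one response. */
    static List<String> readAll(NavigableMap<BigInteger, String> store, TokenRange r, int limit) {
        if (limit < 2) {
            throw new IllegalArgumentException("limit must be >= 2 to make progress");
        }
        List<String> all = new ArrayList<>();
        String startKey = "";
        while (true) {
            List<String> page = rangeRead(store, startKey, r.start, r.end, limit);
            // After the first page, row 0 repeats the previous page's last key; skip it.
            int from = startKey.isEmpty() ? 0 : 1;
            all.addAll(page.subList(from, page.size()));
            if (page.size() < limit) {
                return all; // a short page means the range is exhausted
            }
            startKey = page.get(page.size() - 1);
        }
    }

    public static void main(String[] args) {
        NavigableMap<BigInteger, String> store = new TreeMap<>();
        for (char c = 'a'; c <= 'j'; c++) {
            String key = String.valueOf(c);
            store.put(tokenOf(key), key);
        }
        // One "map task" per split; each pages through its own range, two rows at a time.
        for (TokenRange r : splits(BigInteger.valueOf(96), BigInteger.valueOf(106), 3)) {
            System.out.println("(" + r.start + ", " + r.end + "] -> " + readAll(store, r, 2));
        }
    }
}
```

Because startKey is inclusive in this sketch, each follow-up page begins with the last key of the previous page, which is why the paging loop drops the first row and needs a limit of at least 2. An exclusive "resume after this key" contract would avoid the repeat; which one the real API should use is left open here.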

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (CASSANDRA-747) Need a additional method for Hadoop Range Queries.

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-747.
--------------------------------------

       Resolution: Duplicate
    Fix Version/s:     (was: 0.6)

Let's keep this to CASSANDRA-342, please.


[jira] Closed: (CASSANDRA-747) Need a additional method for Hadoop Range Queries.

Posted by "Vijay (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay closed CASSANDRA-747.
---------------------------

    Assignee: Vijay

Closing this and merging it into CASSANDRA-342.
