Posted to user@cassandra.apache.org by aaron morton <aa...@thelastpickle.com> on 2013/04/01 02:02:26 UTC
Re: MultiInput/MultiGet CF in MapReduce
> Suppose I use client.get_slice(key), where my row key is '20130314' from the Index Table.
> Q1) How do I know which token range and endpoint the row key '20130314' belongs to?
Calculate the MD5 hash of the key and find the token range that contains it.
This is what is used internally: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/FBUtilities.java#L239
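A minimal sketch of that lookup in Python, assuming the cluster uses RandomPartitioner (where the token is the absolute value of the MD5 digest read as a signed big-endian integer) and that the row key is the ASCII bytes of the string. The TokenRange tuple and the find_range helper are hypothetical stand-ins for the describe_ring output, not part of any Cassandra client. Note that Thrift returns tokens as strings, so they would need converting to integers first.

```python
import hashlib
from typing import List, NamedTuple, Optional


class TokenRange(NamedTuple):
    """Hypothetical stand-in for one entry returned by describe_ring."""
    start_token: int
    end_token: int
    endpoints: List[str]


def random_partitioner_token(key: bytes) -> int:
    """MD5 the key, read the digest as a signed big-endian integer, take abs."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


def range_contains(start: int, end: int, token: int) -> bool:
    """A range is (start, end]; when start >= end it wraps around the ring."""
    if start < end:
        return start < token <= end
    return token > start or token <= end


def find_range(ring: List[TokenRange], token: int) -> Optional[TokenRange]:
    """Return the first range in the ring that owns the token, if any."""
    for token_range in ring:
        if range_contains(token_range.start_token, token_range.end_token, token):
            return token_range
    return None


# Made-up four-node ring with evenly spaced tokens; addresses are placeholders.
step = 2 ** 125
example_ring = [
    TokenRange(3 * step, 0, ["10.0.0.1"]),        # wrapping range
    TokenRange(0, step, ["10.0.0.2"]),
    TokenRange(step, 2 * step, ["10.0.0.3"]),
    TokenRange(2 * step, 3 * step, ["10.0.0.4"]),
]

token = random_partitioner_token(b"20130314")
owner = find_range(example_ring, token)
```

Here owner.endpoints would list the replicas to prefer for the row. With a different partitioner (for example Murmur3Partitioner) the hashing step is different, but the range lookup is the same.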
Cheers
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 30/03/2013, at 10:45 AM, Alicia Leong <lc...@gmail.com> wrote:
> This is the current flow for ColumnFamilyInputFormat. Please correct me if I'm wrong.
>
> 1) In ColumnFamilyInputFormat, get all nodes' token ranges using client.describe_ring.
> 2) Get the CfSplits using client.describe_splits_ex with each token range.
> 3) Create a new ColumnFamilySplit with the start range, end range and endpoint.
> 4) In ColumnFamilyRecordReader, query client.get_range_slices with the start range and end range of the ColumnFamilySplit at the endpoint (datanode).
>
>
> Suppose I use client.get_slice(key), where my row key is '20130314' from the Index Table.
> Q1) How do I know which token range and endpoint the row key '20130314' belongs to?
> Even if I manage to find the token range and endpoint,
> is there a Thrift API to which I can pass (ByteBuffer key, KeyRange range), like a merge of client.get_slice and client.get_range_slices?
>
>
> Thanks
>
>
>
> On Sat, Mar 30, 2013 at 7:53 AM, Edward Capriolo <ed...@gmail.com> wrote:
> You can use the output of describe_ring along with partitioner information to determine which nodes data lives on.
>
>
> On Fri, Mar 29, 2013 at 12:33 PM, Alicia Leong <lc...@gmail.com> wrote:
> Hi All
> I'm thinking of doing it this way.
>
> 1) get_slice ( YYYYMMDDHH ) from the Index Table.
>
> 2) Take the returned list of row keys.
>
> 3) Pass it to multiget_slice ( keys … )
>
>
> But my question is how to ensure 'data locality'?
>
>
>
> On Tue, Mar 19, 2013 at 3:33 PM, aaron morton <aa...@thelastpickle.com> wrote:
> I would be looking at Hive or Pig, rather than writing the MapReduce job by hand.
>
> There is an example in the Cassandra source distribution, or you can look at DataStax Enterprise to start playing with Hive.
>
> Typically with Hadoop queries you want to query a lot of data. If you are only querying a few rows, consider writing the code in your favourite language.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/03/2013, at 1:29 PM, Alicia Leong <lc...@gmail.com> wrote:
>
>> Hi All
>>
>> I have 2 tables
>>
>> Data Table
>> -----------------
>> RowKey: 1
>> => (column=name, value=apple)
>> RowKey: 2
>> => (column=name, value=orange)
>> RowKey: 3
>> => (column=name, value=banana)
>> RowKey: 4
>> => (column=name, value=mango)
>>
>>
>> Index Table (YYYYMMDDHH)
>> ------------------------------------------------
>> RowKey: 2013030114
>> => (column=1, value=)
>> => (column=2, value=)
>> => (column=3, value=)
>> RowKey: 2013030115
>> => (column=4, value=)
>>
>>
>> I would like to know how to implement the following in MapReduce:
>> 1) first query the Index Table by RowKey: 2013030114
>> 2) then pass the Index Table column names (1,2,3) to query the Data Table
>>
>> Thanks in advance.