Posted to user@cassandra.apache.org by Alvin UW <al...@gmail.com> on 2010/09/16 01:34:20 UTC

Build an index for join query

Hello,

I am going to build an index to join two CFs.
The index can be viewed as a CF/SCF; the difference is that I do not
materialise it.
Assume we have two tables:
ID_Address(*Id*, address), Name_ID(*name*, id)
Then the index is: Name_Address(*name*, address)

When the application queries Name_Address, it supplies the value of "name".
I want to direct the read to Name_ID to get the "Id" value, and then go to
ID_Address to get the "address" value for that "Id". So far, I am
considering only the read operation.
This way, the join query is transparent to the user.
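
The two-step read described above can be sketched roughly as follows. This is only an illustration: plain Python dicts stand in for the two CFs, and the sample data and the helper name get_name_address are hypothetical, not part of Cassandra or its client API.

```python
# Illustrative only: plain dicts stand in for the two column families.
# The sample data and the helper name are hypothetical.
name_id = {"jsmith": "42"}          # Name_ID(name, id)
id_address = {"42": "1 Main St"}    # ID_Address(Id, address)


def get_name_address(name):
    """Resolve Name_Address(name, address) without materialising it."""
    user_id = name_id.get(name)     # first read: name -> id
    if user_id is None:
        return None                 # unknown name, nothing to join
    return id_address.get(user_id)  # second read: id -> address


print(get_name_address("jsmith"))
```

Under these assumptions, every lookup on the virtual Name_Address costs two dependent reads, one per underlying CF.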

So I think I need to find out which methods or classes handle the read
operation described above.
For example, which server-side methods does the Cassandra CLI command
"get Keyspace1.Standard2['jsmith']" call?

I noticed that CassandraServer is used to listen to clients and has
methods such as get() and get_slice().
Is that the right place to modify to implement my idea?

Thanks.

Alvin

Re: Build an index for join query

Posted by Aaron Morton <aa...@thelastpickle.com>.

In the Cassandra world, the best approach is to create one CF with the name and address in it.

Use a super CF with one super col for the user data and one super col for every address they have. Pull the entire row back every time you want to read the data. No need for joins.
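
As a rough sketch of that layout, a nested Python dict can model one row of a super CF: row key, then super column name, then columns. The super column names ("details", "address_home", "address_work"), the sample values and the helper read_user are all hypothetical and only illustrate the shape of the data, not Cassandra client code.

```python
# Illustrative only: a nested dict models one row of a super column family.
# Row key -> super column name -> {column name: value}. All names are hypothetical.
users = {
    "jsmith": {
        "details": {"id": "42", "full_name": "John Smith"},
        "address_home": {"street": "1 Main St", "city": "Springfield"},
        "address_work": {"street": "9 Office Rd", "city": "Springfield"},
    }
}


def read_user(name):
    """Fetch the whole row in one read and split it into details and addresses."""
    row = users.get(name, {})
    details = row.get("details", {})
    addresses = {k: v for k, v in row.items() if k.startswith("address_")}
    return details, addresses


details, addresses = read_user("jsmith")
print(details["id"], sorted(addresses))
```

In this model a single row read returns both the user data and every address, so no second lookup is needed.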

Aaron


On 18 Sep 2010, at 08:56, Alvin UW <al...@gmail.com> wrote:

> Thanks Paul,
> 
> If we make a CF Name_Address(name, address) rather than an index, we have to maintain it, once any change happens in ID_Address(Id, address) ,  Name_ID(name, id). Besides, it also occupies some space.
> 
> In contrast, if Name_Address(name, address) is just an index, we can redirect the query to ID_Address(Id, address) ,  Name_ID(name, id) without the cost of maintenance.
> Does it make sense?
> 
> Alvin

Re: Build an index for join query

Posted by Alvin UW <al...@gmail.com>.

Thanks Paul,

If we make a CF Name_Address(name, address) rather than an index, we have to
maintain it whenever anything changes in ID_Address(*Id*, address) or
Name_ID(*name*, id). It also takes up extra space.

In contrast, if Name_Address(name, address) is just an index, we can
redirect the query to ID_Address(*Id*, address) and Name_ID(*name*, id)
without the cost of maintenance.
Does that make sense?
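
The maintenance trade-off can be illustrated with a small sketch. Plain Python dicts again stand in for the CFs, and the two update helpers and the sample data are hypothetical; the write counts only describe this toy model.

```python
# Illustrative only: dicts stand in for the CFs; helper names are hypothetical.
name_id = {"jsmith": "42"}              # Name_ID(name, id)
id_address = {"42": "1 Main St"}        # ID_Address(Id, address)
name_address = {"jsmith": "1 Main St"}  # materialised Name_Address(name, address)


def update_address_materialised(user_id, new_address):
    """Write the base CF, then every Name_Address row that refers to user_id."""
    id_address[user_id] = new_address
    writes = 1
    for name, uid in name_id.items():   # full scan, for illustration only
        if uid == user_id:
            name_address[name] = new_address
            writes += 1
    return writes


def update_address_virtual(user_id, new_address):
    """Write only the base CF; the join is resolved later, at read time."""
    id_address[user_id] = new_address
    return 1


print(update_address_materialised("42", "2 Elm St"))
print(update_address_virtual("42", "3 Oak Ave"))
```

The materialised CF pays extra writes and storage on each change, while the non-materialised index pays an extra read on each query.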

Alvin


2010/9/16 Rock, Paul <pa...@teamaol.com>

> Alvin - assuming I understand what you're after correctly, why not make a
> CF Name_Address(name, address). Modifying the Cassandra methods to do the
> "join" you describe seems like overkill to me...
>
> -Paul

Re: Build an index for join query

Posted by "Rock, Paul" <pa...@teamaol.com>.

Alvin - assuming I understand what you're after correctly, why not make a CF Name_Address(name, address)? Modifying the Cassandra methods to do the "join" you describe seems like overkill to me...

-Paul
