Posted to user@cassandra.apache.org by Aklin_81 <as...@gmail.com> on 2011/02/06 09:14:45 UTC

Finding the intersection results of column sets of two rows

Hi all,

I want to find the intersection of the column sets of two rows (from two
different column families).

To get the intersection, can I first retrieve all the columns (around 300)
from the first row, and then query the second row (which contains at most
100,000 columns) by those column names?

I use the results at write time, not before presenting data to the user,
so latency won't be much of a concern.

Is this the proper way to get the intersection of two rows?

I would love to hear your comments.


---------

Regards,
Asil

Re: Finding the intersection results of column sets of two rows

Posted by Aklin_81 <as...@gmail.com>.

Thank you so much, Aaron!

On Wed, Feb 9, 2011 at 2:11 AM, Aaron Morton <aa...@thelastpickle.com> wrote:
> Makes sense; use get_slice() against the second row and pass in the column names. It should be fine.
>
> If you run into performance issues look at slice_buffer_size and column_index_size in the config.
>
> Aaron

Re: Finding the intersection results of column sets of two rows

Posted by Aaron Morton <aa...@thelastpickle.com>.

Makes sense; use get_slice() against the second row and pass in the column names. It should be fine.

If you run into performance issues, look at slice_buffer_size and column_index_size in the config.
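
A minimal sketch of the by-name lookup, using plain Python dictionaries as stand-ins for the two rows rather than a real Cassandra client. The helper get_slice_by_names and the sample data are hypothetical; they only illustrate that the request carries the few hundred names from the small row, and that only the names present in the wide row come back.

```python
# Sketch only: in-memory stand-ins, not a real Cassandra client call.

def get_slice_by_names(row, column_names):
    """Mimic a by-name slice: return the requested names that exist in the row."""
    return [name for name in column_names if name in row]

# Hypothetical data: a small row (a user's connections) and a wide,
# valueless row (the members of a group).
connections = {"alice": b"", "bob": b"", "carol": b""}
group_members = {f"user{i}": b"" for i in range(100_000)}
group_members.update({"bob": b"", "carol": b""})

# Ask the wide row only for the names taken from the small row.
common = get_slice_by_names(group_members, list(connections))
print(common)  # ['bob', 'carol']
```

The wide row is never read in full; the cost scales with the number of names requested, not with the size of the second row.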

Aaron


On 9/02/2011, at 5:16 AM, Aklin_81 <as...@gmail.com> wrote:

> I need to find the common columns between two rows. The 1st row will not
> have more than 200 columns (in 99% of cases), but the 2nd row, where I
> need to look these columns up, may have around a million valueless
> columns.
> 
> A point to note: these calculations are all done while **writing data
> collected from the presentation layer to the database**, not while
> presenting data.
> 
> I use the results of this intersection to find the rows (pointed to by
> the names of the common columns) that I should write to. The
> calculations are done after a user submits a post in a discussion
> forum. This is used to find the mutual connections in a group and write
> to the rows pointed to by the common columns. The 1st row represents the
> connection list of a user, which is not going to be more than 100-250
> columns in my case, and the 2nd row represents the members of a group,
> which may contain a million columns, as I said. I find the mutual
> connections in a group (by finding the common columns in the two rows
> above) and then write to the rows of those users.
> 
> Can't I run a batch query against the 2nd row, asking for all the
> columns that I picked up from the 1st row?
> 
> Is there any better way?
> 
> Asil

Re: Finding the intersection results of column sets of two rows

Posted by Aklin_81 <as...@gmail.com>.

I need to find the common columns between two rows. The 1st row will not
have more than 200 columns (in 99% of cases), but the 2nd row, where I
need to look these columns up, may have around a million valueless
columns.

A point to note: these calculations are all done while **writing data
collected from the presentation layer to the database**, not while
presenting data.

I use the results of this intersection to find the rows (pointed to by
the names of the common columns) that I should write to. The
calculations are done after a user submits a post in a discussion
forum. This is used to find the mutual connections in a group and write
to the rows pointed to by the common columns. The 1st row represents the
connection list of a user, which is not going to be more than 100-250
columns in my case, and the 2nd row represents the members of a group,
which may contain a million columns, as I said. I find the mutual
connections in a group (by finding the common columns in the two rows
above) and then write to the rows of those users.

Can't I run a batch query against the 2nd row, asking for all the
columns that I picked up from the 1st row?

Is there any better way?
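
The write-time workflow described above can be sketched roughly as follows. This is an illustration in plain Python, not database code: mutual_connections, fan_out_post, and the dictionaries standing in for rows are all hypothetical names chosen for the example.

```python
from collections import defaultdict

def mutual_connections(connection_names, group_row):
    """Return the author's connections that are also members of the group."""
    return [name for name in connection_names if name in group_row]

def fan_out_post(post_id, connection_names, group_row, user_rows):
    """Write the post id into the row of every mutual connection."""
    targets = mutual_connections(connection_names, group_row)
    for user in targets:
        user_rows[user].append(post_id)
    return targets

# Hypothetical data: the author's connection list and a group's member row.
connections = ["alice", "bob", "carol"]
group_row = {"bob": b"", "carol": b"", "erin": b""}
user_rows = defaultdict(list)  # stand-in for the rows being written to

written = fan_out_post("post-1", connections, group_row, user_rows)
print(written)  # ['bob', 'carol']
```

Because this runs after the post is submitted, the intersection step sits on the write path only and never delays a read.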

Asil



Re: Finding the intersection results of column sets of two rows

Posted by Shaun Cutts <sh...@cuttshome.net>.

Ah -- ok.

So (I'm also pretty new to Cassandra, but) I believe that your solution works; it's just that, as far as I know, you might have to query for those 60-70 columns individually. If you wanted to optimize, you could create a Bloom filter for the columns in row 2, fetch it first, and then query only for the columns that get a positive response from the filter. If you play with the structure (I seem to recall some two-level hashing schemes, or cascading Bloom filters), you can probably eliminate the query for the columns altogether, at the cost of both space and program complexity.
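
A rough sketch of the Bloom filter idea, assuming the filter is built from the column names of row 2 and stored somewhere the client can fetch it. The BloomFilter class, its sizes, and the sample data are illustrative assumptions, not anything Cassandra exposes to clients.

```python
import hashlib

class BloomFilter:
    """A small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# Hypothetical data standing in for the two rows.
group_members = {f"user{i}" for i in range(1000)} | {"bob", "carol"}
connections = ["alice", "bob", "carol", "dave"]

bloom = BloomFilter()
for name in group_members:
    bloom.add(name)

# Step 1: drop names the filter rules out. Step 2: confirm the rest
# against the real row, since the filter can report false positives.
candidates = [c for c in connections if bloom.might_contain(c)]
common = [c for c in candidates if c in group_members]
print(common)  # ['bob', 'carol']
```

The confirmation step is what keeps the result exact; the filter only reduces how many names have to be looked up.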

You could also store the second CF as one row for each original row / column pair -- so 200K times more rows, but each one holding just a boolean. As far as I understand, these queries might go faster because you can take better advantage of Cassandra's parallelization -- in the first case you have some small constant number of replicas of that one column; in the second, the rows will be distributed over the cluster (assuming RandomPartitioner).
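
A sketch of the row-per-pair layout, with a dictionary standing in for the column family. The "group:member" key format and the helpers pair_key and multiget_existing are assumptions made for the illustration.

```python
def pair_key(group_id, member):
    """Build a row key for one (group, member) pair; the format is hypothetical."""
    return f"{group_id}:{member}"

def multiget_existing(store, keys):
    """Mimic a multi-row fetch: return the requested keys that exist."""
    return {key for key in keys if key in store}

# Each membership is its own tiny row instead of a column in one wide row.
store = {pair_key("group42", m): True for m in ("bob", "carol", "erin")}

connections = ["alice", "bob", "carol"]
keys = [pair_key("group42", c) for c in connections]
found = multiget_existing(store, keys)
common = [c for c in connections if pair_key("group42", c) in found]
print(common)  # ['bob', 'carol']
```

With a hashing partitioner these keys would land on different nodes, which is the source of the parallelism mentioned above; the trade-off is a much larger number of rows.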

-- Shaun

On Feb 7, 2011, at 12:30 AM, Aklin_81 wrote:

> Thanks Aaron & Shaun,
> 
> ******************************
> I think my question might have been unclear to some of you, so I will
> explain my problem (and the solution I thought of) again for the sake
> of clarity:
> 
> Consider that I have 2 rows. The 1st row contains 60-70 columns and the
> 2nd row contains hundreds of thousands of columns. The columns in both
> sets are all valueless. I just need to find the **common column names**
> in the two rows. **Both rows are known to me**. So what I plan to do is
> pick up all the **column names** of the 1st row (60-70 columns) and ask
> for them in the 2nd row; whatever column names I get back are my result.
> Would there be any problem with this solution? This is how I expect to
> get the common column names.
> 
> Please do not treat this as a JOIN case, as that leads to unnecessary
> confusion; I just need the common column names from valueless columns in
> the two rows.
> 
> ********************************
> 
> Aaron, the intersection data is very much context-dependent. If there
> are, say, 10 million rows in CF A and 1 million in CF B, then the
> intersection data would contain 10 million * 1 million rows. That would
> involve a huge and unaffordable amount of denormalization. And finding
> the common columns in the client would require pulling unnecessary
> columns, such as pulling 100,000 columns from a row when only 60-70 of
> them are required.
> 
> Shaun, I hope the clarification above makes things clearer. Yes,
> the rows whose common columns I need to find are known to me.
> 
> 
> Thank you all,
> Asil

Re: Finding the intersection results of column sets of two rows

Posted by Aklin_81 <as...@gmail.com>.

Thanks Aaron & Shaun,

******************************
I think my question might have been unclear to some of you, so I will
explain my problem (and the solution I thought of) again for the sake
of clarity:

Consider that I have 2 rows. The 1st row contains 60-70 columns and the
2nd row contains hundreds of thousands of columns. The columns in both
sets are all valueless. I just need to find the **common column names**
in the two rows. **Both rows are known to me**. So what I plan to do is
pick up all the **column names** of the 1st row (60-70 columns) and ask
for them in the 2nd row; whatever column names I get back are my result.
Would there be any problem with this solution? This is how I expect to
get the common column names.

Please do not treat this as a JOIN case, as that leads to unnecessary
confusion; I just need the common column names from valueless columns in
the two rows.

********************************

Aaron, the intersection data is very much context-dependent. If there
are, say, 10 million rows in CF A and 1 million in CF B, then the
intersection data would contain 10 million * 1 million rows. That would
involve a huge and unaffordable amount of denormalization. And finding
the common columns in the client would require pulling unnecessary
columns, such as pulling 100,000 columns from a row when only 60-70 of
them are required.
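
To put rough numbers on that, here is a back-of-the-envelope calculation. The row counts are the ones given above; the 100 bytes per precomputed row is purely an assumed figure for illustration.

```python
# Row counts taken from the example above.
rows_cf_a = 10_000_000
rows_cf_b = 1_000_000

# Denormalizing every pairwise intersection means one row per (A, B) pair.
precomputed_rows = rows_cf_a * rows_cf_b
print(precomputed_rows)  # 10000000000000

# Assumption: each precomputed row costs about 100 bytes on disk.
assumed_bytes_per_row = 100
total_bytes = precomputed_rows * assumed_bytes_per_row
print(total_bytes // 10**12, "TB")  # 1000 TB

# By contrast, one write-time lookup asks for only the names in the small row.
names_per_lookup = 70
```

Even with a generous per-row estimate, the precomputed set lands around a petabyte, while the query-time approach touches tens of column names per post.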

Shaun, I hope the clarification above makes things clearer. Yes,
the rows whose common columns I need to find are known to me.

Thank you all,
Asil

On Mon, Feb 7, 2011 at 3:53 AM, Shaun Cutts <sh...@cuttshome.net> wrote:
> In theory, you should be able to do joins by creating an extra column in one column family, holding the "foreign key" of the matching row in the other family.
>
> This assumes that the info you are joining on is available in both CFs (is not some sort of functional transformation).
>
> I have just found that the implementation of secondary indexes is not yet very close to optimal for more complex "joins" involving multiple indexes. I'm not sure whether that affects you, as you didn't say what you are joining on.
>
> -- Shaun

Re: Finding the intersection results of column sets of two rows

Posted by Shaun Cutts <sh...@cuttshome.net>.

In theory, you should be able to do joins by creating an extra column in one column family, holding the "foreign key" of the matching row in the other family.

This assumes that the info you are joining on is available in both CFs (is not some sort of functional transformation). 

I have just found that the implementation of secondary indexes is not yet very close to optimal for more complex "joins" involving multiple indexes. I'm not sure whether that affects you, as you didn't say what you are joining on.

-- Shaun


On Feb 6, 2011, at 4:22 PM, Aaron Morton wrote:

> Is it possible for you to denormalise and write all the intersection values? It will depend on how many there are, I guess.
> 
> The other alternative is to pull back more data than you need and do the intersection in code in the client.
> 
> 
> Hope that helps.
> Aaron
> On 7/02/2011, at 7:11 AM, Aklin_81 <as...@gmail.com> wrote:
> 
>> Hi,
>> 
>> @buddhasystem: yes, that's a well-known solution. But obviously I am
>> here because MySQL couldn't satisfy my needs. My question is in the
>> context of Cassandra: is it possible to get the intersection of the
>> columns in two rows in the way I described?
>> 
>> @Edward: yes, I know that, but how does it fit here for obtaining the
>> common columns of two rows?
>> 
>> Thanks for your comments..
>> 
>> -Asil
>> 
>> 
>> On Sun, Feb 6, 2011 at 9:55 PM, Edward Capriolo <ed...@gmail.com> wrote:
>>> You can use multi-get when fetching lists of already known keys
>>> to optimize your round trip time.
>>>

Re: Finding the intersection results of column sets of two rows

Posted by Aaron Morton <aa...@thelastpickle.com>.

Is it possible for you to denormalise and write all the intersection values? Will depend on how many, I guess.

The other alternative is to pull back more data than you need and do the intersection in code in the client.
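
A rough sketch of the second alternative (intersecting in client code), assuming rows are modeled as plain Python dicts of column name -> value. `iter_column_pages` is a hypothetical stand-in for reading a wide row in pages; it is not a real client call.

```python
def iter_column_pages(row, page_size):
    """Yield the row's column names in sorted order, one page at a time."""
    names = sorted(row)
    for start in range(0, len(names), page_size):
        yield names[start:start + page_size]

def intersect_in_client(small_row, large_row, page_size=1000):
    """Pull the large row back page by page and intersect on the client."""
    wanted = set(small_row)
    common = set()
    for page in iter_column_pages(large_row, page_size):
        common.update(wanted.intersection(page))
    return common

small_row = {"a": "", "c": "", "z": ""}
large_row = {name: "" for name in "abcdefg"}
print(sorted(intersect_in_client(small_row, large_row, page_size=3)))
```

This reads every column of the large row, so it trades extra data transfer for simpler queries.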


Hope that helps.
Aaron
On 7/02/2011, at 7:11 AM, Aklin_81 <as...@gmail.com> wrote:

> Hi,
> 
> @buddhasystem: yes, that's a well-known solution. But obviously, since
> MySQL couldn't satisfy my needs, I am here. My question is in the context
> of Cassandra: is it possible to obtain the intersection of the
> columns in two rows in the way I described?
> 
> @Edward: yes, I know that, but how does it fit here for obtaining the
> common columns of two rows?
> 
> Thanks for your comments..
> 
> -Asil
> 
> 
> On Sun, Feb 6, 2011 at 9:55 PM, Edward Capriolo <ed...@gmail.com> wrote:
>> You can use multi-get when fetching lists of already known keys
>> to optimize your round trip time.
>>

Re: Finding the intersection results of column sets of two rows

Posted by Aklin_81 <as...@gmail.com>.

Hi,

@buddhasystem: yes, that's a well-known solution. But obviously, since
MySQL couldn't satisfy my needs, I am here. My question is in the context
of Cassandra: is it possible to obtain the intersection of the
columns in two rows in the way I described?

@Edward: yes, I know that, but how does it fit here for obtaining the
common columns of two rows?

Thanks for your comments..

-Asil


On Sun, Feb 6, 2011 at 9:55 PM, Edward Capriolo <ed...@gmail.com> wrote:
> You can use multi-get when fetching lists of already known keys
> to optimize your round trip time.
>

Re: Finding the intersection results of column sets of two rows

Posted by Edward Capriolo <ed...@gmail.com>.

On Sun, Feb 6, 2011 at 10:15 AM, buddhasystem <po...@bnl.gov> wrote:
>
> Hello,
>
> If the amount of data is _that_ small, you'll have a much easier life with
> MySQL, which supports the "join" procedure -- because that's exactly what
> you want to achieve.
>
>
> asil klin wrote:
>>
>> Hi all,
>>
>> I want to procure the intersection of columns set of two rows (from 2
>> different column families).
>>
>> To achieve the intersection results, Can I, first retrieve all
>> columns(around 300) from first row and just query by those column
>> names in the second row(which contains maximum 100 000 columns) ?
>>
>> I am using the results during the write time & not before presentation
>> to the user, so latency wont be much concern while writing.
>>
>> Is it the proper way to procure intersection results of two rows ?
>>
>> Would love to hear your comments..
>>
>>
>> ---------
>>
>> Regards,
>> Asil
>>
>>
>
> --
> View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Finding-the-intersection-results-of-column-sets-of-two-rows-tp5997248p5997743.html
> Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.
>

You can use multi-get when fetching lists of already known keys
to optimize your round trip time.
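
To illustrate the round-trip point, here is a small sketch that counts calls against an in-memory store. `FakeStore`, `get` and `multiget` are hypothetical names for this illustration only and do not mirror any particular client library's signatures.

```python
class FakeStore:
    """In-memory stand-in for a key/row store that counts round trips."""

    def __init__(self, rows):
        self.rows = rows
        self.round_trips = 0

    def get(self, key):
        # One request per key.
        self.round_trips += 1
        return self.rows.get(key, {})

    def multiget(self, keys):
        # One request for the whole list of known keys.
        self.round_trips += 1
        return {key: self.rows.get(key, {}) for key in keys}

store = FakeStore({"k1": {"a": 1}, "k2": {"b": 2}, "k3": {"c": 3}})
keys = ["k1", "k2", "k3"]

one_by_one = {key: store.get(key) for key in keys}
trips_single = store.round_trips

store.round_trips = 0
batched = store.multiget(keys)
trips_batched = store.round_trips

print(trips_single, trips_batched)
```

Both approaches return the same rows; only the number of requests differs.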

Re: Finding the intersection results of column sets of two rows

Posted by buddhasystem <po...@bnl.gov>.

Hello,

If the amount of data is _that_ small, you'll have a much easier life with
MySQL, which supports the "join" procedure -- because that's exactly what
you want to achieve.


asil klin wrote:
> 
> Hi all,
> 
> I want to procure the intersection of columns set of two rows (from 2
> different column families).
> 
> To achieve the intersection results, Can I, first retrieve all
> columns(around 300) from first row and just query by those column
> names in the second row(which contains maximum 100 000 columns) ?
> 
> I am using the results during the write time & not before presentation
> to the user, so latency wont be much concern while writing.
> 
> Is it the proper way to procure intersection results of two rows ?
> 
> Would love to hear your comments..
> 
> 
> ---------
> 
> Regards,
> Asil
> 
> 

-- 
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Finding-the-intersection-results-of-column-sets-of-two-rows-tp5997248p5997743.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.

Re: Finding the intersection results of column sets of two rows

Posted by Asil <as...@gmail.com>.

I think my question might have been unclear to some of you, so let me
explain my problem (and the solution I thought of) again for the sake
of clarity:

Consider that I have 2 rows. The 1st row contains 60-70 columns and the
2nd row contains hundreds of thousands of columns. The columns in both
rows are all valueless. I just need to find out the **common column names**
in the two rows. **These two rows are known to me**. So what I plan to
do is pick up all the **column names** of the 1st row (60-70
columns) and ask for them in the 2nd row; whatever column names I get
back are my result.
Would there be any problem with this solution? This is how I am
expecting to get the common column names.
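
The plan above can be sketched roughly as follows, assuming each row is modeled as a plain Python dict of column name -> (empty) value. `get_columns_by_name` is a hypothetical stand-in for a by-name column query against the 2nd row; it is not a real client call.

```python
# In-memory model only: each row is a dict of column name -> value.

def get_columns_by_name(row, names):
    """Return only the requested columns that actually exist in the row."""
    return {name: row[name] for name in names if name in row}

def common_column_names(small_row, large_row):
    """Read all names from the small row, then ask the large row for them."""
    wanted = list(small_row)                        # names from the 1st row
    found = get_columns_by_name(large_row, wanted)  # by-name lookup in the 2nd row
    return set(found)                               # names present in both rows

small_row = {"u1": b"", "u7": b"", "u9": b""}
large_row = {"u%d" % i: b"" for i in range(0, 100000, 3)}  # u0, u3, u6, u9, ...
print(sorted(common_column_names(small_row, large_row)))
```

Only the names from the small row are ever requested, so the size of the 2nd row does not affect how much data comes back.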