
Posted to user@cassandra.apache.org by Aditya <ad...@gmail.com> on 2011/12/27 07:17:05 UTC

Retrieve all composite columns from a row, whose composite name's first component matches from a list of Integers

I need to store all activity by a user's followees in a single row. I am
trying to do that with composite column names in a single user-specific
row named 'rowX'.

Whenever one of the user's followees acts on an item, a column is stored in
'rowX'. Its name is a composite of itemId+userId (which makes the column
name unique within rowX), and its value contains the activity data for
that item by that followee.

Now I want to retrieve the activity by all users on a list of items, so I
need to retrieve every composite column whose first component matches one
of the itemIds. Is it possible to run such a query against Cassandra? I am
using Hector.

Re: Retrieve all composite columns from a row, whose composite name's first component matches from a list of Integers

Posted by Martin Arrowsmith <ar...@gmail.com>.

I believe this calls for Cassandra Cookbook 2nd edition :)

On Wed, Dec 28, 2011 at 10:26 AM, Edward Capriolo <ed...@gmail.com> wrote:

> Super columns have the same fundamental problem and perform worse in
> general. So switching from composites to super columns is NEVER a good idea.
>
>
> On Wed, Dec 28, 2011 at 1:19 PM, Aditya <ad...@gmail.com> wrote:
>
>> Since I have around 20 items to query, I guess making 20 queries to
>> retrieve the activities by all followees on all 20 of them would be too
>> inefficient. To get more efficient queries, are supercolumns recommended
>> for this case? In any case, if I use supercolumns, I always need to
>> retrieve the entire supercolumn, and I write its subcolumns at different
>> times, not all at once.
>>
>> On Wed, Dec 28, 2011 at 8:07 PM, Edward Capriolo <ed...@gmail.com> wrote:
>>
>>> You need to execute one get slice operation for each item id or, if the
>>> row is not large, you can try one large get slice on the entire row and
>>> deal with the results client side.
>>>
>>> If you try method 1: when doing slices on composites, you can set the
>>> slice bounds to be inclusive or exclusive, so that you get only the
>>> columns you want and not some extra columns up to the slice range size.
>>>
>>>
>
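
The two approaches Edward describes (one slice per item id, or one read of the whole row filtered client side) can be sketched against a toy in-memory model. This is only an illustration, not Hector or Cassandra code: the row is modelled as a list of columns sorted by a composite (itemId, userId) name, and ROW_X, slice_item, slices_per_item and filter_whole_row are hypothetical names invented for the example. It also assumes integer item ids, so that (item_id + 1,) can serve as the exclusive end bound.

```python
from bisect import bisect_left

# Hypothetical stand-in for 'rowX': columns sorted by composite name (itemId, userId).
ROW_X = sorted([
    ((10, 1), "user 1 liked item 10"),
    ((10, 7), "user 7 shared item 10"),
    ((11, 3), "user 3 liked item 11"),
    ((12, 1), "user 1 commented on item 12"),
    ((12, 9), "user 9 liked item 12"),
])
NAMES = [name for name, _ in ROW_X]


def slice_item(item_id):
    """Method 1: one range slice bounded to a single item's columns."""
    start = bisect_left(NAMES, (item_id,))      # first name >= (item_id,)
    end = bisect_left(NAMES, (item_id + 1,))    # first name >= (item_id + 1,)
    return ROW_X[start:end]


def slices_per_item(item_ids):
    """Method 1 repeated: one slice request per item id."""
    return {item_id: slice_item(item_id) for item_id in item_ids}


def filter_whole_row(item_ids):
    """Method 2: read the whole row once and filter on the client."""
    result = {item_id: [] for item_id in item_ids}
    for name, value in ROW_X:
        if name[0] in result:
            result[name[0]].append((name, value))
    return result
```

Both functions return the same columns; the trade-off is one round trip per item id against transferring the whole row, which is why the second method is only suggested when the row is not large.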

Re: Retrieve all composite columns from a row, whose composite name's first component matches from a list of Integers

Posted by Aditya <ad...@gmail.com>.

Another point worth noting: there will be at most 8-10 subcolumns per
supercolumn.
I need to write one subcolumn at a time (but always read the entire
supercolumn).

On Fri, Dec 30, 2011 at 12:20 AM, Aditya <ad...@gmail.com> wrote:

> @Edward: Perhaps you missed that I always need to retrieve all the
> columns under the supercolumn. Given my query requirements, if I use
> composite columns instead of supercolumns, it is impossible to do wildcard
> queries like the one in this thread's subject, whereas that is much easier
> with supercolumns.
>
>
> On Thu, Dec 29, 2011 at 11:06 PM, Edward Capriolo <ed...@gmail.com> wrote:
>
>> The use case in question was: only accessing some columns.
>>
>> Even if that is not the case:
>>
>> SuperColumns: 1 extra level of nesting
>> Composite Columns: Arbitrary levels of nesting
>>
>> SuperColumns: More overhead (space on disk) than using your own delimiter
>> '_'
>> SuperColumns: Likely going to be replaced in a future C* version behind
>> the scenes by composite columns anyway
>> SuperColumns: Usually an afterthought for API developers (support for
>> them comes "later")
>> SuperColumns: Almost always utilized incorrectly by users; users speak
>> of '10%' performance gains after they switch away from them.
>>
>> There are some cases (a small percentage) where SuperColumns are a better
>> choice, but this is rare. With composites and concatenated column names
>> they have no great purpose any more, (bad analogy coming!) like a
>> mechanical typewriter.
>>
>> On 12/29/11, Philippe <wa...@gmail.com> wrote:
>> > Would you stand by that statement in case all columns inside the super
>> > column need to be read? Why?
>> >
>> > Thanks
>
>

Re: Retrieve all composite columns from a row, whose composite name's first component matches from a list of Integers

Posted by Philippe <wa...@gmail.com>.

I currently have
scf[c1][sc1]=value
scf[c1][sc2]=value
...
scf[c2][sc1]=value
scf[c2][sc2]=value
scf[c2][sc3]=value
scf[c2][sc4]=value

99% of the time, I do multiget super slices: for multiple keys, I query
explicitly for columns c1, c2, c10, c12.
1% of the time, I do a multiget range super slice where, for multiple keys,
I query for a range of super columns.
As Tyler said, this can be done by specifying super columns in the slice
predicate; it will implicitly return all of their columns. I use Hector and
it works great.

Now, interestingly enough, the column names sc1, sc2, sc3 are in fact
home-made composite columns.

I could and would switch to full composite columns because I am fishing for
every drop of performance I can get. However, I would need this: "Letting
multiget_slice accept multiple SlicePredicates per key could also
accomplish this."
Can anyone on the dev team comment on doing this? Is it a no-no?

Thanks
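
To illustrate why several slice ranges per key would be needed after a switch to composites, here is a rough sketch on a toy in-memory model. It is not Hector or Thrift code: ROW, get_supercolumns, flatten and contiguous_runs are hypothetical names, and the model assumes string column names that sort lexicographically.

```python
# Hypothetical super column family row: super column name -> {subcolumn: value}.
ROW = {
    "c1": {"sc1": "v11", "sc2": "v12"},
    "c2": {"sc1": "v21", "sc2": "v22", "sc3": "v23", "sc4": "v24"},
    "c10": {"sc1": "v101"},
}


def get_supercolumns(row, names):
    """Named lookup: each requested super column comes back with all its subcolumns."""
    return {name: dict(row[name]) for name in names if name in row}


def flatten(row):
    """The same data as composite columns, sorted by (super column, subcolumn)."""
    return sorted(
        ((name, sub), value)
        for name, subs in row.items()
        for sub, value in subs.items()
    )


def contiguous_runs(columns, wanted):
    """Count the separate runs of matching columns, i.e. the ranges one would need."""
    runs, previous = 0, False
    for (first, _), _ in columns:
        match = first in wanted
        if match and not previous:
            runs += 1
        previous = match
    return runs
```

In this model, asking for c1 and c2 by name is a single lookup, but in the flattened form c10 sorts between them, so the matching columns form two separate runs and a single range cannot cover both without also returning c10.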

2011/12/29 Edward Capriolo <ed...@gmail.com>

> Hum...
>
> Do you have this?
> scf [b][1][a]=value
> scf [b][1][x]=value
> scf [b][7][b]=value
>
> and you want to slice:
> scf [b][1][*]
>
> Which would result in
>
> scf [b][1][a]=value
> scf [b][1][x]=value
>
> ?
>
> The composite version of this would be:
> cf [b][1:a]=value
> cf [b][1:x]=value
> cf [b][7:b]=value
>
> I am not sure exactly what you are doing, because a SlicePredicate
> takes either a list of columns or a SliceRange. A ColumnPath takes a
> single SuperColumn.
>
> I do not see how this is done with Columns or SuperColumns. Maybe you
> can provide a code snippet and/or some sample data?
>
>
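
As a rough illustration of the mapping in Edward's example, the sketch below flattens the super column layout for row key b into composite names and checks that slicing on the first component returns the same data as scf[b][1][*]. It is a toy in-memory model only: SUPER_ROW_B, to_composite and slice_first_component are hypothetical names, not part of Hector or Cassandra.

```python
from itertools import dropwhile, takewhile

# Hypothetical data for row key 'b', taken from the example above:
# scf[b][1][a], scf[b][1][x], scf[b][7][b]
SUPER_ROW_B = {1: {"a": "value", "x": "value"}, 7: {"b": "value"}}


def to_composite(super_row):
    """Flatten super column -> subcolumn into sorted composite names (1:a, 1:x, 7:b)."""
    return sorted(
        ((name, sub), value)
        for name, subs in super_row.items()
        for sub, value in subs.items()
    )


def slice_first_component(columns, first):
    """Return the contiguous run of columns whose first component equals `first`."""
    tail = dropwhile(lambda column: column[0][0] < first, columns)
    return list(takewhile(lambda column: column[0][0] == first, tail))


COMPOSITE_ROW_B = to_composite(SUPER_ROW_B)
SLICE_1 = slice_first_component(COMPOSITE_ROW_B, 1)
```

Because the composite names sort by their first component, everything under 1 is one contiguous run, which is what lets a single range slice stand in for reading the whole super column.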