Posted to user@cassandra.apache.org by Aklin_81 <as...@gmail.com> on 2011/01/14 04:27:30 UTC

Is there any way I could use keys of other rows as column names that could be sorted according to time ?

I would like to keep references to other rows as the names of super
columns and sort those super columns according to time.
Is there any way I could implement that?

Thanks in advance!

Re: Problem starting Cassandra on Ubuntu

Posted by Peter Schuller <pe...@infidyne.com>.

> ERROR [main] 2011-01-14 15:37:49,965 DatabaseDescriptor.java (line 388) Fatal error: null; mapping values are not allowed here

This indicates there is a problem with the configuration file; that
error is coming from the YAML parser. Double-check what changes you
have made relative to the version that was installed by the package;
there is probably a mistake in there somewhere. (Yes, the error
message could of course be more informative...)
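
As a rough illustration of the kind of mistake that produces this parser
message: a line that carries two "key: value" pairs usually triggers it. The
sketch below is only a heuristic line scanner, not a YAML parser, and the
sample text is a made-up fragment (not the poster's actual cassandra.yaml).
The function name suspicious_yaml_lines is hypothetical.

```python
def suspicious_yaml_lines(text):
    """Return (line_number, line) pairs that look like 'key: value: extra'.

    Heuristic only: it does not parse YAML. It flags an unquoted value that
    itself contains ': ', a common trigger for the parser message
    'mapping values are not allowed here'.
    """
    hits = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split(" #", 1)[0].rstrip()   # drop trailing comments
        stripped = line.lstrip("- ").strip()     # ignore indent and list markers
        if not stripped or stripped.startswith("#"):
            continue
        key, sep, value = stripped.partition(": ")
        if not sep:
            continue
        value = value.strip()
        if value[:1] in ("'", '"', "{", "["):    # quoted or flow style: skip
            continue
        if ": " in value or value.endswith(":"):
            hits.append((number, raw))
    return hits


# Made-up fragment: the third line is missing a line break between two settings.
sample = """\
cluster_name: 'Test Cluster'
listen_address: localhost
rpc_address: localhost rpc_port: 9160
"""

for number, line in suspicious_yaml_lines(sample):
    print(f"line {number}: {line}")
```

Comparing the edited file against the packaged version with a diff tool, as
suggested above, remains the more reliable check.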

--
/ Peter Schuller

Problem starting Cassandra on Ubuntu

Posted by kh jo <jo...@yahoo.com>.

Hi,

I just installed Cassandra on Ubuntu using the package manager, but I
cannot start it.

I get the following error in the logs:

 INFO [main] 2011-01-14 15:37:49,758 AbstractCassandraDaemon.java (line 74) Heap size: 1051525120/1051525120
 WARN [main] 2011-01-14 15:37:49,826 CLibrary.java (line 73) Obsolete version of JNA present; unable to read errno. Upgrade to JNA 3.2.7 or later
 WARN [main] 2011-01-14 15:37:49,827 CLibrary.java (line 73) Obsolete version of JNA present; unable to read errno. Upgrade to JNA 3.2.7 or later
 WARN [main] 2011-01-14 15:37:49,827 CLibrary.java (line 105) Unknown mlockall error 0
 INFO [main] 2011-01-14 15:37:49,841 DatabaseDescriptor.java (line 121) Loading settings from file:/etc/cassandra/cassandra.yaml
ERROR [main] 2011-01-14 15:37:49,965 DatabaseDescriptor.java (line 388) Fatal error: null; mapping values are not allowed here

Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?

Posted by Aklin_81 <as...@gmail.com>.

No, you do not need to shut up, please! :)
You may be clearing up further misconceptions of mine on the topic!

Anyway, the link between the 1st and 2nd paragraphs was this: since the
distribution of rows among nodes is not determined by the key itself (as
you rightly said) but by the MD5 hash of the key, I can use just any key,
including a TimeUUID key (which would be helpful in my case), with the
RandomPartitioner.



On 1/14/11, Roshan Dawrani <ro...@gmail.com> wrote:
> On Fri, Jan 14, 2011 at 8:51 PM, Aklin_81 <as...@gmail.com> wrote:
>
>> I just read that cassandra internally creates a md5 hash that is used
>> for distributing the load by sending it to a node reponsible for the
>> range within which that md5 hash falls, so even when we create
>> sequential keys, their MD5 hash is not the same & hence they are not
>> sent to same node. This was my misunderstanding of this concept.
>> Sorry for creating confusions !
>>
>> So.. with this I think I will be able to use timeUUID as row key !?
>>
>>
> Now, what really is the link between your corrected understanding and the
> conclusion in the 2nd para? :-)
>
> I miss the link you are using to come from para 1 to para 2.
>
> Just because you use time UUID as the row key, there is no storage guarantee
> because of that. Distribution of rows and ordering across nodes is only
> based on what partitioner you are using - it is not (only) related to the
> the type of the key.
>
> May be I should just shut up now as I don't seem to be understanding you
> requirement :-)

Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?

Posted by Roshan Dawrani <ro...@gmail.com>.

On Fri, Jan 14, 2011 at 8:51 PM, Aklin_81 <as...@gmail.com> wrote:

> I just read that cassandra internally creates a md5 hash that is used
> for distributing the load by sending it to a node reponsible for the
> range within which that md5 hash falls, so even when we create
> sequential keys, their MD5 hash is not the same & hence they are not
> sent to same node. This was my misunderstanding of this concept.
> Sorry for creating confusions !
>
> So.. with this I think I will be able to use timeUUID as row key !?
>
>
Now, what really is the link between your corrected understanding and the
conclusion in the 2nd para? :-)

I am missing the link that takes you from para 1 to para 2.

Using a TimeUUID as the row key does not, by itself, give you any storage
guarantee. Distribution of rows and ordering across nodes is based only on
the partitioner you are using; it is not (only) related to the type of the
key.

Maybe I should just shut up now, as I don't seem to be understanding your
requirement :-)


Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?

Posted by Aklin_81 <as...@gmail.com>.

I just read that Cassandra internally creates an MD5 hash of the row key
and uses it to distribute the load: each row is sent to the node
responsible for the range within which that hash falls. So even when we
create sequential keys, their MD5 hashes are unrelated, and hence the rows
are not all sent to the same node. I had misunderstood this concept.
Sorry for creating confusion!

So, with this, I think I will be able to use a TimeUUID as the row key!?
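
The idea can be sketched roughly as follows. This is a simplified model, not
Cassandra's actual code: the token is taken as the MD5 digest reduced into a
2**127 ring, the four evenly spaced node tokens are made up, and the names
token_for, owner and NODE_TOKENS are hypothetical.

```python
import hashlib
import uuid
from bisect import bisect_left
from collections import Counter

RING_SIZE = 2 ** 127  # assumed token space for this sketch


def token_for(key: bytes) -> int:
    """Simplified token: the MD5 digest as an integer, folded into the ring."""
    return int.from_bytes(hashlib.md5(key).digest(), "big") % RING_SIZE


def owner(token: int, node_tokens: list) -> int:
    """Index of the first node whose token is >= the key's token, wrapping around."""
    return bisect_left(node_tokens, token) % len(node_tokens)


# Four hypothetical nodes with evenly spaced tokens.
NODE_TOKENS = [i * RING_SIZE // 4 for i in range(4)]

# Time-based (version 1) UUIDs generated back to back, so they are sequential in time.
keys = [uuid.uuid1().bytes for _ in range(1000)]
counts = Counter(owner(token_for(key), NODE_TOKENS) for key in keys)

for node in sorted(counts):
    print(f"node {node}: {counts[node]} rows")
```

Although the keys are created one after another, the hashed tokens scatter
across the ring, so each node ends up with roughly a quarter of the rows.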

Aaron, could you kindly share your views on my response to your
queries above?




On 1/14/11, Roshan Dawrani <ro...@gmail.com> wrote:
> I am not clear what you guys are trying to do and say :-)
>
> So, let's take some specifics...
>
> Say you want to create rows in some column family (say CF_A), and as you
> create them, you want to store their row key in column names in some other
> column family (say CF_B) - possibly for filtering keys based on time later,
> etc, etc...
>
> Now your rows in CF_A may be keyed on a TimeUUID and if you store these keys
> as column names in CF_B that has comparator as TimeUUID, then you get your
> column names time sorted automatically.
>
> Now CF_A may be split across nodes - is that of any concern to you?
>
> Are you expecting any storage relationship between column names of CF_B and
> rows of CF_A?
>
> rgds,
> Roshan
>
> On Fri, Jan 14, 2011 at 7:58 PM, Aklin_81 <as...@gmail.com> wrote:
>
>> I too believed so!  but not totally sure.
>>
>> On 1/14/11, Rajkumar Gupta <ra...@gmail.com> wrote:
>> > I am not sure but I guess because all the rows of certain time range
>> > will go to just one node & will not be evenly distributed because the
>> > timeUUID will not be random but sequential according to time... I am
>> > not sure anyways...
>> >
>>

Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?

Posted by Roshan Dawrani <ro...@gmail.com>.

I am not clear what you guys are trying to do and say :-)

So, let's take some specifics...

Say you want to create rows in some column family (say CF_A), and as you
create them, you want to store their row key in column names in some other
column family (say CF_B) - possibly for filtering keys based on time later,
etc, etc...

Now your rows in CF_A may be keyed on a TimeUUID, and if you store these
keys as column names in CF_B, whose comparator is TimeUUIDType, then you
get your column names sorted by time automatically.
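
A small sketch of why the comparator matters, using plain Python dicts as
stand-ins for CF_A and CF_B (no Cassandra client is involved). The helper
time_uuid and the sample timestamps are made up for illustration; sorting by
the embedded timestamp only approximates what a time-based comparator does.

```python
import uuid


def time_uuid(timestamp: int, clock_seq: int = 0,
              node: int = 0x123456789ABC) -> uuid.UUID:
    """Build a version 1 UUID from a 60-bit count of 100 ns intervals."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # version 1
    clock_seq_low = clock_seq & 0xFF
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def time_order(u: uuid.UUID):
    """Sort key: embedded timestamp first, raw bytes as a tie-breaker."""
    return (u.time, u.bytes)


# CF_A stand-in: rows keyed by TimeUUID (hypothetical data).
earlier = time_uuid(9)
later = time_uuid((1 << 32) + 5)
cf_a = {later: {"title": "second post"}, earlier: {"title": "first post"}}

# CF_B stand-in: one index row whose column names are the CF_A row keys.
cf_b_index_row = {key: b"" for key in cf_a}

by_time = sorted(cf_b_index_row, key=time_order)
by_bytes = sorted(cf_b_index_row, key=lambda u: u.bytes)

print("time order: ", [cf_a[k]["title"] for k in by_time])
print("bytes order:", [cf_a[k]["title"] for k in by_bytes])
```

The low bits of the timestamp come first in the UUID's byte layout, so a
plain byte-wise comparison does not give chronological order; a comparator
that understands the TimeUUID layout does.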

Now CF_A may be split across nodes - is that of any concern to you?

Are you expecting any storage relationship between column names of CF_B and
rows of CF_A?

rgds,
Roshan

On Fri, Jan 14, 2011 at 7:58 PM, Aklin_81 <as...@gmail.com> wrote:

> I too believed so!  but not totally sure.
>
> On 1/14/11, Rajkumar Gupta <ra...@gmail.com> wrote:
> > I am not sure but I guess because all the rows of certain time range
> > will go to just one node & will not be evenly distributed because the
> > timeUUID will not be random but sequential according to time... I am
> > not sure anyways...
> >
>


Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?

Posted by Aklin_81 <as...@gmail.com>.

I believed so too, but I am not totally sure.

On 1/14/11, Rajkumar Gupta <ra...@gmail.com> wrote:
> I am not sure but I guess because all the rows of certain time range will go
> to just one node & will not be evenly distributed because the timeUUID will
> not be random but sequential according to time... I am not sure anyways...
>
> On Fri, Jan 14, 2011 at 7:18 PM, Roshan Dawrani
> <ro...@gmail.com>wrote:
>
>> On Fri, Jan 14, 2011 at 7:15 PM, Aklin_81 <as...@gmail.com> wrote:
>>
>>> @Roshan
>>> Yes, I thought about that, but then I wouldn't be able to use the
>>> Random Partitioner.
>>>
>>>
>> Can you please expand a bit on this? What is this restriction? Can you
>> point me to some relevant documentation on this?
>>
>> Thanks.

Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?

Posted by Rajkumar Gupta <ra...@gmail.com>.

I am not sure, but I guess it is because all the rows for a certain time
range would go to just one node and would not be evenly distributed, since
TimeUUIDs are not random but sequential in time... I am not sure, anyway.

On Fri, Jan 14, 2011 at 7:18 PM, Roshan Dawrani <ro...@gmail.com> wrote:

> On Fri, Jan 14, 2011 at 7:15 PM, Aklin_81 <as...@gmail.com> wrote:
>
>> @Roshan
>> Yes, I thought about that, but then I wouldn't be able to use the
>> Random Partitioner.
>>
>>
> Can you please expand a bit on this? What is this restriction? Can you
> point me to some relevant documentation on this?
>
> Thanks.

Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?

Posted by Roshan Dawrani <ro...@gmail.com>.

On Fri, Jan 14, 2011 at 7:15 PM, Aklin_81 <as...@gmail.com> wrote:

> @Roshan
> Yes, I thought about that, but then I wouldn't be able to use the
> Random Partitioner.
>
>
Can you please expand a bit on this? What is this restriction? Can you point
me to some relevant documentation on this?

Thanks.

Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?

Posted by Aklin_81 <as...@gmail.com>.

@Roshan
Yes, I thought about that, but then I wouldn't be able to use the
Random Partitioner.

@Aaron

Do you mean something like 'timeUUID + row_key' as the supercolumn names?
Then, when retrieving the row_key from such a column name, will I need to
parse the name? How exactly do I do that?

>Some issues:
>- Will you have time collisions ?
No, I mostly won't have time collisions. If they happen in 1% of cases,
I don't mind.

>- Not sure what you are storing in the super columns, but there are limitations.
I would be storing at most 5 subcolumns in each and would be retrieving
them all together.

>- If you are using cassandra 0.7, have you looked at the secondary indexes ?

Yes, I did, but I think they are not helpful in my case.

This is what I am trying to do :
******
This is from an older post that I made earlier on the mailing list:
I am working on a question/answer forum that allows a user to follow
questions on certain topics from his followees (the people he follows).
I want to build a per-user news feed that consists of only those
questions that have been posted by his followees and tagged with
topics that he is following.
A simple news-feed design that shows all the posts from a user's network
would be easy to build on Cassandra by executing fast writes, on every
post, to the rows of all followers of the posting user. But my application
has an additional filter on 'followed topics' (i.e., the user receives
posts "created by his followees" && "on topics the user is following").

I was thinking of implementing it this way:
On write, record the postID of each post for all followers of its author
by adding a supercolumn to each follower's row in the News-feed
supercolumn family, with the timestamp as the supercolumn name (to sort by
time) and up to 5 subcolumns containing the topic tags of that post.
On read, compare the subcolumn values with the topics the user is
following; if they match, show the post. (I would need to fetch the user's
list of followed topics at read time, so should I store the topic list as
a supercolumn in this same News-feed supercolumn family?)

An important point to note: often a post will have zero subcolumns, which
means it has to be shown without checking it against the user's list of
followed topics.

There is another view for the users which allows them to see all the
posts from their followees (without topic filters). In this case no
checking of subcolumns for topics will be performed.
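
The write and read paths described above can be sketched with in-memory
dicts standing in for the supercolumn family. Everything here is
illustrative: the user names, the publish and read_feed helpers, and the
newest-first ordering are assumptions, and no Cassandra client is used.

```python
from collections import defaultdict

# Hypothetical stand-ins for data that would live in Cassandra.
followers = {"alice": ["bob", "carol"]}               # author -> followers
followed_topics = {"bob": {"cassandra"}, "carol": {"python"}}
news_feed = defaultdict(dict)                          # user -> {timestamp: entry}


def publish(author, post_id, timestamp, topics):
    """Write path: fan the post out to the feed row of every follower."""
    if len(topics) > 5:
        raise ValueError("at most 5 topic tags per post")
    for user in followers.get(author, []):
        news_feed[user][timestamp] = {"post_id": post_id, "topics": list(topics)}


def read_feed(user, filter_by_topic=True):
    """Read path: newest first; posts without tags always pass the filter."""
    wanted = followed_topics.get(user, set())
    result = []
    for timestamp in sorted(news_feed[user], reverse=True):
        entry = news_feed[user][timestamp]
        tags = entry["topics"]
        if not filter_by_topic or not tags or wanted.intersection(tags):
            result.append(entry["post_id"])
    return result


publish("alice", "post-1", 100, ["cassandra", "nosql"])
publish("alice", "post-2", 200, ["python"])
publish("alice", "post-3", 300, [])                    # untagged post

print(read_feed("bob"))                         # topic-filtered view
print(read_feed("bob", filter_by_topic=False))  # unfiltered view
```

Note that this sketch keys feed entries by timestamp alone, so two posts
with the same timestamp would overwrite each other in a follower's row;
that is the time-collision question raised elsewhere in this thread.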

I got good insights from Tyler on this, but he recommended an approach
which, although beneficial for read performance, requires heavy
denormalization, something like 70-80x. I am currently wary of that
approach and would like to test this one first.
******
any comments, feedback greatly appreciated..

thanks so much!

On 1/14/11, Roshan Dawrani <ro...@gmail.com> wrote:
> It's possible that I am misunderstanding the question in some way.
>
> The row keys can be Time UUIDs and with those row keys as column names, u
> can use comparator TIMEUUIDTYPE to have them sorted by time automatically.
>
> On Fri, Jan 14, 2011 at 9:18 AM, Aaron Morton
> <aa...@thelastpickle.com>wrote:
>
>> You could make the time an a fixed width integer and prefix your row keys
>> with it, then set the comparotor to ascii or utf.
>>
>> Some issues:
>> - Will you have time collisions ?
>> - Not sure what your are storing in the super columns, but their are
>> limitations http://wiki.apache.org/cassandra/CassandraLimitations
>> - If you are using cassandra 0.7, have you looked at the secondary indexes ?
>> http://www.riptano.com/blog/whats-new-cassandra-07-secondary-indexes
>>
>> If you provide some more info on the problem your trying to solve we may be
>> able to help some more.
>>
>> Cheers
>> Aaron
>>
>>
>> On 14 Jan, 2011,at 04:27 PM, Aklin_81 <as...@gmail.com> wrote:
>>
>> I would like to keep the reference of other rows as names of super
>> column and sort those super columns according to time.
>> Is there any way I could implement that ?
>>
>> Thanks in advance!
>>
>>
>
>
> --
> Roshan
> Blog: http://roshandawrani.wordpress.com/
> Twitter: @roshandawrani <http://twitter.com/roshandawrani>
> Skype: roshandawrani

Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?

Posted by Roshan Dawrani <ro...@gmail.com>.

It's possible that I am misunderstanding the question in some way.

The row keys can be TimeUUIDs, and with those row keys as column names, you
can use the TimeUUIDType comparator to have them sorted by time automatically.

On Fri, Jan 14, 2011 at 9:18 AM, Aaron Morton <aa...@thelastpickle.com> wrote:

> You could make the time an a fixed width integer and prefix your row keys
> with it, then set the comparotor to ascii or utf.
>
> Some issues:
> - Will you have time collisions ?
> - Not sure what your are storing in the super columns, but their are
> limitations http://wiki.apache.org/cassandra/CassandraLimitations
> - If you are using cassandra 0.7, have you looked at the secondary indexes ?
> http://www.riptano.com/blog/whats-new-cassandra-07-secondary-indexes
>
> If you provide some more info on the problem your trying to solve we may be
> able to help some more.
>
> Cheers
> Aaron
>
>
> On 14 Jan, 2011,at 04:27 PM, Aklin_81 <as...@gmail.com> wrote:
>
> I would like to keep the reference of other rows as names of super
> column and sort those super columns according to time.
> Is there any way I could implement that ?
>
> Thanks in advance!
>
>


-- 
Roshan
Blog: http://roshandawrani.wordpress.com/
Twitter: @roshandawrani <http://twitter.com/roshandawrani>
Skype: roshandawrani


Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?

Posted by Aaron Morton <aa...@thelastpickle.com>.

You could make the time a fixed-width integer and prefix your row keys with it, then set the comparator to ascii or utf8 (AsciiType or UTF8Type).
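
A minimal sketch of that naming scheme, assuming millisecond timestamps, a
13-digit zero-padded prefix and ":" as the separator (all three are
illustrative choices, as are the helper names make_column_name and
parse_column_name):

```python
SEPARATOR = ":"
WIDTH = 13  # digits; assumed wide enough for millisecond epoch timestamps


def make_column_name(timestamp_ms: int, row_key: str) -> str:
    """Zero-pad the time so that string order matches numeric order."""
    if not 0 <= timestamp_ms < 10 ** WIDTH:
        raise ValueError("timestamp does not fit the fixed width")
    return f"{timestamp_ms:0{WIDTH}d}{SEPARATOR}{row_key}"


def parse_column_name(name: str):
    """Split at the first separator; the prefix contains digits only."""
    prefix, _, row_key = name.partition(SEPARATOR)
    return int(prefix), row_key


names = [
    make_column_name(1294977450000, "user42"),
    make_column_name(999, "user7"),
    make_column_name(1294977450000, "user13"),
]

# A text comparator would order the names like plain string sorting does.
for name in sorted(names):
    print(name, "->", parse_column_name(name))
```

Without the padding, "999" would sort after "1294977450000" as text. Two
entries with the same timestamp stay distinct because the row key is part
of the name, which softens the time-collision concern.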

Some issues:
- Will you have time collisions?
- Not sure what you are storing in the super columns, but there are limitations: http://wiki.apache.org/cassandra/CassandraLimitations
- If you are using Cassandra 0.7, have you looked at the secondary indexes? http://www.riptano.com/blog/whats-new-cassandra-07-secondary-indexes

If you provide some more info on the problem you're trying to solve, we may be able to help some more.

Cheers
Aaron


On 14 Jan, 2011, at 04:27 PM, Aklin_81 <as...@gmail.com> wrote:

I would like to keep the reference of other rows as names of super
column and sort those super columns according to time.
Is there any way I could implement that ?

Thanks in advance!