Posted to user@cassandra.apache.org by Onur AKTAS <on...@live.com> on 2010/04/09 23:56:10 UTC

How to perform queries on Cassandra?

Hi,
I want to use Cassandra for a new project. As you can guess, I have an RDBMS background, but I have no experience with NoSQL databases apart from key/value in-memory data grids and caches (Oracle Coherence, Memcached).
I'm trying to find out how to perform queries that calculate values on the fly, without storing pre-calculated results at insert time.
Let's say we have the latitude and longitude of every user, and a Distance(from_lat, from_long, to_lat, to_long) function which gives the distance between two lat/long pairs in kilometers.
Ex:
user1_lat = 40 user1_long = 20
user2_lat = 30 user2_long = 50
So, in a regular RDBMS we could use this kind of query to get the users near user1's location:
* select user from users where Distance(40, 20, user.lat, user.long) <= 5
How do we do this kind of operation in Cassandra?
If we store pre-calculated distances, and we have 1 million users, do we then need 1 million insert operations just to update one user's coordinates? (Of course not, but then how?)
I believe highly complex calculations are possible with Cassandra, but I do not know how to query the data other than by accessing it by its key.
Thanks,
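
For reference, here is a minimal Java sketch of what the Distance() function and the "where Distance(...) <= 5" filter amount to when done on the client. It assumes the haversine great-circle formula and a mean Earth radius of 6371 km; the class name, the in-memory map and the third user are hypothetical, and nothing here is Cassandra API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DistanceScanSketch {
    static final double EARTH_RADIUS_KM = 6371.0; // assumed mean Earth radius

    /** Great-circle distance in km between two points given in degrees (haversine formula). */
    static double distanceKm(double fromLat, double fromLong, double toLat, double toLong) {
        double dLat = Math.toRadians(toLat - fromLat);
        double dLong = Math.toRadians(toLong - fromLong);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(fromLat)) * Math.cos(Math.toRadians(toLat))
                * Math.sin(dLong / 2) * Math.sin(dLong / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Brute-force equivalent of the SQL query: every user is checked, so each lookup is O(n). */
    static List<String> usersWithin(Map<String, double[]> users, double lat, double lon, double km) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, double[]> e : users.entrySet()) {
            double[] p = e.getValue(); // p[0] = latitude, p[1] = longitude
            if (distanceKm(lat, lon, p[0], p[1]) <= km) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, double[]> users = new LinkedHashMap<>();
        users.put("user1", new double[] {40.0, 20.0});
        users.put("user2", new double[] {30.0, 50.0});
        users.put("user3", new double[] {40.01, 20.01}); // hypothetical nearby user
        System.out.println(usersWithin(users, 40.0, 20.0, 5.0));
    }
}
```

The full scan is exactly the cost the question is worried about; the replies in this thread are about choosing row keys so that only a small set of candidates has to be run through distanceKm().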
 		 	   		  

Re: How to perform queries on Cassandra?

Posted by Peter Chang <pe...@gmail.com>.
I'm going to eventually want to do something similar so I took particular
note of a thread that came up earlier in this forum. I haven't fully
investigated it but I think it's a great start for what you need to do.

http://www.mail-archive.com/user@cassandra.apache.org/msg00201.html

In particular, it talks about representing coordinates as hashes. I
believe that points that are close to each other have similar hash stems
(e.g. "abcdefg" is within x distance of "abcdeff").

I haven't looked at it in detail.

Peter
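
A minimal Java sketch of the hashing idea mentioned above, assuming the linked thread refers to something like the public geohash scheme (interleaved longitude/latitude bits, base-32 encoded). The class and method names are hypothetical and this is not Cassandra API.

```java
class GeoHashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a latitude/longitude pair (degrees) as a geohash string of the given length. */
    static String encode(double lat, double lon, int length) {
        double latLo = -90, latHi = 90, lonLo = -180, lonHi = 180;
        StringBuilder hash = new StringBuilder();
        boolean useLon = true; // bits alternate, starting with longitude
        int bitCount = 0, value = 0;
        while (hash.length() < length) {
            if (useLon) {
                double mid = (lonLo + lonHi) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonLo = mid; }
                else { value <<= 1; lonHi = mid; }
            } else {
                double mid = (latLo + latHi) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latLo = mid; }
                else { value <<= 1; latHi = mid; }
            }
            useLon = !useLon;
            if (++bitCount == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(value));
                bitCount = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    /** Number of leading characters two hashes have in common. */
    static int sharedPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }

    public static void main(String[] args) {
        String a = encode(57.64911, 10.40744, 8);
        String b = encode(57.64920, 10.40750, 8); // a point a few meters away
        System.out.println(a + " " + b + " shared=" + sharedPrefixLength(a, b));
    }
}
```

A long shared prefix implies the points are close, but the reverse does not always hold: two points on either side of a cell boundary can be neighbours and still share only a short prefix, so a lookup normally checks adjacent cells as well.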


Re: How to perform queries on Cassandra?

Posted by Malcolm Smith <ma...@treehousesystems.com>.
Mike are you stuck on a train too? :-)



On Apr 9, 2010, at 8:51 PM, Mike Gallamore <mike.e.gallamore@googlemail.com> wrote:

> I apologize in advance if this goes into esoteric algorithms a bit
> too much but I think this will get to an interesting idea to solve
> your problem. [...]

Re: How to perform queries on Cassandra?

Posted by Mike Gallamore <mi...@googlemail.com>.
I apologize in advance if this goes into esoteric algorithms a bit too
much, but I think it leads to an interesting idea for solving your
problem. My background is physics, particularly computer simulations of
complex systems. In cosmology there is an interesting algorithm called
an n-body tree code (it's been around for at least 20 years, so a lot is
available online about it). Since every object with mass (well, in
general relativity actually anything with energy, but I digress)
interacts with every other object with mass, you end up with the
"n-body" problem. The number of interactions in a system goes as n(n-1)
~= n^2, where n is the number of elements. This makes simulating large
systems, say two galaxies colliding, a nightmare. 1 billion x (1 billion
minus one) is huge and effectively incalculable, since you would have to
calculate it each time you wanted to advance the simulation a tiny bit
in time. How do you get a reasonable approximation to the solution? The
answer, or at least one of them, is n-body "tree codes".

You take advantage of the fact that the force one star feels from
another falls off as 1/r^2 and, importantly, that two stars far away
from the first star but close to each other have roughly the same
magnitude and direction of the "r" vector. So you can simply clump them
together, i.e. sum their masses, and the force is G*M1*M_sum/r^2. To do
this efficiently numerically you break the system down using a tree of
nested boxes (with four quadrants per level in 2D this is a quadtree).
Thinking in 2D just to keep it simple, you divide the space into top
left, top right, bottom left and bottom right as a first approximation.
Then you keep subdividing until each element ends up in its own box.
When you figure out the forces to apply to the system, you just take the
distance to the middle of the box that contains the stars you are going
to consider together, sum the masses of the stars in that box, and
presto. The closer a box is to the star in question, the smaller it
needs to be, because the direction of r changes more quickly at short
range; farther away you can use larger and larger boxes (each of which
contains a 2D tree-like structure descending to the point where every
star is trapped in its own little box).

How would this help you? Well, if you encoded the "box hierarchy", say 1
for top left, 2 for top right, 3 for bottom left, 4 for bottom right,
then you could specify the box that someone is in with a string like
"14234". To find the set of stars/points/whatever that are at least x
away, you would just have to do a range search for all the points whose
location string is larger than or equal to the location string
corresponding to the closest corner of the biggest box such that its
corner is at least "x" units away. That is quite good as a first
approximation, and the algorithm should run as O(n log(n)), which is a
huge decrease in computation time compared with n^2. I.e. the 1 billion
times (1 billion - 1) problem becomes 1 billion times ~9, much much
nicer. It's a really difficult thing to explain without looking over a
diagram in person, I admit, but hopefully it makes sense if you look up
the algorithm online.
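
The box-string encoding described above can be sketched in Java as follows. This is an illustration under stated assumptions, not the tree code itself: coordinates are assumed to be normalized to the unit square with y growing upward, the key layout "path/name" and all names are hypothetical, and a sorted in-memory TreeMap stands in for a sorted key range. It uses the simpler shared-prefix form of the range search (everything inside one box) rather than the corner comparison.

```java
import java.util.SortedMap;
import java.util.TreeMap;

class QuadKeySketch {
    /** Quadrant digits (1 top left, 2 top right, 3 bottom left, 4 bottom right) for a point in [0,1) x [0,1). */
    static String boxPath(double x, double y, int depth) {
        double xLo = 0, xHi = 1, yLo = 0, yHi = 1;
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            double xMid = (xLo + xHi) / 2, yMid = (yLo + yHi) / 2;
            boolean right = x >= xMid, top = y >= yMid;
            path.append(top ? (right ? '2' : '1') : (right ? '4' : '3'));
            // Descend into the chosen quadrant.
            if (right) xLo = xMid; else xHi = xMid;
            if (top) yLo = yMid; else yHi = yMid;
        }
        return path.toString();
    }

    /** All entries whose key starts with the given box prefix, found with one sorted range scan. */
    static SortedMap<String, String> inBox(TreeMap<String, String> index, String prefix) {
        // Keys look like "<path>/<name>". '/' sorts before the digits and '5' sorts after '4',
        // so the half-open range [prefix, prefix + "5") covers exactly the keys inside the box.
        return index.subMap(prefix, prefix + "5");
    }

    public static void main(String[] args) {
        TreeMap<String, String> index = new TreeMap<>();
        index.put(boxPath(0.1, 0.9, 3) + "/alice", "alice");
        index.put(boxPath(0.2, 0.8, 3) + "/bob", "bob");
        index.put(boxPath(0.9, 0.1, 3) + "/carol", "carol");
        System.out.println(inBox(index, "11").values());
    }
}
```

Points in the same box share a prefix, so choosing a shorter prefix widens the search area and a longer one narrows it.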


On 04/09/2010 05:01 PM, malsmith wrote:
>
>
> It's sort of an interesting problem. In an RDBMS, one relatively simple
> approach would be to calculate a rectangle that is X km by Y km with
> User 1's location at the center.  So the rectangle runs from
> (UserX-10KmX, UserY-10KmY) to (UserX+10KmX, UserY+10KmY).
>
> Then you could query the database for all other users where
> curUserX > UserX-10KmX and curUserX < UserX+10KmX
> and curUserY > UserY-10KmY and curUserY < UserY+10KmY
> * Note that 10KmX and 10KmY are really a translation from kilometers to
> degrees of latitude and longitude (which you can find with a Google search)
>
> With the right indexes this query actually runs pretty well.
>
> Translating that to Cassandra seems a bit complex at first, but you
> could try something like pre-calculating a grid with the right
> resolution (like a square of 5 km per side) and assigning every user to
> a particular grid ID.  That way you just calculate which grid ID User1
> is in, then do a direct key lookup to get the list of users in that
> same grid ID.
>
> A second approach would be to have two column families: one that maps
> a latitude to the list of users at that latitude, and a second that
> maps a longitude to the list of users at that longitude.  You could do
> the same rectangle calculation as above, then do a get_slice range
> lookup to get one list of users from the range of latitudes and a
> second list from the range of longitudes.  You would then need an
> in-memory nested loop to find the users that are in both lists.  This
> second approach could cause some trouble depending on where you search
> and how many users you really have, since some latitudes and longitudes
> have many, many people in them.
>
> So it seems some version of a chunking / grid ID scheme would be the
> better approach.  If you let people zoom in or zoom out, you could
> just have different column families for each level of zoom.
>
>
> I'm stuck on a stopped train so -- here is even more code:
>
> static Decimal GetLatitudeMiles(Decimal lat)
> {
>     // Approximate miles per degree of latitude, in 10-degree bands.
>     lat = Math.Abs(lat);
>     Decimal f = 68.99M;
>     if (lat < 10.0M) { f = 68.71M; }
>     else if (lat < 20.0M) { f = 68.73M; }
>     else if (lat < 30.0M) { f = 68.79M; }
>     else if (lat < 40.0M) { f = 68.88M; }
>     else if (lat < 50.0M) { f = 68.99M; }
>     else if (lat < 60.0M) { f = 69.12M; }
>     else if (lat < 70.0M) { f = 69.23M; }
>     else if (lat < 80.0M) { f = 69.32M; }
>     else { f = 69.38M; }
>
>     return f;
> }
>
>
> // Fragment: zList, ps and dRadius are presumably defined elsewhere.
> Decimal MilesPerDegreeLatitude = GetLatitudeMiles(zList[0].Latitude);
> // Math.Cos expects radians, so convert the latitude from degrees first.
> Decimal MilesPerDegreeLongitude = ((Decimal) Math.Abs(
>     Math.Cos((Double) zList[0].Latitude * Math.PI / 180.0))) * 24900.0M / 360.0M;
> dRadius = 10.0M;  // ten miles
> Decimal deltaLat = dRadius / MilesPerDegreeLatitude;
> Decimal deltaLong = dRadius / MilesPerDegreeLongitude;
>
> // Bounding box around the center point.
> ps.TopLatitude = zList[0].Latitude - deltaLat;
> ps.TopLongitude = zList[0].Longitude - deltaLong;
> ps.BottomLatitude = zList[0].Latitude + deltaLat;
> ps.BottomLongitude = zList[0].Longitude + deltaLong;
>
>
>
> On Fri, 2010-04-09 at 16:30 -0700, Paul Prescod wrote:
>> 2010/4/9 Onur AKTAS <on...@live.com>:
>> >  ...
>> >  I'm trying to find out how do you perform queries with calculations on the
>> >  fly without inserting the data as calculated from the beginning.
>> >  Lets say we have latitude and longitude coordinates of all users and we have
>> >    Distance(from_lat, from_long, to_lat, to_long) function which
>> >  gives distance between lat/longs pairs in kilometers.
>>
>> I'm not an expert, but I think that it boils down to "MapReduce" and "Hadoop".
>>
>> I don't think that there's any top-down tutorial on those two words,
>> you'll have to research yourself starting here:
>>
>>   *http://en.wikipedia.org/wiki/MapReduce
>>
>>   *http://hadoop.apache.org/
>>
>>   *http://wiki.apache.org/cassandra/HadoopSupport
>>
>> I don't think it is all documented in any one place yet...
>>
>>   Paul Prescod
>>      
>
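
A minimal Java sketch of the grid ID approach quoted above from malsmith, showing why moving one user does not require touching every other user. It is an illustration only: in-memory maps stand in for a column family keyed by grid ID, the cell size is given in degrees (0.5 in the test values, purely for convenience), and all names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class GridIndexSketch {
    private final double cellDeg; // cell size in degrees (assumed; pick it to match the search radius)
    private final Map<String, Set<String>> usersByCell = new HashMap<>(); // grid ID -> users
    private final Map<String, String> cellByUser = new HashMap<>();       // user -> current grid ID

    GridIndexSketch(double cellDeg) {
        this.cellDeg = cellDeg;
    }

    /** Grid ID of the cell containing the point, as "row:col". */
    String cellId(double lat, double lon) {
        return (long) Math.floor(lat / cellDeg) + ":" + (long) Math.floor(lon / cellDeg);
    }

    /** Moving a user touches at most two cells, no matter how many users exist. */
    void updateLocation(String user, double lat, double lon) {
        String newCell = cellId(lat, lon);
        String oldCell = cellByUser.put(user, newCell);
        if (oldCell != null && !oldCell.equals(newCell)) {
            usersByCell.get(oldCell).remove(user);
        }
        usersByCell.computeIfAbsent(newCell, k -> new TreeSet<>()).add(user);
    }

    /** Candidate neighbours: everyone in the point's own cell and the 8 surrounding cells. */
    Set<String> candidatesNear(double lat, double lon) {
        long row = (long) Math.floor(lat / cellDeg);
        long col = (long) Math.floor(lon / cellDeg);
        Set<String> result = new TreeSet<>();
        for (long r = row - 1; r <= row + 1; r++) {
            for (long c = col - 1; c <= col + 1; c++) {
                result.addAll(usersByCell.getOrDefault(r + ":" + c, Collections.emptySet()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        GridIndexSketch grid = new GridIndexSketch(0.5);
        grid.updateLocation("user1", 40.2, 20.7);
        grid.updateLocation("user2", 40.6, 20.9);
        grid.updateLocation("user3", 30.0, 50.0);
        System.out.println(grid.candidatesNear(40.2, 20.7));
    }
}
```

The candidate set is a superset of the true answer; an exact distance check on the client would trim it to the requested radius.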


Re: How to perform queries on Cassandra?

Posted by Paul Prescod <pr...@gmail.com>.
This tutorial may help.

http://www.sodeso.nl/?p=251

Cassandra is very early software...not even version 1.0 yet. You'll
need to figure out a lot yourself by reading blog posts, examples,
comparing to API documentation, etc. Cassandra is an entirely
different model in almost every way, and not entirely documented yet.

On Sat, Apr 10, 2010 at 8:34 PM, dir dir <si...@gmail.com> wrote:
> I have already read the API specification. Honestly, I do not understand
> how to use it, because there are no examples.

Re: How to perform queries on Cassandra?

Posted by Paul Prescod <pr...@gmail.com>.
Benjamin is pointing out that you must be using the word "username" to
mean something different than he is using it.

BY DEFINITION usernames are unique in the most common use of the word.
So what do you really mean if not "username"?

On Sun, Apr 11, 2010 at 11:12 AM, vineet daniel <vi...@gmail.com> wrote:
> It's not a problem, it's a scenario which we need to handle. All I am
> trying to do is achieve what is not available in the API, i.e. a workaround.
>
> On Sun, Apr 11, 2010 at 11:06 PM, Benjamin Black <b...@b3k.us> wrote:
>>
>> A system that permits multiple people to have the same username has a
>> serious problem.
>>
>> On Sun, Apr 11, 2010 at 6:12 AM, vineet daniel <vi...@gmail.com>
>> wrote:
>> > How to handle same usernames. Otherwise seems fine to me.
>> >
>> > On Sun, Apr 11, 2010 at 6:17 PM, Dop Sun <su...@dopsun.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> As far as I can see, the Cassandra API currently supports criteria on:
>> >>
>> >> Token – Key – Super Column Name (if applicable) - Column Names
>> >>
>> >> I guess Token is not usually used for day-to-day queries, so Key and
>> >> Column Names are normally used for querying. For the user name and
>> >> password case, I guess it can be done like this:
>> >>
>> >> Define a CF called UserAuth with type Super, where the key is the user
>> >> name and the password is the super column name. When you receive the
>> >> user name and password from the UI (or by any other method), you can
>> >> query via multiget_slice or get_range_slices; if anything is returned,
>> >> the user name and password match.
>> >>
>> >> If you don't use a super column and instead put the password in as the
>> >> column name, note that column names are not usually used for this kind
>> >> of discretionary value. (Actually, I don't see any definitive documents
>> >> on how to use column names and super columns. Flexibility is the
>> >> strength of Cassandra, or is it bad if abused? :P)
>> >>
>> >> Not sure whether this is the best way, but I guess it will work.
>> >>
>> >> Regards,
>> >>
>> >> Dop
>> >>
>> >>
>> >>
>> >> From: Lucifer Dignified [mailto:vineetdaniel@gmail.com]
>> >> Sent: Sunday, April 11, 2010 5:33 PM
>> >> To: user@cassandra.apache.org
>> >> Subject: Re: How to perform queries on Cassandra?
>> >>
>> >>
>> >>
>> >> Hi Benjamin
>> >>
>> >> I'll try to make it clearer.
>> >> We have a user table with fields 'id', 'username', and 'password'. Now
>> >> if we use the usual way to store key/value pairs, like:
>> >> username : vineetdaniel
>> >> timestamp
>> >> password : <password>
>> >> timestamp
>> >>
>> >> second user :
>> >>
>> >> username: <seconduser>
>> >> timestamp
>> >> password:<password>
>> >>
>> >> and so on, what I assume is that since we cannot search on values
>> >> (as confirmed by people on the Cassandra forums), we are not able to
>> >> perform robust 'where' queries. What I propose is this:
>> >>
>> >> Rather than using static values for column names, use the values
>> >> themselves as column names and a unique key as the identifier. So the
>> >> above example, done my way, would be:
>> >>
>> >> vineetdaniel : vineetdaniel
>> >> timestamp
>> >>
>> >> <password>:<password>
>> >> timestamp
>> >>
>> >> second user
>> >> seconduser:seconduser
>> >> timestamp
>> >>
>> >> password:password
>> >> timestamp
>> >>
>> >> With this approach we can simply search on the column names
>> >> themselves rather than using different CFs. But I should add that this
>> >> cannot be used in every situation. I am still exploring this, and will
>> >> soon be updating the group and my blog with what I find. As Cassandra
>> >> is new, I think every idea or experience should be shared with the
>> >> community.
>> >>
>> >> I hope my example is clear this time. Should you have any questions,
>> >> feel free to reply.
>> >>
>> >> On Sun, Apr 11, 2010 at 2:01 PM, Benjamin Black <b...@b3k.us> wrote:
>> >>
>> >> Sorry, I don't understand your example.
>> >>
>> >> On Sun, Apr 11, 2010 at 12:54 AM, Lucifer Dignified
>> >> <vi...@gmail.com> wrote:
>> >> > Benjamin, I quite agree with you, but what about duplicate usernames,
>> >> > supposing I am not using unique names such as email IDs? If usernames
>> >> > can be duplicated we cannot use them as the key, so what should the
>> >> > solution be? I am thinking of keeping an incremental numeric ID as the
>> >> > key and keeping the column name and value the same in the column family.
>> >> >
>> >> > Example :
>> >> > User1 has password as 123456
>> >> >
>> >> > Cassandra structure :
>> >> >
>> >> > 1 as key
>> >> >            user1 - column name
>> >> >            value - user1
>> >> >            123456 - column name
>> >> >             value - 123456
>> >> >
>> >> > I'm thinking of doing it this way for my application; this way I can
>> >> > run different sorts of queries too. Any feedback on this is welcome.
>> >> >
>> >> > On Sun, Apr 11, 2010 at 1:13 PM, Benjamin Black <b...@b3k.us> wrote:
>> >> >>
>> >> >> You would have a Column Family, not a column, for that; let's call it
>> >> >> the Users CF.  You'd use username as the row key and have a column
>> >> >> called 'password'.  For your example query, you'd retrieve row key
>> >> >> 'usr2', column 'password'.  The general pattern is that you create CFs
>> >> >> to act as indices for each query you want to perform.  Unlike a
>> >> >> relational store, there is no way to perform arbitrary queries.  You
>> >> >> must structure things to permit the queries of interest.
>> >> >>
>> >> >>
>> >> >> b
>> >> >>
>> >> >> On Sat, Apr 10, 2010 at 8:34 PM, dir dir <si...@gmail.com>
>> >> >> wrote:
>> >> >> > I have already read the API specification. Honestly, I do not
>> >> >> > understand how to use it, because there are no examples.
>> >> >> >
>> >> >> > For example I have a column like this:
>> >> >> >
>> >> >> > UserName    Password
>> >> >> > usr1                abc
>> >> >> > usr2                xyz
>> >> >> > usr3                opm
>> >> >> >
>> >> >> > Suppose I want to query the user's password using SQL in an RDBMS:
>> >> >> >
>> >> >> >       Select Password From Users Where UserName = "usr2";
>> >> >> >
>> >> >> > Now I want to get the password using an OODBMS (a DB4o Object
>> >> >> > Query) and Java:
>> >> >> >
>> >> >> >      ObjectSet<Users> QueryResult = db.query(new Predicate<Users>()
>> >> >> >      {
>> >> >> >             public boolean match(Users Myuser)
>> >> >> >             {
>> >> >> >                  // Compare string contents with equals(), not ==.
>> >> >> >                  return "usr2".equals(Myuser.getUserName());
>> >> >> >             }
>> >> >> >      });
>> >> >> >
>> >> >> > Once we have the Users instance in QueryResult, we can get usr2's
>> >> >> > password.
>> >> >> >
>> >> >> > How do we perform this query using the Cassandra API and Java?
>> >> >> > Would you tell me please?  Thank you.
>> >> >> >
>> >> >> > Dir.
>> >> >> >
>> >> >> >
>> >> >> > On Sat, Apr 10, 2010 at 11:06 AM, Paul Prescod <pa...@prescod.net>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> No. Cassandra has an API.
>> >> >> >>
>> >> >> >> http://wiki.apache.org/cassandra/API
>> >> >> >>
>> >> >> >> On Fri, Apr 9, 2010 at 8:00 PM, dir dir <si...@gmail.com>
>> >> >> >> wrote:
>> >> >> >> > Does Cassandra have a default query language such as SQL in an
>> >> >> >> > RDBMS or Object Query in an OODBMS?  Thank you.
>> >> >> >> >
>> >> >> >> > Dir.
>> >> >> >> >
>> >> >> >> > On Sat, Apr 10, 2010 at 7:01 AM, malsmith
>> >> >> >> > <ma...@treehousesystems.com>
>> >> >> >> > wrote:
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> It's sort of an interesting problem - in RDBMS one relatively
>> >> >> >> >> simple
>> >> >> >> >> approach would be calculate a rectangle that is X km by Y km
>> >> >> >> >> with
>> >> >> >> >> User
>> >> >> >> >> 1's
>> >> >> >> >> location at the center.  So the rectangle is UserX - 10KmX ,
>> >> >> >> >> UserY-10KmY to
>> >> >> >> >> UserX+10KmX , UserY+10KmY
>> >> >> >> >>
>> >> >> >> >> Then you could query the database for all other users where
>> >> >> >> >> that
>> >> >> >> >> each
>> >> >> >> >> user
>> >> >> >> >> considered is curUserX > UserX-10Km and curUserX < UserX+10KmX
>> >> >> >> >> and
>> >> >> >> >> curUserY
>> >> >> >> >> > UserY-10KmY and curUserY < UserY+10KmY
>> >> >> >> >> * Not the 10KmX and 10KmY are really a translation from
>> >> >> >> >> Kilometers
>> >> >> >> >> to
>> >> >> >> >> degrees of  lat and longitude  (that you can find on a google
>> >> >> >> >> search)
>> >> >> >> >>
>> >> >> >> >> With the right indexes this query actually runs pretty well.
>> >> >> >> >>
>> >> >> >> >> Translating that to Cassandra seems a bit complex at first -
>> >> >> >> >> but
>> >> >> >> >> you
>> >> >> >> >> could
>> >> >> >> >> try something like pre-calculating a grid with the right
>> >> >> >> >> resolution
>> >> >> >> >> (like a
>> >> >> >> >> square of 5KM per side) and assign every user to a particular
>> >> >> >> >> grid
>> >> >> >> >> ID.
>> >> >> >> >> That
>> >> >> >> >> way you just calculate with grid ID User1 is in then do a
>> >> >> >> >> direct
>> >> >> >> >> key
>> >> >> >> >> lookup
>> >> >> >> >> to get a list of the users in that same grid id.
>> >> >> >> >>
>> >> >> >> >> A second approach would be to have to column families -- one
>> >> >> >> >> that
>> >> >> >> >> maps
>> >> >> >> >> a
>> >> >> >> >> Latitude to a list of users who are at that latitude and a
>> >> >> >> >> second
>> >> >> >> >> that
>> >> >> >> >> maps
>> >> >> >> >> users who are at a particular longitude.  You could do the
>> >> >> >> >> same
>> >> >> >> >> rectange
>> >> >> >> >> calculation above then do a get_slice range lookup to get a
>> >> >> >> >> list
>> >> >> >> >> of
>> >> >> >> >> users
>> >> >> >> >> from range of latitude and a second list from the range of
>> >> >> >> >> longitudes.
>> >> >> >> >> You would then need to do a in-memory nested loop to find the
>> >> >> >> >> list
>> >> >> >> >> of
>> >> >> >> >> users
>> >> >> >> >> that are in both lists.  This second approach could cause some
>> >> >> >> >> trouble
>> >> >> >> >> depending on where you search and how many users you really
>> >> >> >> >> have
>> >> >> >> >> --
>> >> >> >> >> some
>> >> >> >> >> latitudes and longitudes have many many people in them
>> >> >> >> >>
>> >> >> >> >> So, it seems some version of a chunking / grid id thing would
>> >> >> >> >> be
>> >> >> >> >> the
>> >> >> >> >> better approach.   If you let people zoom in or zoom out - you
>> >> >> >> >> could
>> >> >> >> >> just
>> >> >> >> >> have different column families for each level of zoom.
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> I'm stuck on a stopped train so -- here is even more code:
>> >> >> >> >>
>> >> >> >> >> // Approximate miles per degree of latitude, in 10-degree bands.
>> >> >> >> >> static Decimal GetLatitudeMiles(Decimal lat)
>> >> >> >> >> {
>> >> >> >> >>     lat = Math.Abs(lat);
>> >> >> >> >>     Decimal f;
>> >> >> >> >>     if (lat < 10.0M) { f = 68.71M; }
>> >> >> >> >>     else if (lat < 20.0M) { f = 68.73M; }
>> >> >> >> >>     else if (lat < 30.0M) { f = 68.79M; }
>> >> >> >> >>     else if (lat < 40.0M) { f = 68.88M; }
>> >> >> >> >>     else if (lat < 50.0M) { f = 68.99M; }
>> >> >> >> >>     else if (lat < 60.0M) { f = 69.12M; }
>> >> >> >> >>     else if (lat < 70.0M) { f = 69.23M; }
>> >> >> >> >>     else if (lat < 80.0M) { f = 69.32M; }
>> >> >> >> >>     else { f = 69.38M; }
>> >> >> >> >>
>> >> >> >> >>     return f;
>> >> >> >> >> }
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> Decimal MilesPerDegreeLatitude = GetLatitudeMiles(zList[0].Latitude);
>> >> >> >> >> // Math.Cos expects radians, so convert the latitude from degrees first.
>> >> >> >> >> Decimal MilesPerDegreeLongitude = ((Decimal) Math.Abs(Math.Cos(
>> >> >> >> >>     (Double) zList[0].Latitude * Math.PI / 180.0))) * 24900.0M / 360.0M;
>> >> >> >> >> dRadius = 10.0M;  // ten miles
>> >> >> >> >> Decimal deltaLat = dRadius / MilesPerDegreeLatitude;
>> >> >> >> >> Decimal deltaLong = dRadius / MilesPerDegreeLongitude;
>> >> >> >> >>
>> >> >> >> >> // Corners of the search rectangle around the center point.
>> >> >> >> >> ps.TopLatitude = zList[0].Latitude - deltaLat;
>> >> >> >> >> ps.TopLongitude = zList[0].Longitude - deltaLong;
>> >> >> >> >> ps.BottomLatitude = zList[0].Latitude + deltaLat;
>> >> >> >> >> ps.BottomLongitude = zList[0].Longitude + deltaLong;
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> On Fri, 2010-04-09 at 16:30 -0700, Paul Prescod wrote:
>> >> >> >> >>
>> >> >> >> >> 2010/4/9 Onur AKTAS <on...@live.com>:
>> >> >> >> >> > ...
>> >> >> >> >> > I'm trying to find out how do you perform queries with calculations
>> >> >> >> >> > on the fly without inserting the data as calculated from the
>> >> >> >> >> > beginning.
>> >> >> >> >> > Lets say we have latitude and longitude coordinates of all users and
>> >> >> >> >> > we have Distance(from_lat, from_long, to_lat, to_long) function which
>> >> >> >> >> > gives distance between lat/longs pairs in kilometers.
>> >> >> >> >>
>> >> >> >> >> I'm not an expert, but I think that it boils down to "MapReduce" and
>> >> >> >> >> "Hadoop".
>> >> >> >> >>
>> >> >> >> >> I don't think that there's any top-down tutorial on those two words,
>> >> >> >> >> you'll have to research yourself starting here:
>> >> >> >> >>
>> >> >> >> >>  * http://en.wikipedia.org/wiki/MapReduce
>> >> >> >> >>
>> >> >> >> >>  * http://hadoop.apache.org/
>> >> >> >> >>
>> >> >> >> >>  * http://wiki.apache.org/cassandra/HadoopSupport
>> >> >> >> >>
>> >> >> >> >> I don't think it is all documented in any one place yet...
>> >> >> >> >>
>> >> >> >> >>  Paul Prescod
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> >
>> >> >> >
>> >> >> >
>> >> >
>> >> >
>> >>
>> >>
>> >
>
>

Re: How to perform queries on Cassandra?

Posted by Benjamin Black <b...@b3k.us>.
On Sun, Apr 11, 2010 at 12:10 PM, vineet daniel <vi...@gmail.com> wrote:
> I assume that using the key i can get the all the columns like an array. Now
> i'd be using php to extract  arraykey=>value in that array, just want to
> avoid that i.e i can directly print the column names.

It doesn't work this way.  It's not an array, it's an ordered hash
sorted based on the compareWith setting for the CF.  For this reason,
you cannot do what you are suggesting and tell what is what.  Assuming
UTF8Type for comparisons, username 'vineet' and password 'foo'  in
your model is {'foo':'foo', 'vineet':'vineet'}, while username 'adam'
and password 'xylophone' is {'adam':'adam', 'xylophone':'xylophone'}.
Or is that username 'xylophone' with password 'adam'?  You could play
games with prepending strings to distinguish things, or you could just
use column names and indices properly and skip all this complexity.
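
To make the ambiguity concrete, here is a minimal Python sketch. It is not
Cassandra client code: a row is modelled as a plain dict whose columns are
read back sorted by name, with ordinary string ordering standing in for
UTF8Type. The helper name read_row is hypothetical.

```python
def read_row(columns):
    """Return (name, value) pairs sorted by column name, the way a row
    is read back when the comparator orders names as strings."""
    return sorted(columns.items())

# Values used as column names: the order you wrote them in is not preserved.
user_a = read_row({"vineet": "vineet", "foo": "foo"})
print(user_a)  # [('foo', 'foo'), ('vineet', 'vineet')]

# Username 'adam' / password 'xylophone' and the reverse look identical.
adam_xylophone = read_row({"adam": "adam", "xylophone": "xylophone"})
xylophone_adam = read_row({"xylophone": "xylophone", "adam": "adam"})
print(adam_xylophone == xylophone_adam)  # True

# Fixed column names keep track of which value is which.
named = dict(read_row({"username": "adam", "password": "xylophone"}))
print(named["username"], named["password"])  # adam xylophone
```

With fixed column names the sort order no longer matters, because each value
is addressed by what it is rather than by where it happens to land.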


b

Re: How to perform queries on Cassandra?

Posted by vineet daniel <vi...@gmail.com>.
I assume that using the key I can get all the columns like an array. I'd
then be using PHP to extract the key => value pairs from that array; I just
want to avoid that, i.e. be able to print the column names directly. If you
guys think it's not a good idea I can drop it; anyway, I'm new to this and a
lot of things are coming to mind. As far as Cassandra and column families /
super columns are concerned, I am pretty clear.

On Mon, Apr 12, 2010 at 12:23 AM, Benjamin Black <b...@b3k.us> wrote:

> I have no idea what problem you are trying to solve.  You are
> misunderstanding a number of things about the Cassandra data model and
> about how we are explaining it is best used.
>

Re: How to perform queries on Cassandra?

Posted by Benjamin Black <b...@b3k.us>.
I have no idea what problem you are trying to solve.  You are
misunderstanding a number of things about the Cassandra data model and
about how we are explaining it is best used.

On Sun, Apr 11, 2010 at 11:37 AM, vineet daniel <vi...@gmail.com> wrote:
> Well my initial idea is to use value  as column name, keeping key as an
> incremental integer. The discussion after each mail has drifted from this
> point which I had made. Will put it again.
>
> we want to store user information. We keep 1,2,3,4.....so on as keys. AND
> values as column names i.e rather than using column name 'first name', i'd
> be using 'vineet' as column name, rather than using 'last name' as column
> name i'd be using 'daniel'. This way I can directly read column names as
> values. This is just a thought that has come to my mind while trying to
> design my db for cassandra.
>
>
>
> On Sun, Apr 11, 2010 at 11:46 PM, Benjamin Black <b...@b3k.us> wrote:
>>
>> Row keys must be unique.  If your usernames are not unique and you
>> want to be able to query on them, you either need to figure out a way
>> to make them unique or treat the username rows themselves as indices,
>> which refer to a set of actually unique identifiers for users.
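
A rough Python sketch of the "username rows as indices" idea, using two
in-memory dicts in place of column families. The names users,
username_index, add_user and find_by_username are hypothetical, and nothing
here is Cassandra API; it only shows the shape of the data.

```python
import uuid

users = {}            # stands in for a Users CF: row key = unique user id
username_index = {}   # stands in for an index CF: row key = username

def add_user(username, password):
    """Store the user under a unique id and record that id in the index row."""
    user_id = str(uuid.uuid4())
    users[user_id] = {"username": username, "password": password}
    # The index row's column names are the ids of users with that name.
    username_index.setdefault(username, {})[user_id] = ""
    return user_id

def find_by_username(username):
    """Read the index row, then fetch each referenced user row by its id."""
    ids = sorted(username_index.get(username, {}))
    return [users[i] for i in ids]

first = add_user("vineet", "foo")
second = add_user("vineet", "bar")
print(len(find_by_username("vineet")))  # 2
```

Two people can share a username here because the row key of the real record
is the generated id, and the username row only points at those ids.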
>>
>> On Sun, Apr 11, 2010 at 11:12 AM, vineet daniel <vi...@gmail.com>
>> wrote:
>> > It's not a problem, it's a scenario which we need to handle. All I am
>> > trying to do is achieve what is not there in the API, i.e. a workaround.
>> >
>> > On Sun, Apr 11, 2010 at 11:06 PM, Benjamin Black <b...@b3k.us> wrote:
>> >>
>> >> A system that permits multiple people to have the same username has a
>> >> serious problem.
>> >>
>> >> On Sun, Apr 11, 2010 at 6:12 AM, vineet daniel <vi...@gmail.com>
>> >> wrote:
>> >> > How to handle same usernames. Otherwise seems fine to me.
>> >> >
>> >> > On Sun, Apr 11, 2010 at 6:17 PM, Dop Sun <su...@dopsun.com> wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> As far as I can see, the Cassandra API currently supports criteria on:
>> >> >>
>> >> >> Token - Key - Super Column Name (if applicable) - Column Names
>> >> >>
>> >> >> I guess Token is not usually used for day-to-day queries, so Key and
>> >> >> Column Names are what is normally used for querying. For the user name
>> >> >> and password case, I guess it can be done like this:
>> >> >>
>> >> >> Define a CF called UserAuth with type Super, where the Key is the user
>> >> >> name and the password can be the SuperKeyName. When you receive the
>> >> >> user name and password from the UI (or any other method), you can query
>> >> >> via multiget_slice or get_range_slices; if anything is returned, it
>> >> >> means that the user name and password match.
>> >> >>
>> >> >> If you do not use the super column name and instead put the password in
>> >> >> as the column name: column names are not usually used for this kind of
>> >> >> discretionary value. (Actually, I don't see any definitive documents on
>> >> >> how to use column names and super columns. Flexibility is the good of
>> >> >> Cassandra, or is it bad if abused? :P)
>> >> >>
>> >> >> Not sure whether this is the best way, but I guess it will work.
>> >> >>
>> >> >> Regards,
>> >> >>
>> >> >> Dop
>> >> >>
>> >> >>
>> >> >> From: Lucifer Dignified [mailto:vineetdaniel@gmail.com]
>> >> >> Sent: Sunday, April 11, 2010 5:33 PM
>> >> >> To: user@cassandra.apache.org
>> >> >> Subject: Re: How to perform queries on Cassandra?
>> >> >>
>> >> >>
>> >> >>
>> >> >> Hi Benjamin
>> >> >>
>> >> >> I'll try to make it more clear to you.
>> >> >> We have a user table with fields 'id', 'username', and 'password'. Now,
>> >> >> if we use the ideal way to store key/value, like:
>> >> >> username : vineetdaniel
>> >> >> timestamp
>> >> >> password : <password>
>> >> >> timestamp
>> >> >>
>> >> >> second user :
>> >> >>
>> >> >> username: <seconduser>
>> >> >> timestamp
>> >> >> password:<password>
>> >> >>
>> >> >> and so on, then what I assume is that, as we cannot search on values
>> >> >> (as confirmed by guys on the Cassandra forums), we are not able to
>> >> >> perform robust 'where' queries. Now what I propose is this.
>> >> >>
>> >> >> Rather than using static values for column names, use the values
>> >> >> themselves, with a unique key as the identifier. So the above example,
>> >> >> put in my way, would be:
>> >> >>
>> >> >> vineetdaniel : vineetdaniel
>> >> >> timestamp
>> >> >>
>> >> >> <password>:<password>
>> >> >> timestamp
>> >> >>
>> >> >> second user
>> >> >> seconduser:seconduser
>> >> >> timestamp
>> >> >>
>> >> >> password:password
>> >> >> timestamp
>> >> >>
>> >> >> By using the above methodology we can simply search on the keys
>> >> >> themselves rather than using different CFs. To add further, this
>> >> >> cannot be used for every situation. I am still exploring this, and will
>> >> >> soon be updating the group and my blog with information pertaining to
>> >> >> this. As Cassandra is new, I think every idea or experience should be
>> >> >> shared with the community.
>> >> >>
>> >> >> I hope my example is clear this time. Should you have any queries, feel
>> >> >> free to reply.
>> >> >>
>> >> >> On Sun, Apr 11, 2010 at 2:01 PM, Benjamin Black <b...@b3k.us> wrote:
>> >> >>
>> >> >> Sorry, I don't understand your example.
>> >> >>
>> >> >> On Sun, Apr 11, 2010 at 12:54 AM, Lucifer Dignified
>> >> >> <vi...@gmail.com> wrote:
>> >> >> > Benjamin, I quite agree with you, but what about duplicate usernames?
>> >> >> > Suppose I am not using unique names such as email IDs. If we have
>> >> >> > duplicate usernames we cannot use them as the key, so what should the
>> >> >> > solution be? I am thinking of keeping an incremental numeric id as the
>> >> >> > key and keeping the name and value the same in the column family.
>> >> >> >
>> >> >> > Example :
>> >> >> > User1 has password as 123456
>> >> >> >
>> >> >> > Cassandra structure :
>> >> >> >
>> >> >> > 1 as key
>> >> >> >            user1 - column name
>> >> >> >            value - user1
>> >> >> >            123456 - column name
>> >> >> >             value - 123456
>> >> >> >
>> >> >> > I'm thinking of doing it this way for my application; this way I can
>> >> >> > run different sorts of queries too. Any feedback on this is welcome.
>> >> >> >
>> >> >> > On Sun, Apr 11, 2010 at 1:13 PM, Benjamin Black <b...@b3k.us> wrote:
>> >> >> >>
>> >> >> >> You would have a Column Family, not a column, for that; let's call it
>> >> >> >> the Users CF.  You'd use username as the row key and have a column
>> >> >> >> called 'password'.  For your example query, you'd retrieve row key
>> >> >> >> 'usr2', column 'password'.  The general pattern is that you create CFs
>> >> >> >> to act as indices for each query you want to perform.  There is no
>> >> >> >> equivalent to a relational store to perform arbitrary queries.  You
>> >> >> >> must structure things to permit the queries of interest.
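
As a rough illustration of that layout (plain Python dicts, not a Cassandra
client; the names keyspace and get_column are hypothetical), the Users CF and
the "row key 'usr2', column 'password'" lookup look like this:

```python
# Column family -> row key -> column name -> value (plain dicts, not a client).
keyspace = {
    "Users": {
        "usr1": {"password": "abc"},
        "usr2": {"password": "xyz"},
        "usr3": {"password": "opm"},
    }
}

def get_column(column_family, row_key, column_name):
    """Look up one column of one row; returns None if the row or column is missing."""
    return keyspace[column_family].get(row_key, {}).get(column_name)

print(get_column("Users", "usr2", "password"))  # xyz
```

The lookup goes straight from row key to column, which is why the row key
has to be whatever you intend to query by.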
>> >> >> >>
>> >> >> >>
>> >> >> >> b
>> >> >> >>
>> >> >> >> On Sat, Apr 10, 2010 at 8:34 PM, dir dir <si...@gmail.com>
>> >> >> >> wrote:
>> >> >> >> > I have already read the API specification. Honestly, I do not
>> >> >> >> > understand how to use it, because there are no examples.
>> >> >> >> >
>> >> >> >> > For example I have a table like this:
>> >> >> >> >
>> >> >> >> > UserName    Password
>> >> >> >> > usr1                abc
>> >> >> >> > usr2                xyz
>> >> >> >> > usr3                opm
>> >> >> >> >
>> >> >> >> > Suppose I want to query the user's password using SQL in an RDBMS:
>> >> >> >> >
>> >> >> >> >       Select Password From Users Where UserName = "usr2";
>> >> >> >> >
>> >> >> >> > Now I want to get the password using an OODBMS DB4o Object Query and
>> >> >> >> > Java:
>> >> >> >> >
>> >> >> >> >      ObjectSet QueryResult = db.query(new Predicate()
>> >> >> >> >      {
>> >> >> >> >             public boolean match(Users Myusers)
>> >> >> >> >             {
>> >> >> >> >                  // compare string contents, not object references
>> >> >> >> >                  return Myusers.getUserName().equals("usr2");
>> >> >> >> >             }
>> >> >> >> >      });
>> >> >> >> >
>> >> >> >> > After we get the Users instance in the QueryResult, we can get
>> >> >> >> > usr2's password.
>> >> >> >> >
>> >> >> >> > How do we perform this query using the Cassandra API and Java?
>> >> >> >> > Would you tell me please?  Thank you.
>> >> >> >> >
>> >> >> >> > Dir.
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > On Sat, Apr 10, 2010 at 11:06 AM, Paul Prescod
>> >> >> >> > <pa...@prescod.net>
>> >> >> >> > wrote:
>> >> >> >> >>
>> >> >> >> >> No. Cassandra has an API.
>> >> >> >> >>
>> >> >> >> >> http://wiki.apache.org/cassandra/API
>> >> >> >> >>
>> >> >> >> >> On Fri, Apr 9, 2010 at 8:00 PM, dir dir
>> >> >> >> >> <si...@gmail.com>
>> >> >> >> >> wrote:
>> >> >> >> >> > Does Cassandra have a default query language such as SQL in RDBMS
>> >> >> >> >> > and Object Query in OODBMS?  Thank you.
>> >> >> >> >> >
>> >> >> >> >> > Dir.
>> >> >> >> >> >
>> >> >> >> >> > On Sat, Apr 10, 2010 at 7:01 AM, malsmith
>> >> >> >> >> > <ma...@treehousesystems.com>
>> >> >> >> >> > wrote:
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> It's sort of an interesting problem - in an RDBMS one relatively simple
>> >> >> >> >> >> approach would be to calculate a rectangle that is X km by Y km with
>> >> >> >> >> >> User 1's location at the center.  So the rectangle is
>> >> >> >> >> >> UserX-10KmX, UserY-10KmY to UserX+10KmX, UserY+10KmY
>> >> >> >> >> >>
>> >> >> >> >> >> Then you could query the database for all other users where each user
>> >> >> >> >> >> considered has curUserX > UserX-10KmX and curUserX < UserX+10KmX and
>> >> >> >> >> >> curUserY > UserY-10KmY and curUserY < UserY+10KmY
>> >> >> >> >> >> * Note that 10KmX and 10KmY are really a translation from kilometers
>> >> >> >> >> >> to degrees of latitude and longitude (that you can find with a Google
>> >> >> >> >> >> search)
>> >> >> >> >> >>
>> >> >> >> >> >> With the right indexes this query actually runs pretty well.
>> >> >> >> >> >>
>> >> >> >> >> >> Translating that to Cassandra seems a bit complex at first,
>> >> >> >> >> >> but you could try something like pre-calculating a grid with
>> >> >> >> >> >> the right resolution (like a square of 5 km per side) and
>> >> >> >> >> >> assigning every user to a particular grid ID. That way you
>> >> >> >> >> >> just calculate which grid ID User1 is in, then do a direct
>> >> >> >> >> >> key lookup to get a list of the users in that same grid ID.
>> >> >> >> >> >>
>> >> >> >> >> >> A second approach would be to have two column families: one
>> >> >> >> >> >> that maps a latitude to a list of users who are at that
>> >> >> >> >> >> latitude, and a second that maps a longitude to the users
>> >> >> >> >> >> who are at that longitude. You could do the same rectangle
>> >> >> >> >> >> calculation as above, then do a get_slice range lookup to
>> >> >> >> >> >> get a list of users from the range of latitudes and a second
>> >> >> >> >> >> list from the range of longitudes. You would then need to do
>> >> >> >> >> >> an in-memory nested loop to find the users that are in both
>> >> >> >> >> >> lists. This second approach could cause some trouble
>> >> >> >> >> >> depending on where you search and how many users you really
>> >> >> >> >> >> have; some latitudes and longitudes have many, many people
>> >> >> >> >> >> in them.
>> >> >> >> >> >>
>> >> >> >> >> >> So, it seems some version of a chunking / grid id thing
>> >> >> >> >> >> would
>> >> >> >> >> >> be
>> >> >> >> >> >> the
>> >> >> >> >> >> better approach.   If you let people zoom in or zoom out -
>> >> >> >> >> >> you
>> >> >> >> >> >> could
>> >> >> >> >> >> just
>> >> >> >> >> >> have different column families for each level of zoom.
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> I'm stuck on a stopped train so -- here is even more code:
>> >> >> >> >> >>
>> >> >> >> >> >> static Decimal GetLatitudeMiles(Decimal lat)
>> >> >> >> >> >> {
>> >> >> >> >> >> Decimal f = 0.0M;
>> >> >> >> >> >> lat = Math.Abs(lat);
>> >> >> >> >> >> f = 68.99M;
>> >> >> >> >> >>          if (lat >= 0.0M && lat < 10.0M) { f = 68.71M; }
>> >> >> >> >> >> else if (lat >= 10.0M && lat < 20.0M) { f = 68.73M; }
>> >> >> >> >> >> else if (lat >= 20.0M && lat < 30.0M) { f = 68.79M; }
>> >> >> >> >> >> else if (lat >= 30.0M && lat < 40.0M) { f = 68.88M; }
>> >> >> >> >> >> else if (lat >= 40.0M && lat < 50.0M) { f = 68.99M; }
>> >> >> >> >> >> else if (lat >= 50.0M && lat < 60.0M) { f = 69.12M; }
>> >> >> >> >> >> else if (lat >= 60.0M && lat < 70.0M) { f = 69.23M; }
>> >> >> >> >> >> else if (lat >= 70.0M && lat < 80.0M) { f = 69.32M; }
>> >> >> >> >> >> else if (lat >= 80.0M) { f = 69.38M; }
>> >> >> >> >> >>
>> >> >> >> >> >> return f;
>> >> >> >> >> >> }
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> Decimal MilesPerDegreeLatitude = GetLatitudeMiles(zList[0].Latitude);
>> >> >> >> >> >> // Math.Cos expects radians, so convert the latitude from degrees first.
>> >> >> >> >> >> Decimal MilesPerDegreeLongitude = ((Decimal)
>> >> >> >> >> >> Math.Abs(Math.Cos((Double) zList[0].Latitude * Math.PI / 180.0)))
>> >> >> >> >> >> * 24900.0M / 360.0M;
>> >> >> >> >> >> dRadius = 10.0M; // ten miles
>> >> >> >> >> >> Decimal deltaLat = dRadius / MilesPerDegreeLatitude;
>> >> >> >> >> >> Decimal deltaLong = dRadius / MilesPerDegreeLongitude;
>> >> >> >> >> >>
>> >> >> >> >> >> ps.TopLatitude = zList[0].Latitude - deltaLat;
>> >> >> >> >> >> ps.TopLongitude = zList[0].Longitude - deltaLong;
>> >> >> >> >> >> ps.BottomLatitude = zList[0].Latitude + deltaLat;
>> >> >> >> >> >> ps.BottomLongitude = zList[0].Longitude + deltaLong;
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> On Fri, 2010-04-09 at 16:30 -0700, Paul Prescod wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> 2010/4/9 Onur AKTAS <on...@live.com>:
>> >> >> >> >> >> > ...
>> >> >> >> >> >> > I'm trying to find out how do you perform queries with
>> >> >> >> >> >> > calculations
>> >> >> >> >> >> > on
>> >> >> >> >> >> > the
>> >> >> >> >> >> > fly without inserting the data as calculated from the
>> >> >> >> >> >> > beginning.
>> >> >> >> >> >> > Lets say we have latitude and longitude coordinates of
>> >> >> >> >> >> > all
>> >> >> >> >> >> > users
>> >> >> >> >> >> > and
>> >> >> >> >> >> > we
>> >> >> >> >> >> > have
>> >> >> >> >> >> >  Distance(from_lat, from_long, to_lat, to_long) function
>> >> >> >> >> >> > which
>> >> >> >> >> >> > gives distance between lat/longs pairs in kilometers.
>> >> >> >> >> >>
>> >> >> >> >> >> I'm not an expert, but I think that it boils down to
>> >> >> >> >> >> "MapReduce"
>> >> >> >> >> >> and
>> >> >> >> >> >> "Hadoop".
>> >> >> >> >> >>
>> >> >> >> >> >> I don't think that there's any top-down tutorial on those
>> >> >> >> >> >> two
>> >> >> >> >> >> words,
>> >> >> >> >> >> you'll have to research yourself starting here:
>> >> >> >> >> >>
>> >> >> >> >> >>  * http://en.wikipedia.org/wiki/MapReduce
>> >> >> >> >> >>
>> >> >> >> >> >>  * http://hadoop.apache.org/
>> >> >> >> >> >>
>> >> >> >> >> >>  * http://wiki.apache.org/cassandra/HadoopSupport
>> >> >> >> >> >>
>> >> >> >> >> >> I don't think it is all documented in any one place yet...
>> >> >> >> >> >>
>> >> >> >> >> >>  Paul Prescod
>> >> >> >> >> >>
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >
>> >> >> >
>> >> >>
>> >> >>
>> >> >
>> >
>> >
>
>
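
A minimal sketch of the grid-ID idea from malsmith's message quoted above, written in Python rather than against the real Cassandra API. It assumes plain in-memory dictionaries stand in for column families, that cells have a fixed size in degrees (about 5 km north to south, narrower east to west away from the equator), and a rough figure of 111 km per degree. The names GridIndex, grid_id, move_user and users_near are hypothetical.

```python
import math
from collections import defaultdict

KM_PER_DEGREE = 111.0  # rough km per degree of latitude (assumption)
CELL_KM = 5.0          # grid resolution suggested in the thread


def grid_id(lat, lon, cell_km=CELL_KM):
    """Map a coordinate to the ID of the fixed-size cell that contains it."""
    row = math.floor(lat * KM_PER_DEGREE / cell_km)
    col = math.floor(lon * KM_PER_DEGREE / cell_km)
    return "%d:%d" % (row, col)


class GridIndex:
    """Dict-based stand-in for a column family keyed by grid ID."""

    def __init__(self):
        self.cells = defaultdict(set)  # grid ID -> user IDs in that cell
        self.user_cell = {}            # user ID -> grid ID it is filed under

    def move_user(self, user, lat, lon):
        """Record a new position; touches at most the old and new cell."""
        new_cell = grid_id(lat, lon)
        old_cell = self.user_cell.get(user)
        if old_cell == new_cell:
            return
        if old_cell is not None:
            self.cells[old_cell].discard(user)
        self.cells[new_cell].add(user)
        self.user_cell[user] = new_cell

    def users_near(self, user):
        """Other users filed under the same grid ID (one key lookup)."""
        cell = self.user_cell[user]
        return sorted(self.cells[cell] - {user})


index = GridIndex()
index.move_user("user1", 40.02, 20.02)
index.move_user("user2", 40.03, 20.03)
index.move_user("user3", 30.0, 50.0)
nearby = index.users_near("user1")  # user2 shares user1's cell, user3 does not
```

With this layout, updating one user's coordinates touches at most two index rows (the old cell and the new one), not one row per other user. A real lookup would probably also read the eight neighbouring cells, since a nearby user can sit just across a cell border.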

Re: How to perform queries on Cassandra?

Posted by Paul Prescod <pr...@gmail.com>.
Why do you want to directly read column names as values, and what will
you put in the column values?

On Sun, Apr 11, 2010 at 11:37 AM, vineet daniel <vi...@gmail.com> wrote:
> Well, my initial idea is to use the value as the column name, keeping the
> key as an incremental integer. The discussion has drifted from this point
> with each mail, so I will put it again.
>
> We want to store user information. We keep 1, 2, 3, 4 and so on as keys, and
> values as column names, i.e. rather than using the column name 'first name',
> I'd be using 'vineet' as the column name, and rather than using 'last name'
> as the column name I'd be using 'daniel'. This way I can directly read column
> names as values. This is just a thought that came to mind while trying to
> design my db for Cassandra.
>
>
>
> On Sun, Apr 11, 2010 at 11:46 PM, Benjamin Black <b...@b3k.us> wrote:
>>
>> Row keys must be unique.  If your usernames are not unique and you
>> want to be able to query on them, you either need to figure out a way
>> to make them unique or treat the username rows themselves as indices,
>> which refer to a set of actually unique identifiers for users.
>>
>> On Sun, Apr 11, 2010 at 11:12 AM, vineet daniel <vi...@gmail.com>
>> wrote:
>> > It's not a problem, it's a scenario that we need to handle. All I am
>> > trying to do is achieve what the API does not offer, i.e. a workaround.
>> >
>> > On Sun, Apr 11, 2010 at 11:06 PM, Benjamin Black <b...@b3k.us> wrote:
>> >>
>> >> A system that permits multiple people to have the same username has a
>> >> serious problem.
>> >>
>> >> On Sun, Apr 11, 2010 at 6:12 AM, vineet daniel <vi...@gmail.com>
>> >> wrote:
>> >> > How do we handle identical usernames? Otherwise it seems fine to me.
>> >> >
>> >> > On Sun, Apr 11, 2010 at 6:17 PM, Dop Sun <su...@dopsun.com> wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >>
>> >> >>
>> >> >> As far as I can see, the Cassandra API currently supports
>> >> >> criteria on:
>> >> >>
>> >> >> Token – Key – Super Column Name (if applicable) - Column Names
>> >> >>
>> >> >>
>> >> >>
>> >> >> I guess Token is not usually used for the day to day queries, so,
>> >> >> Key
>> >> >> and
>> >> >> Column Names are normally used for querying. For the user name and
>> >> >> password
>> >> >> case, I guess it can be done like this:
>> >> >>
>> >> >>
>> >> >>
>> >> >> Define a CF named UserAuth of type Super, with the user name as the
>> >> >> key and the password as the SuperKeyName (super column name). When
>> >> >> you receive the user name and password from the UI (or any other
>> >> >> source), you can query via multiget_slice or get_range_slices; if
>> >> >> anything is returned, the user name and password match.
>> >> >>
>> >> >>
>> >> >>
>> >> >> If you do not use the super column name and instead put the password
>> >> >> in as the column name: column names are usually not used for this
>> >> >> kind of discretionary value. (Actually, I don’t see any definitive
>> >> >> documentation on how to use column names and super columns.
>> >> >> Flexibility is the strength of Cassandra, or is it bad if abused? :P)
>> >> >>
>> >> >>
>> >> >>
>> >> >> Not sure whether this is the best way, but I guess it will work.
>> >> >>
>> >> >>
>> >> >>
>> >> >> Regards,
>> >> >>
>> >> >> Dop
>> >> >>
>> >> >>
>> >> >>
>> >> >> From: Lucifer Dignified [mailto:vineetdaniel@gmail.com]
>> >> >> Sent: Sunday, April 11, 2010 5:33 PM
>> >> >> To: user@cassandra.apache.org
>> >> >> Subject: Re: How to perform queries on Cassandra?
>> >> >>
>> >> >>
>> >> >>
>> >> >> Hi Benjamin
>> >> >>
>> >> >> I'll try to make it clearer.
>> >> >> We have a user table with fields 'id', 'username', and 'password'.
>> >> >> Now, if we use the ideal way to store key/value, like:
>> >> >> username : vineetdaniel
>> >> >> timestamp
>> >> >> password : <password>
>> >> >> timestamp
>> >> >>
>> >> >> second user :
>> >> >>
>> >> >> username: <seconduser>
>> >> >> timestamp
>> >> >> password:<password>
>> >> >>
>> >> >> and so on. What I assume here is that, as we cannot search on values
>> >> >> (as confirmed by people on the Cassandra forums), we are not able to
>> >> >> perform robust 'where' queries. Now what I propose is this.
>> >> >>
>> >> >> Rather than using static values for column names, use the values
>> >> >> themselves, with a unique key as the identifier. So the above
>> >> >> example, stored my way, would be:
>> >> >>
>> >> >> vineetdaniel : vineetdaniel
>> >> >> timestamp
>> >> >>
>> >> >> <password>:<password>
>> >> >> timestamp
>> >> >>
>> >> >> second user
>> >> >> seconduser:seconduser
>> >> >> timestamp
>> >> >>
>> >> >> password:password
>> >> >> timestamp
>> >> >>
>> >> >> By using the above methodology we can simply search on the keys
>> >> >> themselves rather than using different CFs. To add further, though,
>> >> >> this cannot be used for every situation. I am still exploring this,
>> >> >> and will soon be updating the group and my blog with information
>> >> >> pertaining to it. As Cassandra is new, I think every idea or
>> >> >> experience should be shared with the community.
>> >> >>
>> >> >> I hope my example is clear this time. Should you have any queries,
>> >> >> feel free to get back to me.
>> >> >>
>> >> >> On Sun, Apr 11, 2010 at 2:01 PM, Benjamin Black <b...@b3k.us> wrote:
>> >> >>
>> >> >> Sorry, I don't understand your example.
>> >> >>
>> >> >> On Sun, Apr 11, 2010 at 12:54 AM, Lucifer Dignified
>> >> >> <vi...@gmail.com> wrote:
>> >> >> > Benjamin, I quite agree with you, but what about duplicate
>> >> >> > usernames? Suppose I am not using unique names such as email IDs.
>> >> >> > If we have duplicate usernames we cannot use them as the key, so
>> >> >> > what should the solution be? I think it is keeping an incremental
>> >> >> > numeric id as the key and keeping the name and value the same in
>> >> >> > the column family.
>> >> >> >
>> >> >> > Example :
>> >> >> > User1 has password as 123456
>> >> >> >
>> >> >> > Cassandra structure :
>> >> >> >
>> >> >> > 1 as key
>> >> >> >            user1 - column name
>> >> >> >            value - user1
>> >> >> >            123456 - column name
>> >> >> >             value - 123456
>> >> >> >
>> >> >> > I'm thinking of doing it this way for my application; this way I
>> >> >> > can run different sorts of queries too. Any feedback on this is
>> >> >> > welcome.
>> >> >> >
>> >> >> > On Sun, Apr 11, 2010 at 1:13 PM, Benjamin Black <b...@b3k.us> wrote:
>> >> >> >>
>> >> >> >> You would have a Column Family, not a column for that; let's call
>> >> >> >> it
>> >> >> >> the Users CF.  You'd use username as the row key and have a
>> >> >> >> column
>> >> >> >> called 'password'.  For your example query, you'd retrieve row
>> >> >> >> key
>> >> >> >> 'usr2', column 'password'.  The general pattern is that you
>> >> >> >> create
>> >> >> >> CFs
>> >> >> >> to act as indices for each query you want to perform.  There is
>> >> >> >> no
>> >> >> >> equivalent to a relational store to perform arbitrary queries.
>> >> >> >>  You
>> >> >> >> must structure things to permit the queries of interest.
>> >> >> >>
>> >> >> >>
>> >> >> >> b
>> >> >> >>
>> >> >> >> On Sat, Apr 10, 2010 at 8:34 PM, dir dir <si...@gmail.com>
>> >> >> >> wrote:
>> >> >> >> > I have already read the API specification. Honestly, I do not
>> >> >> >> > understand how to use it, because there are no examples.
>> >> >> >> >
>> >> >> >> > For example I have a column like this:
>> >> >> >> >
>> >> >> >> > UserName    Password
>> >> >> >> > usr1                abc
>> >> >> >> > usr2                xyz
>> >> >> >> > usr3                opm
>> >> >> >> >
>> >> >> >> > Suppose I want to query the user's password using SQL in an RDBMS:
>> >> >> >> >
>> >> >> >> >       Select Password From Users Where UserName = "usr2";
>> >> >> >> >
>> >> >> >> > Now I want to get the password using OODBMS DB4o Object Query
>> >> >> >> > and
>> >> >> >> > Java
>> >> >> >> >
>> >> >> >> >      ObjectSet QueryResult = db.query(new Predicate<Users>()
>> >> >> >> >      {
>> >> >> >> >             public boolean match(Users Myusers)
>> >> >> >> >             {
>> >> >> >> >                  // Compare string contents with equals(), not ==.
>> >> >> >> >                  return "usr2".equals(Myusers.getUserName());
>> >> >> >> >             }
>> >> >> >> >      });
>> >> >> >> >
>> >> >> >> > After we get the Users instance in QueryResult, we can get
>> >> >> >> > usr2's password.
>> >> >> >> >
>> >> >> >> > How do we perform this query using the Cassandra API and Java?
>> >> >> >> > Would you tell me, please? Thank you.
>> >> >> >> >
>> >> >> >> > Dir.
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > On Sat, Apr 10, 2010 at 11:06 AM, Paul Prescod
>> >> >> >> > <pa...@prescod.net>
>> >> >> >> > wrote:
>> >> >> >> >>
>> >> >> >> >> No. Cassandra has an API.
>> >> >> >> >>
>> >> >> >> >> http://wiki.apache.org/cassandra/API
>> >> >> >> >>
>> >> >> >> >> On Fri, Apr 9, 2010 at 8:00 PM, dir dir
>> >> >> >> >> <si...@gmail.com>
>> >> >> >> >> wrote:
>> >> >> >> >> > Does Cassandra have a default query language, such as SQL in an
>> >> >> >> >> > RDBMS or Object Query in an OODBMS? Thank you.
>> >> >> >> >> >
>> >> >> >> >> > Dir.
>> >> >> >> >> >
>> >> >> >> >> > On Sat, Apr 10, 2010 at 7:01 AM, malsmith
>> >> >> >> >> > <ma...@treehousesystems.com>
>> >> >> >> >> > wrote:
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> It's sort of an interesting problem - in RDBMS one
>> >> >> >> >> >> relatively
>> >> >> >> >> >> simple
>> >> >> >> >> >> approach would be calculate a rectangle that is X km by Y
>> >> >> >> >> >> km
>> >> >> >> >> >> with
>> >> >> >> >> >> User
>> >> >> >> >> >> 1's
>> >> >> >> >> >> location at the center.  So the rectangle is UserX - 10KmX
>> >> >> >> >> >> ,
>> >> >> >> >> >> UserY-10KmY to
>> >> >> >> >> >> UserX+10KmX , UserY+10KmY
>> >> >> >> >> >>
>> >> >> >> >> >> Then you could query the database for all other users where
>> >> >> >> >> >> each user considered satisfies curUserX > UserX-10KmX and
>> >> >> >> >> >> curUserX < UserX+10KmX and curUserY > UserY-10KmY and
>> >> >> >> >> >> curUserY < UserY+10KmY.
>> >> >> >> >> >> * Note that 10KmX and 10KmY are really a translation from
>> >> >> >> >> >> kilometers to degrees of latitude and longitude (which you
>> >> >> >> >> >> can find with a Google search).
>> >> >> >> >> >>
>> >> >> >> >> >> With the right indexes this query actually runs pretty
>> >> >> >> >> >> well.
>> >> >> >> >> >>
>> >> >> >> >> >> Translating that to Cassandra seems a bit complex at first,
>> >> >> >> >> >> but you could try something like pre-calculating a grid with
>> >> >> >> >> >> the right resolution (like a square of 5 km per side) and
>> >> >> >> >> >> assigning every user to a particular grid ID. That way you
>> >> >> >> >> >> just calculate which grid ID User1 is in, then do a direct
>> >> >> >> >> >> key lookup to get a list of the users in that same grid ID.
>> >> >> >> >> >>
>> >> >> >> >> >> A second approach would be to have two column families: one
>> >> >> >> >> >> that maps a latitude to a list of users who are at that
>> >> >> >> >> >> latitude, and a second that maps a longitude to the users
>> >> >> >> >> >> who are at that longitude. You could do the same rectangle
>> >> >> >> >> >> calculation as above, then do a get_slice range lookup to
>> >> >> >> >> >> get a list of users from the range of latitudes and a second
>> >> >> >> >> >> list from the range of longitudes. You would then need to do
>> >> >> >> >> >> an in-memory nested loop to find the users that are in both
>> >> >> >> >> >> lists. This second approach could cause some trouble
>> >> >> >> >> >> depending on where you search and how many users you really
>> >> >> >> >> >> have; some latitudes and longitudes have many, many people
>> >> >> >> >> >> in them.
>> >> >> >> >> >>
>> >> >> >> >> >> So, it seems some version of a chunking / grid id thing
>> >> >> >> >> >> would
>> >> >> >> >> >> be
>> >> >> >> >> >> the
>> >> >> >> >> >> better approach.   If you let people zoom in or zoom out -
>> >> >> >> >> >> you
>> >> >> >> >> >> could
>> >> >> >> >> >> just
>> >> >> >> >> >> have different column families for each level of zoom.
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> I'm stuck on a stopped train so -- here is even more code:
>> >> >> >> >> >>
>> >> >> >> >> >> static Decimal GetLatitudeMiles(Decimal lat)
>> >> >> >> >> >> {
>> >> >> >> >> >> Decimal f = 0.0M;
>> >> >> >> >> >> lat = Math.Abs(lat);
>> >> >> >> >> >> f = 68.99M;
>> >> >> >> >> >>          if (lat >= 0.0M && lat < 10.0M) { f = 68.71M; }
>> >> >> >> >> >> else if (lat >= 10.0M && lat < 20.0M) { f = 68.73M; }
>> >> >> >> >> >> else if (lat >= 20.0M && lat < 30.0M) { f = 68.79M; }
>> >> >> >> >> >> else if (lat >= 30.0M && lat < 40.0M) { f = 68.88M; }
>> >> >> >> >> >> else if (lat >= 40.0M && lat < 50.0M) { f = 68.99M; }
>> >> >> >> >> >> else if (lat >= 50.0M && lat < 60.0M) { f = 69.12M; }
>> >> >> >> >> >> else if (lat >= 60.0M && lat < 70.0M) { f = 69.23M; }
>> >> >> >> >> >> else if (lat >= 70.0M && lat < 80.0M) { f = 69.32M; }
>> >> >> >> >> >> else if (lat >= 80.0M) { f = 69.38M; }
>> >> >> >> >> >>
>> >> >> >> >> >> return f;
>> >> >> >> >> >> }
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> Decimal MilesPerDegreeLatitude = GetLatitudeMiles(zList[0].Latitude);
>> >> >> >> >> >> // Math.Cos expects radians, so convert the latitude from degrees first.
>> >> >> >> >> >> Decimal MilesPerDegreeLongitude = ((Decimal)
>> >> >> >> >> >> Math.Abs(Math.Cos((Double) zList[0].Latitude * Math.PI / 180.0)))
>> >> >> >> >> >> * 24900.0M / 360.0M;
>> >> >> >> >> >> dRadius = 10.0M; // ten miles
>> >> >> >> >> >> Decimal deltaLat = dRadius / MilesPerDegreeLatitude;
>> >> >> >> >> >> Decimal deltaLong = dRadius / MilesPerDegreeLongitude;
>> >> >> >> >> >>
>> >> >> >> >> >> ps.TopLatitude = zList[0].Latitude - deltaLat;
>> >> >> >> >> >> ps.TopLongitude = zList[0].Longitude - deltaLong;
>> >> >> >> >> >> ps.BottomLatitude = zList[0].Latitude + deltaLat;
>> >> >> >> >> >> ps.BottomLongitude = zList[0].Longitude + deltaLong;
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> On Fri, 2010-04-09 at 16:30 -0700, Paul Prescod wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> 2010/4/9 Onur AKTAS <on...@live.com>:
>> >> >> >> >> >> > ...
>> >> >> >> >> >> > I'm trying to find out how do you perform queries with
>> >> >> >> >> >> > calculations
>> >> >> >> >> >> > on
>> >> >> >> >> >> > the
>> >> >> >> >> >> > fly without inserting the data as calculated from the
>> >> >> >> >> >> > beginning.
>> >> >> >> >> >> > Lets say we have latitude and longitude coordinates of
>> >> >> >> >> >> > all
>> >> >> >> >> >> > users
>> >> >> >> >> >> > and
>> >> >> >> >> >> > we
>> >> >> >> >> >> > have
>> >> >> >> >> >> >  Distance(from_lat, from_long, to_lat, to_long) function
>> >> >> >> >> >> > which
>> >> >> >> >> >> > gives distance between lat/longs pairs in kilometers.
>> >> >> >> >> >>
>> >> >> >> >> >> I'm not an expert, but I think that it boils down to
>> >> >> >> >> >> "MapReduce"
>> >> >> >> >> >> and
>> >> >> >> >> >> "Hadoop".
>> >> >> >> >> >>
>> >> >> >> >> >> I don't think that there's any top-down tutorial on those
>> >> >> >> >> >> two
>> >> >> >> >> >> words,
>> >> >> >> >> >> you'll have to research yourself starting here:
>> >> >> >> >> >>
>> >> >> >> >> >>  * http://en.wikipedia.org/wiki/MapReduce
>> >> >> >> >> >>
>> >> >> >> >> >>  * http://hadoop.apache.org/
>> >> >> >> >> >>
>> >> >> >> >> >>  * http://wiki.apache.org/cassandra/HadoopSupport
>> >> >> >> >> >>
>> >> >> >> >> >> I don't think it is all documented in any one place yet...
>> >> >> >> >> >>
>> >> >> >> >> >>  Paul Prescod
>> >> >> >> >> >>
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >
>> >> >> >
>> >> >>
>> >> >>
>> >> >
>> >
>> >
>
>
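
For the question quoted above (Select Password From Users Where UserName = "usr2"), Benjamin's answer amounts to a lookup by row key and column name. A rough sketch, assuming a nested Python dict stands in for the Users CF; get_column is a hypothetical helper, not a Cassandra client call:

```python
# Users column family modelled as {row key: {column name: value}}.
# The rows are the sample data from dir dir's message.
users_cf = {
    "usr1": {"password": "abc"},
    "usr2": {"password": "xyz"},
    "usr3": {"password": "opm"},
}


def get_column(cf, row_key, column_name):
    """Return one column value from one row, or None if either is absent."""
    return cf.get(row_key, {}).get(column_name)


password = get_column(users_cf, "usr2", "password")
```

The point of the sketch is only that the row key plays the role of the WHERE clause: the data has to be laid out so that the value you search by is the key.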

Re: How to perform queries on Cassandra?

Posted by vineet daniel <vi...@gmail.com>.
I am dropping the idea; I don't want to irritate you guys any more. I've got
your points.

On Mon, Apr 12, 2010 at 12:41 AM, Benjamin Black <b...@b3k.us> wrote:

> Just to be clear: do you understand we are saying you need to use
> multiple CFs to achieve the goal, not a single one?
>
> The Users CF would be indexed on a unique integer as you are saying
> you intend.  There is no point in having values as column names here,
> other than making things incredibly confusing.  Assume instead that
> you have a column called 'username' and a column called 'password'.
> In your model where usernames may be the same for different users, you
> would have data that looked like this:
>
> 0: {'username':'usr1', 'password':'woop'}
> 1: {'username':'usr2', 'password':'foo'}
> 2: {'username':'usr2', 'password':'bar'}
>
> The UsernameIndex CF would be indexed on usernames, giving a map from
> a username to the unique identifiers in the Users CF with that
> username:
>
> 'usr1': {0:0}
> 'usr2': {1:0, 2:0}
>
> Note that since we don't care about the values in the UsernameIndex,
> they are just set to 0.  You can stash data here, if you like, but it
> can mean more overhead in maintaining data synchronization between the
> raw data and the index data.  To perform your query on username
> 'usr2', you get 'usr2' from UsernameIndex CF, which gives you a set of
> ids, and you then get those ids (1 and 2) from the Users CF.
>
>
> b
>
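
To make the two-CF layout above concrete, here is a small sketch that uses Python dicts in place of the Users and UsernameIndex column families. add_user and find_by_username are made-up helper names; a real client would issue one write per CF and one read per CF instead.

```python
# Users CF: unique integer id -> columns (data from the example above).
users_cf = {
    0: {"username": "usr1", "password": "woop"},
    1: {"username": "usr2", "password": "foo"},
    2: {"username": "usr2", "password": "bar"},
}

# UsernameIndex CF: username -> {user id: 0}. Only the column names (the ids)
# matter; the values are placeholders, as in the example above.
username_index_cf = {
    "usr1": {0: 0},
    "usr2": {1: 0, 2: 0},
}


def add_user(user_id, username, password):
    """Write the raw row and its index entry; both must be kept in step."""
    users_cf[user_id] = {"username": username, "password": password}
    username_index_cf.setdefault(username, {})[user_id] = 0


def find_by_username(username):
    """First read the index row for the ids, then read those rows from Users."""
    ids = sorted(username_index_cf.get(username, {}))
    return [(user_id, users_cf[user_id]) for user_id in ids]


matches = find_by_username("usr2")  # rows 1 and 2
```

Note that changing a username means removing the id from the old index row and adding it to the new one, which is the synchronization overhead mentioned above.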
>
> On Sun, Apr 11, 2010 at 11:37 AM, vineet daniel <vi...@gmail.com>
> wrote:
> > Well, my initial idea is to use the value as the column name, keeping the
> > key as an incremental integer. The discussion has drifted from this point
> > with each mail, so I will put it again.
> >
> > We want to store user information. We keep 1, 2, 3, 4 and so on as keys,
> > and values as column names, i.e. rather than using the column name
> > 'first name', I'd be using 'vineet' as the column name, and rather than
> > using 'last name' as the column name I'd be using 'daniel'. This way I can
> > directly read column names as values. This is just a thought that came to
> > mind while trying to design my db for Cassandra.
> >
> >
> >
> > On Sun, Apr 11, 2010 at 11:46 PM, Benjamin Black <b...@b3k.us> wrote:
> >>
> >> Row keys must be unique.  If your usernames are not unique and you
> >> want to be able to query on them, you either need to figure out a way
> >> to make them unique or treat the username rows themselves as indices,
> >> which refer to a set of actually unique identifiers for users.
> >>
> >> On Sun, Apr 11, 2010 at 11:12 AM, vineet daniel <vineetdaniel@gmail.com>
> >> wrote:
> >> > It's not a problem, it's a scenario that we need to handle. All I am
> >> > trying to do is achieve what the API does not offer, i.e. a workaround.
> >> >
> >> > On Sun, Apr 11, 2010 at 11:06 PM, Benjamin Black <b...@b3k.us> wrote:
> >> >>
> >> >> A system that permits multiple people to have the same username has a
> >> >> serious problem.
> >> >>
> >> >> On Sun, Apr 11, 2010 at 6:12 AM, vineet daniel <vineetdaniel@gmail.com>
> >> >> wrote:
> >> >> > How do we handle identical usernames? Otherwise it seems fine to me.
> >> >> >
> >> >> > On Sun, Apr 11, 2010 at 6:17 PM, Dop Sun <su...@dopsun.com> wrote:
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> As far as I can see, the Cassandra API currently supports
> >> >> >> criteria on:
> >> >> >>
> >> >> >> Token – Key – Super Column Name (if applicable) - Column Names
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> I guess Token is not usually used for the day to day queries, so,
> >> >> >> Key
> >> >> >> and
> >> >> >> Column Names are normally used for querying. For the user name and
> >> >> >> password
> >> >> >> case, I guess it can be done like this:
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> Define a CF named UserAuth of type Super, with the user name as the
> >> >> >> key and the password as the SuperKeyName (super column name). When
> >> >> >> you receive the user name and password from the UI (or any other
> >> >> >> source), you can query via multiget_slice or get_range_slices; if
> >> >> >> anything is returned, the user name and password match.
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> If you do not use the super column name and instead put the password
> >> >> >> in as the column name: column names are usually not used for this
> >> >> >> kind of discretionary value. (Actually, I don’t see any definitive
> >> >> >> documentation on how to use column names and super columns.
> >> >> >> Flexibility is the strength of Cassandra, or is it bad if abused? :P)
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> Not sure whether this is the best way, but I guess it will work.
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> Regards,
> >> >> >>
> >> >> >> Dop
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> From: Lucifer Dignified [mailto:vineetdaniel@gmail.com]
> >> >> >> Sent: Sunday, April 11, 2010 5:33 PM
> >> >> >> To: user@cassandra.apache.org
> >> >> >> Subject: Re: How to perform queries on Cassandra?
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> Hi Benjamin
> >> >> >>
> >> >> >> I'll try to make it clearer.
> >> >> >> We have a user table with fields 'id', 'username', and 'password'.
> >> >> >> Now, if we use the ideal way to store key/value, like:
> >> >> >> username : vineetdaniel
> >> >> >> timestamp
> >> >> >> password : <password>
> >> >> >> timestamp
> >> >> >>
> >> >> >> second user :
> >> >> >>
> >> >> >> username: <seconduser>
> >> >> >> timestamp
> >> >> >> password:<password>
> >> >> >>
> >> >> >> and so on. What I assume here is that, as we cannot search on values
> >> >> >> (as confirmed by people on the Cassandra forums), we are not able to
> >> >> >> perform robust 'where' queries. Now what I propose is this.
> >> >> >>
> >> >> >> Rather than using static values for column names, use the values
> >> >> >> themselves, with a unique key as the identifier. So the above
> >> >> >> example, stored my way, would be:
> >> >> >>
> >> >> >> vineetdaniel : vineetdaniel
> >> >> >> timestamp
> >> >> >>
> >> >> >> <password>:<password>
> >> >> >> timestamp
> >> >> >>
> >> >> >> second user
> >> >> >> seconduser:seconduser
> >> >> >> timestamp
> >> >> >>
> >> >> >> password:password
> >> >> >> timestamp
> >> >> >>
> >> >> >> By using the above methodology we can simply search on the keys
> >> >> >> themselves rather than using different CFs. To add further, though,
> >> >> >> this cannot be used for every situation. I am still exploring this,
> >> >> >> and will soon be updating the group and my blog with information
> >> >> >> pertaining to it. As Cassandra is new, I think every idea or
> >> >> >> experience should be shared with the community.
> >> >> >>
> >> >> >> I hope my example is clear this time. Should you have any queries,
> >> >> >> feel free to get back to me.
> >> >> >>
> >> >> >> On Sun, Apr 11, 2010 at 2:01 PM, Benjamin Black <b...@b3k.us> wrote:
> >> >> >>
> >> >> >> Sorry, I don't understand your example.
> >> >> >>
> >> >> >> On Sun, Apr 11, 2010 at 12:54 AM, Lucifer Dignified
> >> >> >> <vi...@gmail.com> wrote:
> >> >> >> > Benjamin, I quite agree with you, but what about duplicate
> >> >> >> > usernames? Suppose I am not using unique names such as email IDs.
> >> >> >> > If we have duplicate usernames we cannot use them as the key, so
> >> >> >> > what should the solution be? I think it is keeping an incremental
> >> >> >> > numeric id as the key and keeping the name and value the same in
> >> >> >> > the column family.
> >> >> >> >
> >> >> >> > Example :
> >> >> >> > User1 has password as 123456
> >> >> >> >
> >> >> >> > Cassandra structure :
> >> >> >> >
> >> >> >> > 1 as key
> >> >> >> >            user1 - column name
> >> >> >> >            value - user1
> >> >> >> >            123456 - column name
> >> >> >> >             value - 123456
> >> >> >> >
> >> >> >> > I'm thinking of doing it this way for my application; this way I
> >> >> >> > can run different sorts of queries too. Any feedback on this is
> >> >> >> > welcome.
> >> >> >> >
> >> >> >> > On Sun, Apr 11, 2010 at 1:13 PM, Benjamin Black <b...@b3k.us>
> wrote:
> >> >> >> >>
> >> >> >> >> You would have a Column Family, not a column for that; let's
> call
> >> >> >> >> it
> >> >> >> >> the Users CF.  You'd use username as the row key and have a
> >> >> >> >> column
> >> >> >> >> called 'password'.  For your example query, you'd retrieve row
> >> >> >> >> key
> >> >> >> >> 'usr2', column 'password'.  The general pattern is that you
> >> >> >> >> create
> >> >> >> >> CFs
> >> >> >> >> to act as indices for each query you want to perform.  There is
> >> >> >> >> no
> >> >> >> >> equivalent to a relational store to perform arbitrary queries.
> >> >> >> >>  You
> >> >> >> >> must structure things to permit the queries of interest.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> b
> >> >> >> >>
> >> >> >> >> On Sat, Apr 10, 2010 at 8:34 PM, dir dir <
> sikerasakti@gmail.com>
> >> >> >> >> wrote:
> >> >> >> >> > I have already read the API spesification. Honestly I do not
> >> >> >> >> > understand
> >> >> >> >> > how to use it. Because there are not an examples.
> >> >> >> >> >
> >> >> >> >> > For example I have a column like this:
> >> >> >> >> >
> >> >> >> >> > UserName    Password
> >> >> >> >> > usr1                abc
> >> >> >> >> > usr2                xyz
> >> >> >> >> > usr3                opm
> >> >> >> >> >
> >> >> >> >> > suppose I want query the user's password using SQL in RDBMS
> >> >> >> >> >
> >> >> >> >> >       Select Password From Users Where UserName = "usr2";
> >> >> >> >> >
> >> >> >> >> > Now I want to get the password using OODBMS DB4o Object Query
> >> >> >> >> > and
> >> >> >> >> > Java
> >> >> >> >> >
> >> >> >> >> >      ObjectSet QueryResult = db.query(new Predicate()
> >> >> >> >> >      {
> >> >> >> >> >             public boolean match(Users Myusers)
> >> >> >> >> >             {
> >> >> >> >> >                  return Myuser.getUserName() == "usr2";
> >> >> >> >> >             }
> >> >> >> >> >      });
> >> >> >> >> >
> >> >> >> >> > After we get the Users instance in the QueryResult, hence we
> >> >> >> >> > can
> >> >> >> >> > get
> >> >> >> >> > the
> >> >> >> >> > usr2's password.
> >> >> >> >> >
> >> >> >> >> > How we perform this query using Cassandra API and Java??
> >> >> >> >> > Would you tell me please??  Thank You.
> >> >> >> >> >
> >> >> >> >> > Dir.
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > On Sat, Apr 10, 2010 at 11:06 AM, Paul Prescod
> >> >> >> >> > <pa...@prescod.net>
> >> >> >> >> > wrote:
> >> >> >> >> >>
> >> >> >> >> >> No. Cassandra has an API.
> >> >> >> >> >>
> >> >> >> >> >> http://wiki.apache.org/cassandra/API
> >> >> >> >> >>
> >> >> >> >> >> On Fri, Apr 9, 2010 at 8:00 PM, dir dir
> >> >> >> >> >> <si...@gmail.com>
> >> >> >> >> >> wrote:
> >> >> >> >> >> > Does Cassandra has a default query language such as SQL in
> >> >> >> >> >> > RDBMS
> >> >> >> >> >> > and Object Query in OODBMS?  Thank you.
> >> >> >> >> >> >
> >> >> >> >> >> > Dir.
> >> >> >> >> >> >
> >> >> >> >> >> > On Sat, Apr 10, 2010 at 7:01 AM, malsmith
> >> >> >> >> >> > <ma...@treehousesystems.com>
> >> >> >> >> >> > wrote:
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >> >> It's sort of an interesting problem - in RDBMS one
> >> >> >> >> >> >> relatively
> >> >> >> >> >> >> simple
> >> >> >> >> >> >> approach would be calculate a rectangle that is X km by Y
> >> >> >> >> >> >> km
> >> >> >> >> >> >> with
> >> >> >> >> >> >> User
> >> >> >> >> >> >> 1's
> >> >> >> >> >> >> location at the center.  So the rectangle is UserX -
> 10KmX
> >> >> >> >> >> >> ,
> >> >> >> >> >> >> UserY-10KmY to
> >> >> >> >> >> >> UserX+10KmX , UserY+10KmY
> >> >> >> >> >> >>
> >> >> >> >> >> >> Then you could query the database for all other users
> where
> >> >> >> >> >> >> that
> >> >> >> >> >> >> each
> >> >> >> >> >> >> user
> >> >> >> >> >> >> considered is curUserX > UserX-10Km and curUserX <
> >> >> >> >> >> >> UserX+10KmX
> >> >> >> >> >> >> and
> >> >> >> >> >> >> curUserY
> >> >> >> >> >> >> > UserY-10KmY and curUserY < UserY+10KmY
> >> >> >> >> >> >> * Not the 10KmX and 10KmY are really a translation from
> >> >> >> >> >> >> Kilometers
> >> >> >> >> >> >> to
> >> >> >> >> >> >> degrees of  lat and longitude  (that you can find on a
> >> >> >> >> >> >> google
> >> >> >> >> >> >> search)
> >> >> >> >> >> >>
> >> >> >> >> >> >> With the right indexes this query actually runs pretty
> >> >> >> >> >> >> well.
> >> >> >> >> >> >>
> >> >> >> >> >> >> Translating that to Cassandra seems a bit complex at
> first
> >> >> >> >> >> >> -
> >> >> >> >> >> >> but
> >> >> >> >> >> >> you
> >> >> >> >> >> >> could
> >> >> >> >> >> >> try something like pre-calculating a grid with the right
> >> >> >> >> >> >> resolution
> >> >> >> >> >> >> (like a
> >> >> >> >> >> >> square of 5KM per side) and assign every user to a
> >> >> >> >> >> >> particular
> >> >> >> >> >> >> grid
> >> >> >> >> >> >> ID.
> >> >> >> >> >> >> That
> >> >> >> >> >> >> way you just calculate with grid ID User1 is in then do a
> >> >> >> >> >> >> direct
> >> >> >> >> >> >> key
> >> >> >> >> >> >> lookup
> >> >> >> >> >> >> to get a list of the users in that same grid id.
> >> >> >> >> >> >>
> >> >> >> >> >> >> A second approach would be to have to column families --
> >> >> >> >> >> >> one
> >> >> >> >> >> >> that
> >> >> >> >> >> >> maps
> >> >> >> >> >> >> a
> >> >> >> >> >> >> Latitude to a list of users who are at that latitude and
> a
> >> >> >> >> >> >> second
> >> >> >> >> >> >> that
> >> >> >> >> >> >> maps
> >> >> >> >> >> >> users who are at a particular longitude.  You could do
> the
> >> >> >> >> >> >> same
> >> >> >> >> >> >> rectange
> >> >> >> >> >> >> calculation above then do a get_slice range lookup to get
> a
> >> >> >> >> >> >> list
> >> >> >> >> >> >> of
> >> >> >> >> >> >> users
> >> >> >> >> >> >> from range of latitude and a second list from the range
> of
> >> >> >> >> >> >> longitudes.
> >> >> >> >> >> >> You would then need to do a in-memory nested loop to find
> >> >> >> >> >> >> the
> >> >> >> >> >> >> list
> >> >> >> >> >> >> of
> >> >> >> >> >> >> users
> >> >> >> >> >> >> that are in both lists.  This second approach could cause
> >> >> >> >> >> >> some
> >> >> >> >> >> >> trouble
> >> >> >> >> >> >> depending on where you search and how many users you
> really
> >> >> >> >> >> >> have
> >> >> >> >> >> >> --
> >> >> >> >> >> >> some
> >> >> >> >> >> >> latitudes and longitudes have many many people in them
> >> >> >> >> >> >>
> >> >> >> >> >> >> So, it seems some version of a chunking / grid id thing
> >> >> >> >> >> >> would
> >> >> >> >> >> >> be
> >> >> >> >> >> >> the
> >> >> >> >> >> >> better approach.   If you let people zoom in or zoom out
> -
> >> >> >> >> >> >> you
> >> >> >> >> >> >> could
> >> >> >> >> >> >> just
> >> >> >> >> >> >> have different column families for each level of zoom.
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >> >> I'm stuck on a stopped train so -- here is even more
> code:
> >> >> >> >> >> >>
> >> >> >> >> >> >> static Decimal GetLatitudeMiles(Decimal lat)
> >> >> >> >> >> >> {
> >> >> >> >> >> >> Decimal f = 0.0M;
> >> >> >> >> >> >> lat = Math.Abs(lat);
> >> >> >> >> >> >> f = 68.99M;
> >> >> >> >> >> >>          if (lat >= 0.0M && lat < 10.0M) { f = 68.71M; }
> >> >> >> >> >> >> else if (lat >= 10.0M && lat < 20.0M) { f = 68.73M; }
> >> >> >> >> >> >> else if (lat >= 20.0M && lat < 30.0M) { f = 68.79M; }
> >> >> >> >> >> >> else if (lat >= 30.0M && lat < 40.0M) { f = 68.88M; }
> >> >> >> >> >> >> else if (lat >= 40.0M && lat < 50.0M) { f = 68.99M; }
> >> >> >> >> >> >> else if (lat >= 50.0M && lat < 60.0M) { f = 69.12M; }
> >> >> >> >> >> >> else if (lat >= 60.0M && lat < 70.0M) { f = 69.23M; }
> >> >> >> >> >> >> else if (lat >= 70.0M && lat < 80.0M) { f = 69.32M; }
> >> >> >> >> >> >> else if (lat >= 80.0M) { f = 69.38M; }
> >> >> >> >> >> >>
> >> >> >> >> >> >> return f;
> >> >> >> >> >> >> }
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >> >> Decimal MilesPerDegreeLatitude =
> >> >> >> >> >> >> GetLatitudeMiles(zList[0].Latitude);
> >> >> >> >> >> >> Decimal MilesPerDegreeLongitude = ((Decimal)
> >> >> >> >> >> >> Math.Abs(Math.Cos((Double)
> >> >> >> >> >> >> zList[0].Latitude))) * 24900.0M / 360.0M;
> >> >> >> >> >> >>                         dRadius = 10.0M  // ten miles
> >> >> >> >> >> >> Decimal deltaLat = dRadius / MilesPerDegreeLatitude;
> >> >> >> >> >> >> Decimal deltaLong = dRadius / MilesPerDegreeLongitude;
> >> >> >> >> >> >>
> >> >> >> >> >> >> ps.TopLatitude = zList[0].Latitude - deltaLat;
> >> >> >> >> >> >> ps.TopLongitude = zList[0].Longitude - deltaLong;
> >> >> >> >> >> >> ps.BottomLatitude = zList[0].Latitude + deltaLat;
> >> >> >> >> >> >> ps.BottomLongitude = zList[0].Longitude + deltaLong;
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >> >> On Fri, 2010-04-09 at 16:30 -0700, Paul Prescod wrote:
> >> >> >> >> >> >>
> >> >> >> >> >> >> 2010/4/9 Onur AKTAS <on...@live.com>:
> >> >> >> >> >> >> > ...
> >> >> >> >> >> >> > I'm trying to find out how do you perform queries with
> >> >> >> >> >> >> > calculations
> >> >> >> >> >> >> > on
> >> >> >> >> >> >> > the
> >> >> >> >> >> >> > fly without inserting the data as calculated from the
> >> >> >> >> >> >> > beginning.
> >> >> >> >> >> >> > Lets say we have latitude and longitude coordinates of
> >> >> >> >> >> >> > all
> >> >> >> >> >> >> > users
> >> >> >> >> >> >> > and
> >> >> >> >> >> >> > we
> >> >> >> >> >> >> > have
> >> >> >> >> >> >> >  Distance(from_lat, from_long, to_lat, to_long)
> function
> >> >> >> >> >> >> > which
> >> >> >> >> >> >> > gives distance between lat/longs pairs in kilometers.
> >> >> >> >> >> >>
> >> >> >> >> >> >> I'm not an expert, but I think that it boils down to
> >> >> >> >> >> >> "MapReduce"
> >> >> >> >> >> >> and
> >> >> >> >> >> >> "Hadoop".
> >> >> >> >> >> >>
> >> >> >> >> >> >> I don't think that there's any top-down tutorial on those
> >> >> >> >> >> >> two
> >> >> >> >> >> >> words,
> >> >> >> >> >> >> you'll have to research yourself starting here:
> >> >> >> >> >> >>
> >> >> >> >> >> >>  * http://en.wikipedia.org/wiki/MapReduce
> >> >> >> >> >> >>
> >> >> >> >> >> >>  * http://hadoop.apache.org/
> >> >> >> >> >> >>
> >> >> >> >> >> >>  * http://wiki.apache.org/cassandra/HadoopSupport
> >> >> >> >> >> >>
> >> >> >> >> >> >> I don't think it is all documented in any one place
> yet...
> >> >> >> >> >> >>
> >> >> >> >> >> >>  Paul Prescod
> >> >> >> >> >> >>
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >>
> >> >> >>
> >> >> >
> >> >
> >> >
> >
> >
>

Re: How to perform queries on Cassandra?

Posted by Benjamin Black <b...@b3k.us>.
Just to be clear: do you understand we are saying you need to use
multiple CFs to achieve the goal, not a single one?

The Users CF would be indexed on a unique integer, as you say you
intend.  There is no point in having values as column names here,
other than making things incredibly confusing.  Assume instead that
you have a column called 'username' and a column called 'password'.
In your model where usernames may be the same for different users, you
would have data that looked like this:

0: {'username':'usr1', 'password':'woop'}
1: {'username':'usr2', 'password':'foo'}
2: {'username':'usr2', 'password':'bar'}

The UsernameIndex CF would be indexed on usernames, giving a map from
a username to the unique identifiers in the Users CF with that
username:

'usr1': {0:0}
'usr2': {1:0, 2:0}

Note that since we don't care about the values in the UsernameIndex,
they are just set to 0.  You can stash data here, if you like, but it
can mean more overhead in maintaining data synchronization between the
raw data and the index data.  To perform your query on username
'usr2', you get 'usr2' from the UsernameIndex CF, which gives you a
set of ids, and you then get those ids (1 and 2) from the Users CF.
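
The two-step lookup described above can be sketched roughly as follows.
This is only an illustration: plain Python dicts stand in for the Users
and UsernameIndex CFs, no Cassandra client is involved, and the helper
names find_users_by_username and add_user are made up for the example.

```python
# In-memory stand-ins for the two column families (assumption: plain dicts,
# not a real Cassandra client).
users = {
    0: {"username": "usr1", "password": "woop"},
    1: {"username": "usr2", "password": "foo"},
    2: {"username": "usr2", "password": "bar"},
}

# Row key is the username; column names are user ids; values are unused (0).
username_index = {
    "usr1": {0: 0},
    "usr2": {1: 0, 2: 0},
}


def find_users_by_username(username):
    """Step 1: read the index row. Step 2: fetch each id from Users."""
    ids = username_index.get(username, {})
    return [users[user_id] for user_id in sorted(ids)]


def add_user(user_id, username, password):
    """Write the raw row and the index entry together so they stay in sync."""
    users[user_id] = {"username": username, "password": password}
    username_index.setdefault(username, {})[user_id] = 0


if __name__ == "__main__":
    for row in find_users_by_username("usr2"):
        print(row)
```

Every write has to touch both structures, which is the synchronization
overhead mentioned above.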


b


On Sun, Apr 11, 2010 at 11:37 AM, vineet daniel <vi...@gmail.com> wrote:
> Well my initial idea is to use value  as column name, keeping key as an
> incremental integer. The discussion after each mail has drifted from this
> point which I had made. Will put it again.
>
> we want to store user information. We keep 1,2,3,4.....so on as keys. AND
> values as column names i.e rather than using column name 'first name', i'd
> be using 'vineet' as column name, rather than using 'last name' as column
> name i'd be using 'daniel'. This way I can directly read column names as
> values. This is just a thought that has come to my mind while trying to
> design my db for cassandra.

Re: How to perform queries on Cassandra?

Posted by vineet daniel <vi...@gmail.com>.
Well, my initial idea is to use the value as the column name, keeping the
key as an incremental integer. The discussion has drifted from this point
with each mail, so I will put it again.

We want to store user information. We keep 1, 2, 3, 4, and so on as keys,
and values as column names; i.e., rather than using the column name
'first name', I'd use 'vineet' as the column name, and rather than using
'last name' as the column name, I'd use 'daniel'. This way I can read
column names directly as values. This is just a thought that came to mind
while trying to design my db for Cassandra.
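
A minimal sketch of the layout proposed above, assuming plain Python dicts
in place of a column family (no Cassandra client is used). The second row
and the helper names values_for and row_has_value are invented for
illustration only.

```python
# Row key is an incremental integer; each column name is the value itself,
# and the column value simply repeats it (assumption: dicts model the CF).
rows = {
    1: {"vineet": "vineet", "daniel": "daniel"},
    2: {"john": "john", "smith": "smith"},  # made-up second user
}


def values_for(key):
    """Read the column names of one row back as its values.

    Sorted by name, since a row's columns are ordered by column name.
    """
    return sorted(rows[key])


def row_has_value(key, value):
    """Check whether a given value is stored in a given row."""
    return value in rows[key]


if __name__ == "__main__":
    print(values_for(1))
    print(row_has_value(1, "vineet"))
```

Note that in this sketch nothing records which value is the first name and
which is the last name, and finding the rows that contain a given value
still means knowing the row keys in advance.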



On Sun, Apr 11, 2010 at 11:46 PM, Benjamin Black <b...@b3k.us> wrote:

> Row keys must be unique.  If your usernames are not unique and you
> want to be able to query on them, you either need to figure out a way
> to make them unique or treat the username rows themselves as indices,
> which refer to a set of actually unique identifiers for users.
>
> On Sun, Apr 11, 2010 at 11:12 AM, vineet daniel <vi...@gmail.com>
> wrote:
> > its not a problem its a scenario, which we need to handle. And all I am
> > trying to do is to achieve what is not there with API i.e a workaroud.
> >
> > On Sun, Apr 11, 2010 at 11:06 PM, Benjamin Black <b...@b3k.us> wrote:
> >>
> >> A system that permits multiple people to have the same username has a
> >> serious problem.
> >>
> >> On Sun, Apr 11, 2010 at 6:12 AM, vineet daniel <vi...@gmail.com>
> >> wrote:
> >> > How to handle same usernames. Otherwise seems fine to me.
> >> >
> >> > On Sun, Apr 11, 2010 at 6:17 PM, Dop Sun <su...@dopsun.com> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >>
> >> >>
> >> >> As far as I can see it, the Cassandra API currently supports
> criterias
> >> >> on:
> >> >>
> >> >> Token – Key – Super Column Name (if applicable) - Column Names
> >> >>
> >> >>
> >> >>
> >> >> I guess Token is not usually used for the day to day queries, so, Key
> >> >> and
> >> >> Column Names are normally used for querying. For the user name and
> >> >> password
> >> >> case, I guess it can be done like this:
> >> >>
> >> >>
> >> >>
> >> >> Define a CF as UserAuth with type as Super, and Key is user name,
> while
> >> >> password can be the SuperKeyName. So, while you receive the user name
> >> >> and
> >> >> password from the UI (or any other methods), it can be queried via:
> >> >> multiget_slice or get_range_slices, if there are anything returned,
> >> >> means
> >> >> that the user name and password matches.
> >> >>
> >> >>
> >> >>
> >> >> If not using the super column name, and put the password as the
> column
> >> >> name, the column name usually not used for these kind of
> discretionary
> >> >> values (actually, I don’t see any definitive documents on how to use
> >> >> the
> >> >> column Names and Super Columns, flexibility is the good of Cassandra,
> >> >> or is
> >> >> it bad if abused? :P)
> >> >>
> >> >>
> >> >>
> >> >> Not sure whether this is the best way, but I guess it will work.
> >> >>
> >> >>
> >> >>
> >> >> Regards,
> >> >>
> >> >> Dop
> >> >>
> >> >>
> >> >>
> >> >> From: Lucifer Dignified [mailto:vineetdaniel@gmail.com]
> >> >> Sent: Sunday, April 11, 2010 5:33 PM
> >> >> To: user@cassandra.apache.org
> >> >> Subject: Re: How to perform queries on Cassandra?
> >> >>
> >> >>
> >> >>
> >> >> Hi Benjamin
> >> >>
> >> >> I'll try to make it more clear to you.
> >> >> We have a user table with fields 'id', 'username', and 'password'.
> Now
> >> >> if
> >> >> use the ideal way to store key/value, like :
> >> >> username : vineetdaniel
> >> >> timestamp
> >> >> password : <password>
> >> >> timestamp
> >> >>
> >> >> second user :
> >> >>
> >> >> username: <seconduser>
> >> >> timestamp
> >> >> password:<password>
> >> >>
> >> >> and so on, here what i assume is that as we cannot make search on
> >> >> values
> >> >> (as confirmed by guys on cassandra forums) we are not able to perform
> >> >> robust
> >> >> 'where' queries. Now what i propose is this.
> >> >>
> >> >> Rather than using a static values for column names use values itself
> >> >> and
> >> >> unique key as identifier. So, the above example when put in as per me
> >> >> would
> >> >> be.
> >> >>
> >> >> vineetdaniel : vineetdaniel
> >> >> timestamp
> >> >>
> >> >> <password>:<password>
> >> >> timestamp
> >> >>
> >> >> second user
> >> >> seconduser:seconduser
> >> >> timestamp
> >> >>
> >> >> password:password
> >> >> timestamp
> >> >>
> >> >> With this approach we can simply search on the keys themselves rather
> >> >> than using different CFs. That said, it cannot be used in every
> >> >> situation. I am still exploring this, and will soon update the group
> >> >> and my blog with what I find. As Cassandra is new, I think every idea
> >> >> or experience should be shared with the community.
> >> >>
> >> >> I hope my example is clear this time. Should you have any questions,
> >> >> feel free to reply.
> >> >>
> >> >> On Sun, Apr 11, 2010 at 2:01 PM, Benjamin Black <b...@b3k.us> wrote:
> >> >>
> >> >> Sorry, I don't understand your example.
> >> >>
> >> >> On Sun, Apr 11, 2010 at 12:54 AM, Lucifer Dignified
> >> >> <vi...@gmail.com> wrote:
> >> >> > Benjamin, I quite agree with you, but what about duplicate usernames?
> >> >> > Suppose I am not using unique names such as email IDs. If usernames
> >> >> > are duplicated we cannot use them as the key, so what should the
> >> >> > solution be? I am thinking of keeping an incremental numeric id as
> >> >> > the key and making the column name and value the same in the column
> >> >> > family.
> >> >> >
> >> >> > Example :
> >> >> > User1 has password as 123456
> >> >> >
> >> >> > Cassandra structure :
> >> >> >
> >> >> > 1 as key
> >> >> >            user1 - column name
> >> >> >            value - user1
> >> >> >            123456 - column name
> >> >> >             value - 123456
> >> >> >
> >> >> > I'm thinking of doing it this way for my application; this way I can
> >> >> > run different sorts of queries too. Any feedback on this is welcome.
> >> >> >
> >> >> > On Sun, Apr 11, 2010 at 1:13 PM, Benjamin Black <b...@b3k.us> wrote:
> >> >> >>
> >> >> >> You would have a Column Family, not a column for that; let's call it
> >> >> >> the Users CF.  You'd use username as the row key and have a column
> >> >> >> called 'password'.  For your example query, you'd retrieve row key
> >> >> >> 'usr2', column 'password'.  The general pattern is that you create
> >> >> >> CFs to act as indices for each query you want to perform.  There is
> >> >> >> no equivalent to a relational store to perform arbitrary queries.
> >> >> >> You must structure things to permit the queries of interest.
> >> >> >>
> >> >> >>
> >> >> >> b
> >> >> >>
> >> >> >> On Sat, Apr 10, 2010 at 8:34 PM, dir dir <si...@gmail.com>
> >> >> >> wrote:
> >> >> >> > I have already read the API specification. Honestly, I do not
> >> >> >> > understand how to use it, because there are no examples.
> >> >> >> >
> >> >> >> > For example, I have a column like this:
> >> >> >> >
> >> >> >> > UserName    Password
> >> >> >> > usr1                abc
> >> >> >> > usr2                xyz
> >> >> >> > usr3                opm
> >> >> >> >
> >> >> >> > Suppose I want to query the user's password using SQL in an RDBMS:
> >> >> >> >
> >> >> >> >       Select Password From Users Where UserName = "usr2";
> >> >> >> >
> >> >> >> > Now I want to get the password using an OODBMS, DB4o Object Query
> >> >> >> > and Java:
> >> >> >> >
> >> >> >> >      ObjectSet QueryResult = db.query(new Predicate()
> >> >> >> >      {
> >> >> >> >             public boolean match(Users Myusers)
> >> >> >> >             {
> >> >> >> >                  return Myusers.getUserName().equals("usr2");
> >> >> >> >             }
> >> >> >> >      });
> >> >> >> >
> >> >> >> > Once we have the Users instance in the QueryResult, we can get
> >> >> >> > usr2's password.
> >> >> >> >
> >> >> >> > How do we perform this query using the Cassandra API and Java?
> >> >> >> > Would you tell me, please? Thank you.
> >> >> >> >
> >> >> >> > Dir.
> >> >> >> >
> >> >> >> >
> >> >> >> > On Sat, Apr 10, 2010 at 11:06 AM, Paul Prescod <
> paul@prescod.net>
> >> >> >> > wrote:
> >> >> >> >>
> >> >> >> >> No. Cassandra has an API.
> >> >> >> >>
> >> >> >> >> http://wiki.apache.org/cassandra/API
> >> >> >> >>
> >> >> >> >> On Fri, Apr 9, 2010 at 8:00 PM, dir dir <sikerasakti@gmail.com
> >
> >> >> >> >> wrote:
> >> >> >> >> > Does Cassandra have a default query language, such as SQL in an
> >> >> >> >> > RDBMS or Object Query in an OODBMS?  Thank you.
> >> >> >> >> >
> >> >> >> >> > Dir.
> >> >> >> >> >
> >> >> >> >> > On Sat, Apr 10, 2010 at 7:01 AM, malsmith
> >> >> >> >> > <ma...@treehousesystems.com>
> >> >> >> >> > wrote:
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >> It's sort of an interesting problem. In an RDBMS, one relatively
> >> >> >> >> >> simple approach would be to calculate a rectangle that is X km by
> >> >> >> >> >> Y km with User 1's location at the center.  So the rectangle runs
> >> >> >> >> >> from (UserX - 10KmX, UserY - 10KmY) to (UserX + 10KmX, UserY + 10KmY).
> >> >> >> >> >>
> >> >> >> >> >> Then you could query the database for all other users where each
> >> >> >> >> >> user considered satisfies curUserX > UserX - 10KmX and
> >> >> >> >> >> curUserX < UserX + 10KmX and curUserY > UserY - 10KmY and
> >> >> >> >> >> curUserY < UserY + 10KmY.
> >> >> >> >> >> * Note that 10KmX and 10KmY are really a translation from
> >> >> >> >> >> kilometers to degrees of latitude and longitude (which you can find
> >> >> >> >> >> with a Google search).
> >> >> >> >> >>
> >> >> >> >> >> With the right indexes this query actually runs pretty well.
> >> >> >> >> >>
> >> >> >> >> >> Translating that to Cassandra seems a bit complex at first, but you
> >> >> >> >> >> could try something like pre-calculating a grid with the right
> >> >> >> >> >> resolution (like a square of 5 km per side) and assigning every user
> >> >> >> >> >> to a particular grid ID. That way you just calculate which grid ID
> >> >> >> >> >> User1 is in, then do a direct key lookup to get a list of the users
> >> >> >> >> >> in that same grid ID.
> >> >> >> >> >>
> >> >> >> >> >> A second approach would be to have two column families: one that
> >> >> >> >> >> maps a latitude to a list of users who are at that latitude, and a
> >> >> >> >> >> second that maps a longitude to the users who are at that longitude.
> >> >> >> >> >> You could do the same rectangle calculation as above, then do a
> >> >> >> >> >> get_slice range lookup to get a list of users from the range of
> >> >> >> >> >> latitudes and a second list from the range of longitudes. You would
> >> >> >> >> >> then need to do an in-memory nested loop to find the users that are
> >> >> >> >> >> in both lists.  This second approach could cause some trouble
> >> >> >> >> >> depending on where you search and how many users you really have;
> >> >> >> >> >> some latitudes and longitudes have many, many people in them.
> >> >> >> >> >>
> >> >> >> >> >> So, it seems some version of a chunking / grid ID scheme would be
> >> >> >> >> >> the better approach.   If you let people zoom in or out, you could
> >> >> >> >> >> just have different column families for each level of zoom.
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >> I'm stuck on a stopped train so -- here is even more code:
> >> >> >> >> >>
> >> >> >> >> >> static Decimal GetLatitudeMiles(Decimal lat)
> >> >> >> >> >> {
> >> >> >> >> >> Decimal f = 0.0M;
> >> >> >> >> >> lat = Math.Abs(lat);
> >> >> >> >> >> f = 68.99M;
> >> >> >> >> >>          if (lat >= 0.0M && lat < 10.0M) { f = 68.71M; }
> >> >> >> >> >> else if (lat >= 10.0M && lat < 20.0M) { f = 68.73M; }
> >> >> >> >> >> else if (lat >= 20.0M && lat < 30.0M) { f = 68.79M; }
> >> >> >> >> >> else if (lat >= 30.0M && lat < 40.0M) { f = 68.88M; }
> >> >> >> >> >> else if (lat >= 40.0M && lat < 50.0M) { f = 68.99M; }
> >> >> >> >> >> else if (lat >= 50.0M && lat < 60.0M) { f = 69.12M; }
> >> >> >> >> >> else if (lat >= 60.0M && lat < 70.0M) { f = 69.23M; }
> >> >> >> >> >> else if (lat >= 70.0M && lat < 80.0M) { f = 69.32M; }
> >> >> >> >> >> else if (lat >= 80.0M) { f = 69.38M; }
> >> >> >> >> >>
> >> >> >> >> >> return f;
> >> >> >> >> >> }
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >> // Latitude is in degrees; Math.Cos expects radians.
> >> >> >> >> >> Decimal MilesPerDegreeLatitude = GetLatitudeMiles(zList[0].Latitude);
> >> >> >> >> >> Decimal MilesPerDegreeLongitude = ((Decimal) Math.Abs(
> >> >> >> >> >>     Math.Cos((Double) zList[0].Latitude * Math.PI / 180.0))) * 24900.0M / 360.0M;
> >> >> >> >> >> dRadius = 10.0M;  // ten miles
> >> >> >> >> >> Decimal deltaLat = dRadius / MilesPerDegreeLatitude;
> >> >> >> >> >> Decimal deltaLong = dRadius / MilesPerDegreeLongitude;
> >> >> >> >> >>
> >> >> >> >> >> ps.TopLatitude = zList[0].Latitude - deltaLat;
> >> >> >> >> >> ps.TopLongitude = zList[0].Longitude - deltaLong;
> >> >> >> >> >> ps.BottomLatitude = zList[0].Latitude + deltaLat;
> >> >> >> >> >> ps.BottomLongitude = zList[0].Longitude + deltaLong;
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >> On Fri, 2010-04-09 at 16:30 -0700, Paul Prescod wrote:
> >> >> >> >> >>
> >> >> >> >> >> 2010/4/9 Onur AKTAS <on...@live.com>:
> >> >> >> >> >> > ...
> >> >> >> >> >> > I'm trying to find out how you perform queries with calculations
> >> >> >> >> >> > on the fly without inserting the data as calculated from the
> >> >> >> >> >> > beginning.
> >> >> >> >> >> > Let's say we have latitude and longitude coordinates of all users
> >> >> >> >> >> > and we have a Distance(from_lat, from_long, to_lat, to_long)
> >> >> >> >> >> > function which gives the distance between lat/long pairs in
> >> >> >> >> >> > kilometers.
> >> >> >> >> >>
> >> >> >> >> >> I'm not an expert, but I think that it boils down to "MapReduce"
> >> >> >> >> >> and "Hadoop".
> >> >> >> >> >>
> >> >> >> >> >> I don't think that there's any top-down tutorial on those two
> >> >> >> >> >> words; you'll have to research yourself, starting here:
> >> >> >> >> >>
> >> >> >> >> >>  * http://en.wikipedia.org/wiki/MapReduce
> >> >> >> >> >>
> >> >> >> >> >>  * http://hadoop.apache.org/
> >> >> >> >> >>
> >> >> >> >> >>  * http://wiki.apache.org/cassandra/HadoopSupport
> >> >> >> >> >>
> >> >> >> >> >> I don't think it is all documented in any one place yet...
> >> >> >> >> >>
> >> >> >> >> >>  Paul Prescod
> >> >> >> >> >>
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >
> >> >> >
> >> >>
> >> >>
> >> >
> >
> >
>

Re: How to perform queries on Cassandra?

Posted by Benjamin Black <b...@b3k.us>.
Row keys must be unique.  If your usernames are not unique and you
want to be able to query on them, you either need to figure out a way
to make them unique or treat the username rows themselves as indices,
which refer to a set of actually unique identifiers for users.
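
A minimal sketch of the "username rows as indices" idea, under loud assumptions: plain Java maps stand in for two column families, and nothing here is the Cassandra client API. The names UsernameIndexSketch, Users, and UsernameIndex are hypothetical and chosen only for illustration. The index row is keyed by username and its column names are the unique user ids; the data row is keyed by the unique id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for two column families; not the Cassandra client API.
class UsernameIndexSketch {
    // "Users" CF (hypothetical): row key = unique user id, columns = name -> value.
    private final Map<String, Map<String, String>> users = new HashMap<>();
    // "UsernameIndex" CF (hypothetical): row key = username,
    // column names = user ids (column values unused), kept sorted like columns.
    private final Map<String, TreeMap<String, String>> usernameIndex = new HashMap<>();

    void addUser(String userId, String username, String password) {
        Map<String, String> row = new TreeMap<>();
        row.put("username", username);
        row.put("password", password);
        users.put(userId, row);
        // Write the index row alongside the data row.
        usernameIndex.computeIfAbsent(username, k -> new TreeMap<>()).put(userId, "");
    }

    // Step 1: read the index row to get every unique id sharing this username.
    List<String> idsFor(String username) {
        TreeMap<String, String> row = usernameIndex.get(username);
        return row == null ? new ArrayList<>() : new ArrayList<>(row.keySet());
    }

    // Step 2: fetch a single column from the row with that unique key.
    String passwordOf(String userId) {
        Map<String, String> row = users.get(userId);
        return row == null ? null : row.get("password");
    }

    public static void main(String[] args) {
        UsernameIndexSketch store = new UsernameIndexSketch();
        store.addUser("1001", "vineet", "abc");
        store.addUser("1002", "vineet", "xyz");
        store.addUser("1003", "usr2", "opm");
        for (String id : store.idsFor("vineet")) {
            System.out.println(id + " -> " + store.passwordOf(id));
        }
    }
}
```

The cost of this layout is that every write touches two rows, and the application has to keep the index row in step with the data row itself.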

On Sun, Apr 11, 2010 at 11:12 AM, vineet daniel <vi...@gmail.com> wrote:
> It's not a problem, it's a scenario that we need to handle. All I am
> trying to do is achieve what the API does not offer, i.e. a workaround.
>
> On Sun, Apr 11, 2010 at 11:06 PM, Benjamin Black <b...@b3k.us> wrote:
>>
>> A system that permits multiple people to have the same username has a
>> serious problem.
>>
>> On Sun, Apr 11, 2010 at 6:12 AM, vineet daniel <vi...@gmail.com>
>> wrote:
>> > How to handle same usernames. Otherwise seems fine to me.

Re: How to perform queries on Cassandra?

Posted by vineet daniel <vi...@gmail.com>.
It's not a problem, it's a scenario that we need to handle. All I am
trying to do is achieve what the API does not offer, i.e. a workaround.

On Sun, Apr 11, 2010 at 11:06 PM, Benjamin Black <b...@b3k.us> wrote:

> A system that permits multiple people to have the same username has a
> serious problem.
>
> On Sun, Apr 11, 2010 at 6:12 AM, vineet daniel <vi...@gmail.com>
> wrote:
> > How to handle same usernames. Otherwise seems fine to me.
> >
> > On Sun, Apr 11, 2010 at 6:17 PM, Dop Sun <su...@dopsun.com> wrote:
> >>
> >> Hi,
> >>
> >>
> >>
> >> As far as I can see, the Cassandra API currently supports criteria on:
> >>
> >> Token – Key – Super Column Name (if applicable) - Column Names
> >>
> >>
> >>
> >> I guess Token is not usually used for day-to-day queries, so Key and
> >> Column Names are normally used for querying. For the user name and
> >> password case, I guess it can be done like this:
> >>
> >>
> >>
> >> Define a CF called UserAuth with type Super, where the Key is the user
> >> name and the password can be the SuperKeyName. So, when you receive the
> >> user name and password from the UI (or by any other method), you can
> >> query with multiget_slice or get_range_slices; if anything is returned,
> >> the user name and password match.
> >>
> >>
> >>
> >> Alternatively, without a super column, you could put the password in
> >> the column name, although column names are not usually used for this
> >> kind of discretionary value (actually, I don't see any definitive
> >> documentation on how to use column names and super columns;
> >> flexibility is the strength of Cassandra, or is it bad if abused? :P)
> >>
> >>
> >>
> >> Not sure whether this is the best way, but I guess it will work.
> >>
> >>
> >>
> >> Regards,
> >>
> >> Dop
> >>
> >>
> >>
> >> From: Lucifer Dignified [mailto:vineetdaniel@gmail.com]
> >> Sent: Sunday, April 11, 2010 5:33 PM
> >> To: user@cassandra.apache.org
> >> Subject: Re: How to perform queries on Cassandra?
> >>
> >>
> >>
> >> Hi Benjamin
> >>
> >> I'll try to make it clearer.
> >> We have a user table with fields 'id', 'username', and 'password'.
> >> Suppose we store it the usual key/value way, like:
> >> username : vineetdaniel
> >> timestamp
> >> password : <password>
> >> timestamp
> >>
> >> second user :
> >>
> >> username: <seconduser>
> >> timestamp
> >> password:<password>
> >>
> >> and so on. My assumption here is that, since we cannot search on
> >> values (as confirmed by people on the Cassandra forums), we are not
> >> able to perform robust 'where' queries. What I propose is this:
> >>
> >> Rather than using static values for column names, use the values
> >> themselves, with a unique key as the identifier. So the above example,
> >> stored my way, would be:
> >>
> >> vineetdaniel : vineetdaniel
> >> timestamp
> >>
> >> <password>:<password>
> >> timestamp
> >>
> >> second user
> >> seconduser:seconduser
> >> timestamp
> >>
> >> password:password
> >> timestamp
> >>
> >> With this approach we can simply search on the keys themselves rather
> >> than using different CFs. That said, it cannot be used in every
> >> situation. I am still exploring this, and will soon update the group
> >> and my blog with what I find. As Cassandra is new, I think every idea
> >> or experience should be shared with the community.
> >>
> >> I hope my example is clear this time. Should you have any questions,
> >> feel free to reply.
> >>
> >> On Sun, Apr 11, 2010 at 2:01 PM, Benjamin Black <b...@b3k.us> wrote:
> >>
> >> Sorry, I don't understand your example.
> >>
> >> On Sun, Apr 11, 2010 at 12:54 AM, Lucifer Dignified
> >> <vi...@gmail.com> wrote:
> >> > Benjamin, I quite agree with you, but what about duplicate usernames?
> >> > Suppose I am not using unique names such as email IDs. If usernames
> >> > are duplicated we cannot use them as the key, so what should the
> >> > solution be? I am thinking of keeping an incremental numeric id as
> >> > the key and making the column name and value the same in the column
> >> > family.
> >> >
> >> > Example :
> >> > User1 has password as 123456
> >> >
> >> > Cassandra structure :
> >> >
> >> > 1 as key
> >> >            user1 - column name
> >> >            value - user1
> >> >            123456 - column name
> >> >             value - 123456
> >> >
> >> > I'm thinking of doing it this way for my application; this way I can
> >> > run different sorts of queries too. Any feedback on this is welcome.
> >> >
> >> > On Sun, Apr 11, 2010 at 1:13 PM, Benjamin Black <b...@b3k.us> wrote:
> >> >>
> >> >> You would have a Column Family, not a column for that; let's call it
> >> >> the Users CF.  You'd use username as the row key and have a column
> >> >> called 'password'.  For your example query, you'd retrieve row key
> >> >> 'usr2', column 'password'.  The general pattern is that you create
> >> >> CFs to act as indices for each query you want to perform.  There is
> >> >> no equivalent to a relational store to perform arbitrary queries.
> >> >> You must structure things to permit the queries of interest.
> >> >>
> >> >>
> >> >> b
> >> >>
> >> >> On Sat, Apr 10, 2010 at 8:34 PM, dir dir <si...@gmail.com>
> wrote:
> >> >> > I have already read the API specification. Honestly, I do not
> >> >> > understand how to use it, because there are no examples.
> >> >> >
> >> >> > For example, I have a column like this:
> >> >> >
> >> >> > UserName    Password
> >> >> > usr1                abc
> >> >> > usr2                xyz
> >> >> > usr3                opm
> >> >> >
> >> >> > Suppose I want to query the user's password using SQL in an RDBMS:
> >> >> >
> >> >> >       Select Password From Users Where UserName = "usr2";
> >> >> >
> >> >> > Now I want to get the password using an OODBMS, DB4o Object Query
> >> >> > and Java:
> >> >> >
> >> >> >      ObjectSet QueryResult = db.query(new Predicate()
> >> >> >      {
> >> >> >             public boolean match(Users Myusers)
> >> >> >             {
> >> >> >                  return Myusers.getUserName().equals("usr2");
> >> >> >             }
> >> >> >      });
> >> >> >
> >> >> > Once we have the Users instance in the QueryResult, we can get
> >> >> > usr2's password.
> >> >> >
> >> >> > How do we perform this query using the Cassandra API and Java?
> >> >> > Would you tell me, please? Thank you.
> >> >> >
> >> >> > Dir.
> >> >> >
> >> >> >
> >> >> > On Sat, Apr 10, 2010 at 11:06 AM, Paul Prescod <pa...@prescod.net>
> >> >> > wrote:
> >> >> >>
> >> >> >> No. Cassandra has an API.
> >> >> >>
> >> >> >> http://wiki.apache.org/cassandra/API
> >> >> >>
> >> >> >> On Fri, Apr 9, 2010 at 8:00 PM, dir dir <si...@gmail.com>
> >> >> >> wrote:
> >> >> >> > Does Cassandra have a default query language, such as SQL in an
> >> >> >> > RDBMS or Object Query in an OODBMS?  Thank you.
> >> >> >> >
> >> >> >> > Dir.
> >> >> >> >
> >> >> >> > On Sat, Apr 10, 2010 at 7:01 AM, malsmith
> >> >> >> > <ma...@treehousesystems.com>
> >> >> >> > wrote:
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> It's an interesting problem. In an RDBMS, one relatively simple
> >> >> >> >> approach would be to calculate a rectangle that is X km by Y km
> >> >> >> >> with User 1's location at the center.  So the rectangle runs from
> >> >> >> >> (UserX - 10KmX, UserY - 10KmY) to (UserX + 10KmX, UserY + 10KmY).
> >> >> >> >>
> >> >> >> >> Then you could query the database for all other users where
> >> >> >> >> curUserX > UserX - 10KmX and curUserX < UserX + 10KmX and
> >> >> >> >> curUserY > UserY - 10KmY and curUserY < UserY + 10KmY.
> >> >> >> >> * Note that 10KmX and 10KmY are really a translation from
> >> >> >> >> kilometers to degrees of latitude and longitude (which you can
> >> >> >> >> find with a Google search).
> >> >> >> >>
> >> >> >> >> With the right indexes this query actually runs pretty well.
> >> >> >> >>
> >> >> >> >> Translating that to Cassandra seems a bit complex at first, but
> >> >> >> >> you could try something like pre-calculating a grid with the
> >> >> >> >> right resolution (like a square of 5 km per side) and assigning
> >> >> >> >> every user to a particular grid ID.  That way you just calculate
> >> >> >> >> which grid ID User 1 is in, then do a direct key lookup to get a
> >> >> >> >> list of the users in that same grid ID.
> >> >> >> >>
> >> >> >> >> A second approach would be to have two column families: one that
> >> >> >> >> maps a latitude to a list of users who are at that latitude, and
> >> >> >> >> a second that maps a longitude to the users who are at that
> >> >> >> >> longitude.  You could do the same rectangle calculation as above,
> >> >> >> >> then do a get_slice range lookup to get a list of users from the
> >> >> >> >> range of latitudes and a second list from the range of
> >> >> >> >> longitudes.  You would then need to do an in-memory nested loop
> >> >> >> >> to find the users that are in both lists.  This second approach
> >> >> >> >> could cause some trouble depending on where you search and how
> >> >> >> >> many users you really have; some latitudes and longitudes have
> >> >> >> >> many, many people in them.
> >> >> >> >>
> >> >> >> >> So it seems some version of a chunking / grid ID scheme would be
> >> >> >> >> the better approach.  If you let people zoom in or zoom out, you
> >> >> >> >> could just have different column families for each level of zoom.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> I'm stuck on a stopped train so -- here is even more code:
> >> >> >> >>
> >> >> >> >> static Decimal GetLatitudeMiles(Decimal lat)
> >> >> >> >> {
> >> >> >> >>     // Approximate miles per degree of latitude, by 10-degree band.
> >> >> >> >>     lat = Math.Abs(lat);
> >> >> >> >>     Decimal f = 68.99M;
> >> >> >> >>     if (lat >= 0.0M && lat < 10.0M) { f = 68.71M; }
> >> >> >> >>     else if (lat >= 10.0M && lat < 20.0M) { f = 68.73M; }
> >> >> >> >>     else if (lat >= 20.0M && lat < 30.0M) { f = 68.79M; }
> >> >> >> >>     else if (lat >= 30.0M && lat < 40.0M) { f = 68.88M; }
> >> >> >> >>     else if (lat >= 40.0M && lat < 50.0M) { f = 68.99M; }
> >> >> >> >>     else if (lat >= 50.0M && lat < 60.0M) { f = 69.12M; }
> >> >> >> >>     else if (lat >= 60.0M && lat < 70.0M) { f = 69.23M; }
> >> >> >> >>     else if (lat >= 70.0M && lat < 80.0M) { f = 69.32M; }
> >> >> >> >>     else if (lat >= 80.0M) { f = 69.38M; }
> >> >> >> >>
> >> >> >> >>     return f;
> >> >> >> >> }
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> Decimal MilesPerDegreeLatitude = GetLatitudeMiles(zList[0].Latitude);
> >> >> >> >> // Math.Cos expects radians, so convert the latitude from degrees.
> >> >> >> >> Double latRadians = (Double) zList[0].Latitude * Math.PI / 180.0;
> >> >> >> >> Decimal MilesPerDegreeLongitude =
> >> >> >> >>     ((Decimal) Math.Abs(Math.Cos(latRadians))) * 24900.0M / 360.0M;
> >> >> >> >> dRadius = 10.0M;  // ten miles
> >> >> >> >> Decimal deltaLat = dRadius / MilesPerDegreeLatitude;
> >> >> >> >> Decimal deltaLong = dRadius / MilesPerDegreeLongitude;
> >> >> >> >>
> >> >> >> >> ps.TopLatitude = zList[0].Latitude - deltaLat;
> >> >> >> >> ps.TopLongitude = zList[0].Longitude - deltaLong;
> >> >> >> >> ps.BottomLatitude = zList[0].Latitude + deltaLat;
> >> >> >> >> ps.BottomLongitude = zList[0].Longitude + deltaLong;
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> On Fri, 2010-04-09 at 16:30 -0700, Paul Prescod wrote:
> >> >> >> >>
> >> >> >> >> 2010/4/9 Onur AKTAS <on...@live.com>:
> >> >> >> >> > ...
> >> >> >> >> > I'm trying to find out how you perform queries with
> >> >> >> >> > calculations on the fly without inserting the data as
> >> >> >> >> > calculated from the beginning.
> >> >> >> >> > Let's say we have latitude and longitude coordinates of all
> >> >> >> >> > users and we have a
> >> >> >> >> > Distance(from_lat, from_long, to_lat, to_long) function which
> >> >> >> >> > gives the distance between lat/long pairs in kilometers.
> >> >> >> >>
> >> >> >> >> I'm not an expert, but I think that it boils down to "MapReduce"
> >> >> >> >> and "Hadoop".
> >> >> >> >>
> >> >> >> >> I don't think that there's any top-down tutorial on those two
> >> >> >> >> words; you'll have to research yourself starting here:
> >> >> >> >>
> >> >> >> >>  * http://en.wikipedia.org/wiki/MapReduce
> >> >> >> >>
> >> >> >> >>  * http://hadoop.apache.org/
> >> >> >> >>
> >> >> >> >>  * http://wiki.apache.org/cassandra/HadoopSupport
> >> >> >> >>
> >> >> >> >> I don't think it is all documented in any one place yet...
> >> >> >> >>
> >> >> >> >>  Paul Prescod
> >> >> >> >>
> >> >> >> >
> >> >> >> >
> >> >> >
> >> >> >
> >> >
> >> >
> >>
> >>
> >
>
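
A rough sketch of the grid-ID idea quoted above, in Python. It is only an illustration under stated assumptions: a plain dictionary stands in for a column family keyed by grid ID, the cell size (20 cells per degree) and the 111 km-per-degree figure are approximations chosen for the example, and the names bounding_box, grid_id, move_user and users_near are hypothetical, not part of any Cassandra API. The point it tries to show is that moving one user costs one delete and one insert, not one write per other user.

```python
import math

KM_PER_DEGREE_LAT = 111.0  # rough average; an assumption for this sketch
CELLS_PER_DEGREE = 20      # hypothetical resolution, about 5.5 km per cell side


def bounding_box(lat, lon, radius_km):
    """Return (min_lat, min_lon, max_lat, max_lon) around a point."""
    delta_lat = radius_km / KM_PER_DEGREE_LAT
    # A degree of longitude shrinks with cos(latitude); math.cos wants radians.
    km_per_degree_lon = KM_PER_DEGREE_LAT * math.cos(math.radians(lat))
    delta_lon = radius_km / km_per_degree_lon
    return (lat - delta_lat, lon - delta_lon, lat + delta_lat, lon + delta_lon)


def grid_id(lat, lon):
    """Map a coordinate to a grid-cell row key such as '800:400'."""
    return "%d:%d" % (math.floor(lat * CELLS_PER_DEGREE),
                      math.floor(lon * CELLS_PER_DEGREE))


# Hypothetical stand-in for a column family: row key (grid ID) -> user names.
users_by_grid = {}


def move_user(user, lat, lon, old_cell=None):
    """Re-file one user under its new cell: one delete plus one insert."""
    if old_cell is not None:
        users_by_grid.get(old_cell, set()).discard(user)
    cell = grid_id(lat, lon)
    users_by_grid.setdefault(cell, set()).add(user)
    return cell


def users_near(lat, lon):
    """Direct key lookup of everyone filed under the same cell."""
    return sorted(users_by_grid.get(grid_id(lat, lon), set()))
```

A fuller version would also look up the neighbouring cells, since two users can be close together while sitting on opposite sides of a cell border, and would then filter the candidates with the exact distance function.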

Re: How to perform queries on Cassandra?

Posted by Benjamin Black <b...@b3k.us>.
A system that permits multiple people to have the same username has a
serious problem.
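
To illustrate why: a minimal Python sketch, assuming a Users column family behaves like a map from row key (the username) to named columns. The dictionary and the register and get_password helpers are hypothetical stand-ins, not Cassandra calls. Because the username is the row key, a second write under the same name would simply replace the first user's row, so registration has to refuse a name that is already taken.

```python
# Hypothetical in-memory stand-in for a "Users" column family:
# row key (username) -> {column name: value}.
users_cf = {}


def register(username, password):
    """Insert a new row, refusing to reuse an existing row key."""
    if username in users_cf:
        return False  # name is taken; a blind write would overwrite that row
    users_cf[username] = {"password": password}
    return True


def get_password(username):
    """Rough equivalent of: SELECT Password FROM Users WHERE UserName = ?"""
    row = users_cf.get(username)
    return None if row is None else row["password"]
```

Note that this sketch ignores concurrency: a check followed by a write is not atomic, so two simultaneous registrations of the same name would need extra coordination.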

On Sun, Apr 11, 2010 at 6:12 AM, vineet daniel <vi...@gmail.com> wrote:
> How to handle same usernames. Otherwise seems fine to me.
>
> On Sun, Apr 11, 2010 at 6:17 PM, Dop Sun <su...@dopsun.com> wrote:
>>
>> Hi,
>>
>>
>>
>> As far as I can see, the Cassandra API currently supports criteria on:
>>
>> Token – Key – Super Column Name (if applicable) - Column Names
>>
>>
>>
>> I guess Token is not usually used for the day to day queries, so, Key and
>> Column Names are normally used for querying. For the user name and password
>> case, I guess it can be done like this:
>>
>>
>>
>> Define a CF named UserAuth of type Super, where the key is the user name
>> and the password is the super column name. When you receive the user name
>> and password from the UI (or by any other method), you can query it via
>> multiget_slice or get_range_slices; if anything is returned, the user name
>> and password match.
>>
>>
>>
>> If you do not use a super column and instead put the password in the
>> column name, note that column names are not usually used for this kind of
>> discretionary value (actually, I have not seen any definitive document on
>> how to use column names and super columns; flexibility is Cassandra's
>> strength, or is it a weakness if abused? :P)
>>
>>
>>
>> Not sure whether this is the best way, but I guess it will work.
>>
>>
>>
>> Regards,
>>
>> Dop
>>
>>
>>
>> From: Lucifer Dignified [mailto:vineetdaniel@gmail.com]
>> Sent: Sunday, April 11, 2010 5:33 PM
>> To: user@cassandra.apache.org
>> Subject: Re: How to perform queries on Cassandra?
>>
>>
>>
>> Hi Benjamin
>>
>> I'll try to make it clearer.
>> We have a user table with fields 'id', 'username', and 'password'. Now if
>> we use the usual way to store key/value pairs, like:
>> username : vineetdaniel
>> timestamp
>> password : <password>
>> timestamp
>>
>> second user :
>>
>> username: <seconduser>
>> timestamp
>> password:<password>
>>
>> and so on. What I assume here is that, since we cannot search on values
>> (as confirmed by people on the Cassandra forums), we are not able to
>> perform robust 'where' queries. What I propose is this:
>>
>> Rather than using static values for column names, use the values
>> themselves as column names, with a unique key as the identifier. So the
>> above example, done my way, would be:
>>
>> vineetdaniel : vineetdaniel
>> timestamp
>>
>> <password>:<password>
>> timestamp
>>
>> second user
>> seconduser:seconduser
>> timestamp
>>
>> password:password
>> timestamp
>>
>> With this method we can simply search on the keys themselves rather than
>> using different CFs. I should add that this cannot be used in every
>> situation. I am still exploring this, and will soon be updating the group
>> and my blog with what I find. As Cassandra is new, I think every idea or
>> experience should be shared with the community.
>>
>> I hope my example is clear this time. If you have any questions, feel free
>> to reply.
>>
>> On Sun, Apr 11, 2010 at 2:01 PM, Benjamin Black <b...@b3k.us> wrote:
>>
>> Sorry, I don't understand your example.
>>
>> On Sun, Apr 11, 2010 at 12:54 AM, Lucifer Dignified
>> <vi...@gmail.com> wrote:
>> > Benjamin, I quite agree with you, but what about duplicate usernames,
>> > if I am not using unique names such as email IDs? If usernames are
>> > duplicated we cannot use them as the key, so what should the solution
>> > be? I am thinking of keeping an incremental numeric ID as the key and
>> > making the column name and value the same in the column family.
>> >
>> > Example :
>> > User1 has password as 123456
>> >
>> > Cassandra structure :
>> >
>> > 1 as key
>> >            user1 - column name
>> >            value - user1
>> >            123456 - column name
>> >             value - 123456
>> >
>> > I'm thinking of doing it this way for my application; this way I can run
>> > different sorts of queries too. Any feedback on this is welcome.
>> >

RE: How to perform queries on Cassandra?

Posted by Dop Sun <su...@dopsun.com>.
When we talk about identical user names, it is up to the application design to tell the users apart by other attributes (this is not really Cassandra-related; it is an application/domain issue):

 

 Let’s say there are two scenarios (let me know if there are more):

1.       The identity behind the user name is actually the same. In this case, only one entry is valid. This happens, for example, when an employee rejoins the company.

2.       The identities behind the same user name are actually different. In this case, both are valid. This happens when two clients open accounts but happen to have the same name.

 

Scenario 1 can be resolved like this:

CF1-UserNameState: Key = user name, Super Column Name: create time (time UUID)

CF2-UserAuth: Key = user name, Super Column Name: password

When you create the second entry for the same user name (in CF1, with a new time UUID), the password in CF2 is overwritten, which means the previous one has been disabled. Then it’s fine.
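
As a rough illustration of scenario 1, here is a small Python sketch. It assumes each column family can be modelled as a nested dictionary (row key, then super column name), and it assumes that writing a new password drops the old one, as described above. The names user_name_state, user_auth, create_user and authenticate are hypothetical and only serve the example.

```python
import uuid

# Hypothetical stand-ins for the two column families described above.
user_name_state = {}  # CF1: user name -> {time UUID: {}}, one per creation
user_auth = {}        # CF2: user name -> {password: {}} (password as SC name)


def create_user(name, password):
    """Record a new creation event and make `password` the only valid one."""
    created = uuid.uuid1()  # version 1 UUIDs are time-based
    user_name_state.setdefault(name, {})[created] = {}
    # Assumption: the old password entry is dropped, so only the newest works.
    user_auth[name] = {password: {}}
    return created


def authenticate(name, password):
    """True if a super column named `password` exists under row `name`."""
    return password in user_auth.get(name, {})
```

After a second create_user call for the same name, CF1 holds two creation times while CF2 accepts only the newer password.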

 

Scenario 2 is a little more complex, since it depends on the nature of your application. Take a bank account management system as an example: two clients have the same name (both are Dop? :P).

Then it’s the bank’s responsibility to have a way to tell me and the other Dop apart when we both use the same user name, Dop, to log into the system. In practice, this is resolved by giving each of us a token generator (like what HSBC did). When we log into the system, the token from my generator and the token from his are different.

 

Again, two column families are required:

CF1-User: Key = user name, Super Column Name: unique id derived from the token produced by the user's generator, Column Name: User UUID, value: (uuid)

CF2-UserAuth: Key = user UUID, Super Column Name: password

 

Then the application derives the unique id from the token input and, together with the user name, looks up the user's real internal UUID in CF1. With CF2 and the password, it can then authenticate the user uniquely.
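
The two-step lookup for scenario 2 could be sketched like this in Python. Again the column families are modelled as nested dictionaries, and everything here is an assumption made for illustration: the helper names, the use of a random UUID as the internal id, and the plain strings standing in for the id derived from the token (how that id is actually derived is left open).

```python
import uuid

# Hypothetical stand-ins for the two column families described above.
user_cf = {}       # CF1: user name -> {token-derived id: {"user_uuid": uuid}}
user_auth_cf = {}  # CF2: user UUID -> {password: {}}


def register(name, token_id, password):
    """Create one internal user behind a possibly shared user name."""
    user_uuid = uuid.uuid4()
    user_cf.setdefault(name, {})[token_id] = {"user_uuid": user_uuid}
    user_auth_cf[user_uuid] = {password: {}}
    return user_uuid


def authenticate(name, token_id, password):
    """Return the internal UUID on success, or None."""
    entry = user_cf.get(name, {}).get(token_id)  # step 1: CF1 lookup
    if entry is None:
        return None
    user_uuid = entry["user_uuid"]
    # Step 2: check the password under the internal UUID in CF2.
    return user_uuid if password in user_auth_cf.get(user_uuid, {}) else None
```

Two clients can then share the user name "dop" and still resolve to different internal UUIDs, because the token-derived id selects the super column in CF1.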

 

Cheers~~~

Dop

 

From: vineet daniel [mailto:vineetdaniel@gmail.com] 
Sent: Sunday, April 11, 2010 9:12 PM
To: user@cassandra.apache.org
Subject: Re: How to perform queries on Cassandra?

 

How to handle same usernames. Otherwise seems fine to me.


 

 


Re: How to perform queries on Cassandra?

Posted by vineet daniel <vi...@gmail.com>.
How do we handle identical usernames? Otherwise it seems fine to me.

On Sun, Apr 11, 2010 at 6:17 PM, Dop Sun <su...@dopsun.com> wrote:

>  Hi,
>
>
>
> As far as I can see it, the Cassandra API currently supports criterias on:
>
> Token – Key – Super Column Name (if applicable) - Column Names
>
>
>
> I guess Token is not usually used for the day to day queries, so, Key and
> Column Names are normally used for querying. For the user name and password
> case, I guess it can be done like this:
>
>
>
> Define a CF as UserAuth with type as Super, and Key is user name, while
> password can be the SuperKeyName. So, while you receive the user name and
> password from the UI (or any other methods), it can be queried via:
> multiget_slice or get_range_slices, if there are anything returned, means
> that the user name and password matches.
>
>
>
> If not using the super column name, and put the password as the column
> name, the column name usually not used for these kind of discretionary
> values (actually, I don’t see any definitive documents on how to use the
> column Names and Super Columns, flexibility is the good of Cassandra, or is
> it bad if abused? :P)
>
>
>
> Not sure whether this is the best way, but I guess it will work.
>
>
>
> Regards,
>
> Dop
>
>
>

RE: How to perform queries on Cassandra?

Posted by Dop Sun <su...@dopsun.com>.
Hi,

 

As far as I can see, the Cassandra API currently supports criteria on:

Token – Key – Super Column Name (if applicable) – Column Names

I guess Token is not usually used for day-to-day queries, so Key and Column Names are what you normally query on. For the user name and password case, I guess it can be done like this:

Define a CF called UserAuth with type Super, where the Key is the user name and the password is the super column name (SuperKeyName). Then, when you receive the user name and password from the UI (or any other source), you can query via multiget_slice or get_range_slices; if anything is returned, the user name and password match.

If you don't use a super column and instead put the password in as the column name, note that column names are not usually used for this kind of arbitrary value (actually, I haven't seen any definitive documentation on how to use column names and super columns; flexibility is Cassandra's strength, or is it a weakness if abused? :P)

 

Not sure whether this is the best way, but I guess it will work.
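To make the proposed layout concrete, here is a minimal in-memory sketch in Java. It is not the Cassandra client API: plain maps stand in for a Super column family, and the class name UserAuthSketch, the methods, and the placeholder "exists" column are all assumptions made up for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for a Super column family; hypothetical, not the Cassandra API.
final class UserAuthSketch {
    // row key (user name) -> super column name (password) -> columns
    private final Map<String, Map<String, Map<String, String>>> rows = new HashMap<>();

    // Store a placeholder column under the (user name, password) pair.
    void register(String userName, String password) {
        rows.computeIfAbsent(userName, k -> new HashMap<>())
            .computeIfAbsent(password, k -> new HashMap<>())
            .put("exists", "1");
    }

    // Mirrors "if anything is returned, the user name and password match".
    boolean matches(String userName, String password) {
        Map<String, Map<String, String>> superColumns = rows.get(userName);
        return superColumns != null && superColumns.containsKey(password);
    }
}
```

The lookup always starts from the row key (the user name) and then narrows by super column name, which is the Key-then-name ordering of criteria described above.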

 

Regards,

Dop

 

From: Lucifer Dignified [mailto:vineetdaniel@gmail.com] 
Sent: Sunday, April 11, 2010 5:33 PM
To: user@cassandra.apache.org
Subject: Re: How to perform queries on Cassandra?

 

Hi Benjamin

I'll try to make it more clear to you. 
We have a user table with fields 'id', 'username', and 'password'. Now if use the ideal way to store key/value, like :
username : vineetdaniel
timestamp
password : <password>
timestamp

second user :

username: <seconduser>
timestamp
password:<password>

and so on, here what i assume is that as we cannot make search on values (as confirmed by guys on cassandra forums) we are not able to perform robust 'where' queries. Now what i propose is this. 

Rather than using a static values for column names use values itself and unique key as identifier. So, the above example when put in as per me would be.

vineetdaniel : vineetdaniel
timestamp

<password>:<password>
timestamp

second user
seconduser:seconduser
timestamp

password:password
timestamp

By using above methodology we can simply make search on keys itself rather than going into using different CF's. But to add further, this cannot be used for every situation. I am still exploring this, and soon will be updating the group and my blog with information pertaining to this. As cassandra is new, I think every idea or experience should be shared with the community.

I hope I example is clear this time. Should you have any queries feel free to revert.


 


Re: How to perform queries on Cassandra?

Posted by Benjamin Black <b...@b3k.us>.
Ah, I see what you are doing.  No, that won't work.  The mistake you
are making is in thinking a CF is a table and that Cassandra is
storing columns.  A CF is a namespace and Cassandra stores key/value
pairs, where the keys are the row keys and the values are maps:

'usr1': {'password':'foo', 'thing':'blah'}
'usr2': {'password':'bar', 'info':3423452}

and so on.
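As a rough illustration of that model (a sketch, not Cassandra code), a column family can be pictured in Java as a map from row key to a sorted map of columns. The class ColumnFamilySketch and its methods are hypothetical names chosen for this example, and all values are kept as strings for simplicity.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory picture of a column family: row key -> sorted columns.
final class ColumnFamilySketch {
    private final Map<String, SortedMap<String, String>> rows = new HashMap<>();

    void insert(String rowKey, String columnName, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnName, value);
    }

    // Every read starts from a row key; returns null if the row or column is absent.
    String get(String rowKey, String columnName) {
        SortedMap<String, String> columns = rows.get(rowKey);
        return columns == null ? null : columns.get(columnName);
    }

    // Column slice inside one row: names in [startInclusive, endExclusive).
    SortedMap<String, String> slice(String rowKey, String startInclusive, String endExclusive) {
        SortedMap<String, String> columns = rows.get(rowKey);
        if (columns == null) {
            return Collections.emptySortedMap();
        }
        return columns.subMap(startInclusive, endExclusive);
    }
}
```

With the two rows above inserted, get("usr2", "password") returns "bar". Note that there is deliberately no method that searches across rows by column name or value; in this picture, both get and slice need the row key first.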

The only query you can actually perform is a get on a row key.  The
column sorting and slicing are just conveniences to avoid retrieving
entire rows every time you want a subset of data from them.  There is
not an interface for querying columns _without_ a row key, which is
what you are proposing.  Even if there were, there is a serious
problem in your data model as you can't tell which part is which; were
my password 'vineet' the system you describe would be unable to
distinguish between our entries.

I am ignoring here the existence of the order-preserving partitioner
because I don't think it is generally useful and any attempt to apply
it here would be wildly inefficient compared to simply maintaining
proper indices.
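One way to picture "maintaining proper indices" is a second column family, written by the application alongside the data, whose row key is the value you want to look up by. The Java sketch below uses plain maps rather than the Cassandra API; the names UserIndexSketch, "Users", and "UsersByName" are assumptions for illustration only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch of a data CF plus an application-maintained index CF.
final class UserIndexSketch {
    // "Users": row key = user id, columns = attributes.
    private final Map<String, Map<String, String>> users = new HashMap<>();
    // "UsersByName": row key = user name, column names = user ids.
    private final Map<String, SortedSet<String>> usersByName = new HashMap<>();

    void addUser(String userId, String userName, String password) {
        Map<String, String> columns = new HashMap<>();
        columns.put("username", userName);
        columns.put("password", password);
        users.put(userId, columns);
        // The index entry is written together with the data row.
        usersByName.computeIfAbsent(userName, k -> new TreeSet<>()).add(userId);
    }

    // Query by name: a row-key get on the index, never a scan over values.
    SortedSet<String> idsForName(String userName) {
        SortedSet<String> ids = usersByName.get(userName);
        return ids == null
                ? Collections.<String>emptySortedSet()
                : Collections.unmodifiableSortedSet(ids);
    }

    // Returns null when the user id is unknown.
    String password(String userId) {
        Map<String, String> columns = users.get(userId);
        return columns == null ? null : columns.get("password");
    }
}
```

Because the index row holds a set of ids, two users sharing a name simply appear as two columns in the same index row, and each value stays under a fixed column name so a password can never be confused with a username.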


b

On Sun, Apr 11, 2010 at 2:33 AM, Lucifer Dignified
<vi...@gmail.com> wrote:
> Hi Benjamin
>
> I'll try to make it more clear to you.
> We have a user table with fields 'id', 'username', and 'password'. Now if
> use the ideal way to store key/value, like :
> username : vineetdaniel
> timestamp
> password : <password>
> timestamp
>
> second user :
>
> username: <seconduser>
> timestamp
> password:<password>
>
> and so on, here what i assume is that as we cannot make search on values (as
> confirmed by guys on cassandra forums) we are not able to perform robust
> 'where' queries. Now what i propose is this.
>
> Rather than using a static values for column names use values itself and
> unique key as identifier. So, the above example when put in as per me would
> be.
>
> vineetdaniel : vineetdaniel
> timestamp
>
> <password>:<password>
> timestamp
>
> second user
> seconduser:seconduser
> timestamp
>
> password:password
> timestamp
>
> By using above methodology we can simply make search on keys itself rather
> than going into using different CF's. But to add further, this cannot be
> used for every situation. I am still exploring this, and soon will be
> updating the group and my blog with information pertaining to this. As
> cassandra is new, I think every idea or experience should be shared with the
> community.
>
> I hope I example is clear this time. Should you have any queries feel free
> to revert.
>

Re: How to perform queries on Cassandra?

Posted by Lucifer Dignified <vi...@gmail.com>.
Hi Benjamin,

I'll try to make it clearer.
We have a user table with the fields 'id', 'username', and 'password'. If we
store it the usual key/value way, it looks like this:

username : vineetdaniel
timestamp
password : <password>
timestamp

Second user:

username : <seconduser>
timestamp
password : <password>
timestamp

and so on. My assumption is that, since we cannot search on values (as
confirmed by people on the Cassandra forums), we cannot perform robust
'where' queries. So here is what I propose.

Rather than using static column names, use the values themselves as the
column names, with a unique key as the row identifier. The example above
would then be stored as:

vineetdaniel : vineetdaniel
timestamp
<password> : <password>
timestamp

Second user:

<seconduser> : <seconduser>
timestamp
<password> : <password>
timestamp

With this layout we can search on the column names (the keys) themselves
rather than maintaining separate CFs. This will not work in every situation,
though. I am still exploring it and will soon update the group and my blog
with what I find. As Cassandra is new, I think every idea or experience
should be shared with the community.

I hope the example is clear this time. If you have any questions, feel free
to reply.
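
One way to picture the two layouts described above is with plain Python dictionaries standing in for rows. This is only an in-memory sketch, not the Cassandra API: the row keys 1 and 2, the sample passwords, and the helper `rows_with_column` are made up for illustration. In this sketch the helper still has to walk every row, because a column name is only looked up inside a row that has already been selected.

```python
# Rows are modelled as {row_key: {column_name: (value, timestamp)}}.
# All keys, values and timestamps below are illustrative only.
static_names = {
    1: {"username": ("vineetdaniel", 1), "password": ("secret1", 1)},
    2: {"username": ("seconduser", 2), "password": ("secret2", 2)},
}

# Proposed layout: each value doubles as its own column name.
value_names = {
    row_key: {value: (value, ts) for value, ts in columns.values()}
    for row_key, columns in static_names.items()
}


def rows_with_column(rows, column_name):
    """Return the keys of all rows that contain the given column name."""
    return sorted(key for key, columns in rows.items() if column_name in columns)


print(rows_with_column(value_names, "seconduser"))
```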

On Sun, Apr 11, 2010 at 2:01 PM, Benjamin Black <b...@b3k.us> wrote:

> Sorry, I don't understand your example.

Re: How to perform queries on Cassandra?

Posted by Benjamin Black <b...@b3k.us>.
Sorry, I don't understand your example.

On Sun, Apr 11, 2010 at 12:54 AM, Lucifer Dignified
<vi...@gmail.com> wrote:
> Benjamin, I quite agree with you, but what about duplicate usernames?
> Suppose I am not using unique names such as email IDs. If usernames can be
> duplicated, we cannot use them as the key, so what should the solution be?
> I am thinking of using an incremental numeric ID as the key and making the
> column name and value the same in the column family.
>
> Example:
> User1 has the password 123456.
>
> Cassandra structure:
>
> 1 as key
>            user1 - column name
>            user1 - value
>            123456 - column name
>            123456 - value
>
> I'm thinking of doing it this way for my application; this way I can run
> different sorts of queries too. Any feedback on this is welcome.

Re: How to perform queries on Cassandra?

Posted by Lucifer Dignified <vi...@gmail.com>.
Benjamin, I quite agree with you, but what about duplicate usernames?
Suppose I am not using unique names such as email IDs. If usernames can be
duplicated, we cannot use them as the key, so what should the solution be?
I am thinking of using an incremental numeric ID as the key and making the
column name and value the same in the column family.

Example:
User1 has the password 123456.

Cassandra structure:

1 as key
           user1 - column name
           user1 - value
           123456 - column name
           123456 - value

I'm thinking of doing it this way for my application; this way I can run
different sorts of queries too. Any feedback on this is welcome.
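
A minimal sketch of one way to handle non-unique usernames, assuming the index pattern from the quoted reply below is applied: user rows are stored under a numeric ID, and a second mapping goes from username to every ID that shares it. Plain Python dictionaries stand in for column families here; `users`, `users_by_name`, `add_user`, and `find_by_name` are hypothetical names, not Cassandra API calls.

```python
import itertools

users = {}          # stand-in for a Users CF: {user_id: {column: value}}
users_by_name = {}  # stand-in for an index CF: {username: {user_id: ""}}
_next_id = itertools.count(1)


def add_user(username, password):
    """Store the user under a fresh numeric ID and update the name index."""
    user_id = next(_next_id)
    users[user_id] = {"username": username, "password": password}
    # In the index row the column *name* carries the data; the value is unused.
    users_by_name.setdefault(username, {})[user_id] = ""
    return user_id


def find_by_name(username):
    """Return every user row whose username matches, via the index."""
    return [users[uid] for uid in sorted(users_by_name.get(username, {}))]


add_user("user1", "123456")
add_user("user1", "abcdef")  # duplicate username, different ID
print(find_by_name("user1"))
```

The trade-off in this sketch is that every write touches two structures, so the application has to keep the index in step with the user rows.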

On Sun, Apr 11, 2010 at 1:13 PM, Benjamin Black <b...@b3k.us> wrote:

> You would have a Column Family, not a column for that; let's call it
> the Users CF.  You'd use username as the row key and have a column
> called 'password'.  For your example query, you'd retrieve row key
> 'usr2', column 'password'.  The general pattern is that you create CFs
> to act as indices for each query you want to perform.  There is no
> equivalent to a relational store to perform arbitrary queries.  You
> must structure things to permit the queries of interest.
>
>
> b

Re: How to perform queries on Cassandra?

Posted by Benjamin Black <b...@b3k.us>.
You would have a Column Family, not a column for that; let's call it
the Users CF.  You'd use username as the row key and have a column
called 'password'.  For your example query, you'd retrieve row key
'usr2', column 'password'.  The general pattern is that you create CFs
to act as indices for each query you want to perform.  There is no
equivalent to a relational store to perform arbitrary queries.  You
must structure things to permit the queries of interest.
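
A minimal sketch of that lookup, using a plain Python dictionary as a stand-in for the Users CF rather than a real Cassandra client. The sample rows are the ones from the quoted question below; `users_cf` and `get_column` are hypothetical names chosen for illustration.

```python
# Stand-in for the Users CF: {row_key: {column_name: value}}.
users_cf = {
    "usr1": {"password": "abc"},
    "usr2": {"password": "xyz"},
    "usr3": {"password": "opm"},
}


def get_column(column_family, row_key, column_name):
    """Fetch one column from one row, or None if either is missing."""
    return column_family.get(row_key, {}).get(column_name)


# Roughly: Select Password From Users Where UserName = 'usr2'
print(get_column(users_cf, "usr2", "password"))  # prints "xyz"
```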


b

On Sat, Apr 10, 2010 at 8:34 PM, dir dir <si...@gmail.com> wrote:
> I have already read the API specification. Honestly, I do not understand
> how to use it, because there are no examples.
>
> For example I have a column like this:
>
> UserName    Password
> usr1                abc
> usr2                xyz
> usr3                opm
>
> Suppose I want to query the user's password using SQL in an RDBMS:
>
>       Select Password From Users Where UserName = 'usr2';
>
> Now I want to get the password using the OODBMS DB4o Object Query and Java:
>
>      ObjectSet<Users> QueryResult = db.query(new Predicate<Users>()
>      {
>             public boolean match(Users myUser)
>             {
>                  // Compare string contents with equals(), not ==.
>                  return "usr2".equals(myUser.getUserName());
>             }
>      });
>
> Once we have the Users instance in QueryResult, we can get usr2's
> password.
>
> How do we perform this query using the Cassandra API and Java?
> Would you tell me, please?  Thank you.
>
> Dir.
>
>
> On Sat, Apr 10, 2010 at 11:06 AM, Paul Prescod <pa...@prescod.net> wrote:
>>
>> No. Cassandra has an API.
>>
>> http://wiki.apache.org/cassandra/API
>>
>> On Fri, Apr 9, 2010 at 8:00 PM, dir dir <si...@gmail.com> wrote:
>> > Does Cassandra have a default query language, such as SQL in an RDBMS
>> > or Object Query in an OODBMS?  Thank you.
>> >
>> > Dir.
>> >
>> > On Sat, Apr 10, 2010 at 7:01 AM, malsmith
>> > <ma...@treehousesystems.com>
>> > wrote:
>> >>
>> >>
>> >> It's an interesting problem. In an RDBMS, one relatively simple
>> >> approach would be to calculate a rectangle that is X km by Y km with
>> >> User 1's location at the center. So the rectangle runs from
>> >> (UserX - 10KmX, UserY - 10KmY) to (UserX + 10KmX, UserY + 10KmY).
>> >>
>> >> Then you could query the database for all other users whose
>> >> coordinates satisfy curUserX > UserX - 10KmX and curUserX < UserX + 10KmX
>> >> and curUserY > UserY - 10KmY and curUserY < UserY + 10KmY.
>> >> * Note that 10KmX and 10KmY are really a translation from kilometers
>> >> to degrees of latitude and longitude (which you can find with a Google
>> >> search).
>> >>
>> >> With the right indexes this query actually runs pretty well.
>> >>
>> >> Translating that to Cassandra seems a bit complex at first, but you
>> >> could try something like pre-calculating a grid with the right
>> >> resolution (like a square of 5 km per side) and assigning every user to
>> >> a particular grid ID. That way you just calculate which grid ID User1 is
>> >> in, then do a direct key lookup to get a list of the users in that same
>> >> grid ID.
>> >>
>> >> A second approach would be to have two column families: one that maps
>> >> a latitude to a list of users who are at that latitude, and a second
>> >> that maps a longitude to a list of users who are at that longitude. You
>> >> could do the same rectangle calculation as above, then do a get_slice
>> >> range lookup to get a list of users from the range of latitudes and a
>> >> second list from the range of longitudes. You would then need to do an
>> >> in-memory nested loop to find the users that are in both lists. This
>> >> second approach could cause some trouble depending on where you search
>> >> and how many users you really have: some latitudes and longitudes have
>> >> many, many people in them.
>> >>
>> >> So, it seems some version of a chunking / grid ID scheme would be the
>> >> better approach. If you let people zoom in or zoom out, you could just
>> >> have different column families for each level of zoom.
>> >>
>> >>
>> >> I'm stuck on a stopped train so -- here is even more code:
>> >>
>> >> // Approximate miles per degree of latitude, in 10-degree bands.
>> >> static Decimal GetLatitudeMiles(Decimal lat)
>> >> {
>> >>     lat = Math.Abs(lat);
>> >>     if (lat < 10.0M) { return 68.71M; }
>> >>     if (lat < 20.0M) { return 68.73M; }
>> >>     if (lat < 30.0M) { return 68.79M; }
>> >>     if (lat < 40.0M) { return 68.88M; }
>> >>     if (lat < 50.0M) { return 68.99M; }
>> >>     if (lat < 60.0M) { return 69.12M; }
>> >>     if (lat < 70.0M) { return 69.23M; }
>> >>     if (lat < 80.0M) { return 69.32M; }
>> >>     return 69.38M;
>> >> }
>> >>
>> >>
>> >> Decimal MilesPerDegreeLatitude = GetLatitudeMiles(zList[0].Latitude);
>> >> // Math.Cos takes radians, so convert the latitude from degrees first.
>> >> Decimal MilesPerDegreeLongitude = ((Decimal) Math.Abs(Math.Cos(
>> >>     (Double) zList[0].Latitude * Math.PI / 180.0))) * 24900.0M / 360.0M;
>> >> dRadius = 10.0M;  // ten miles
>> >> Decimal deltaLat = dRadius / MilesPerDegreeLatitude;
>> >> Decimal deltaLong = dRadius / MilesPerDegreeLongitude;
>> >>
>> >> // Bounding box of plus or minus dRadius around the search point.
>> >> ps.TopLatitude = zList[0].Latitude - deltaLat;
>> >> ps.TopLongitude = zList[0].Longitude - deltaLong;
>> >> ps.BottomLatitude = zList[0].Latitude + deltaLat;
>> >> ps.BottomLongitude = zList[0].Longitude + deltaLong;
>> >>
>> >>
>> >>
>> >> On Fri, 2010-04-09 at 16:30 -0700, Paul Prescod wrote:
>> >>
>> >> 2010/4/9 Onur AKTAS <on...@live.com>:
>> >> > ...
>> >> > I'm trying to find out how do you perform queries with calculations
>> >> > on
>> >> > the
>> >> > fly without inserting the data as calculated from the beginning.
>> >> > Lets say we have latitude and longitude coordinates of all users and
>> >> > we
>> >> > have
>> >> >  Distance(from_lat, from_long, to_lat, to_long) function which
>> >> > gives distance between lat/longs pairs in kilometers.
>> >>
>> >> I'm not an expert, but I think that it boils down to "MapReduce" and
>> >> "Hadoop".
>> >>
>> >> I don't think that there's any top-down tutorial on those two words,
>> >> you'll have to research yourself starting here:
>> >>
>> >>  * http://en.wikipedia.org/wiki/MapReduce
>> >>
>> >>  * http://hadoop.apache.org/
>> >>
>> >>  * http://wiki.apache.org/cassandra/HadoopSupport
>> >>
>> >> I don't think it is all documented in any one place yet...
>> >>
>> >>  Paul Prescod
>> >>
>> >
>> >
>
>

Re: How to perform queries on Cassandra?

Posted by dir dir <si...@gmail.com>.
I have already read the API specification. Honestly, I do not understand
how to use it, because there are no examples.

For example, I have a table like this:

UserName    Password
usr1        abc
usr2        xyz
usr3        opm

Suppose I want to query the user's password using SQL in an RDBMS:

      Select Password From Users Where UserName = 'usr2';

Now I want to get the password using an OODBMS (a db4o Object Query) and Java:

     ObjectSet<Users> QueryResult = db.query(new Predicate<Users>()
     {
            public boolean match(Users Myusers)
            {
                 // compare strings with equals(); == only compares references
                 return Myusers.getUserName().equals("usr2");
            }
     });

Once we have the Users instance in the QueryResult, we can get usr2's
password from it.

How do we perform this query using the Cassandra API and Java?
Would you please tell me?  Thank you.

Dir.
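
For what it's worth, the SQL query above is a lookup by key, which is the access pattern Cassandra is built around: if UserName is the row key, the password is a single column read from that row. Below is a minimal sketch of that data layout in plain Java. It is an in-memory stand-in, not the Cassandra client API, and the names UsersByKeySketch, insert and get are made up for illustration; the real calls are described on the API page linked in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for a "Users" column family: row key -> (column name -> value).
// This models the data layout only; it is NOT the Cassandra client API.
final class UsersByKeySketch {
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    // Write one column of one row, creating the row on first use.
    void insert(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new HashMap<>()).put(column, value);
    }

    // Read one column of one row by key; returns null if the row or column is absent.
    String get(String rowKey, String column) {
        Map<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }

    public static void main(String[] args) {
        UsersByKeySketch users = new UsersByKeySketch();
        users.insert("usr1", "Password", "abc");
        users.insert("usr2", "Password", "xyz");
        users.insert("usr3", "Password", "opm");
        // Equivalent of: Select Password From Users Where UserName = 'usr2'
        System.out.println(users.get("usr2", "Password")); // prints xyz
    }
}
```

The point of the sketch is that the "where" clause becomes the row key, so the data has to be laid out around the question you intend to ask.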


On Sat, Apr 10, 2010 at 11:06 AM, Paul Prescod <pa...@prescod.net> wrote:

> No. Cassandra has an API.
>
> http://wiki.apache.org/cassandra/API
>
> On Fri, Apr 9, 2010 at 8:00 PM, dir dir <si...@gmail.com> wrote:
> > Does Cassandra have a default query language, such as SQL in an RDBMS
> > or Object Query in an OODBMS?  Thank you.
> >
> > Dir.

Re: How to perform queries on Cassandra?

Posted by Paul Prescod <pa...@prescod.net>.
No. Cassandra has an API.

http://wiki.apache.org/cassandra/API

On Fri, Apr 9, 2010 at 8:00 PM, dir dir <si...@gmail.com> wrote:
> Does Cassandra have a default query language, such as SQL in an RDBMS
> or Object Query in an OODBMS?  Thank you.
>
> Dir.

Re: How to perform queries on Cassandra?

Posted by dir dir <si...@gmail.com>.
Does Cassandra have a default query language, such as SQL in an RDBMS
or Object Query in an OODBMS?  Thank you.

Dir.

On Sat, Apr 10, 2010 at 7:01 AM, malsmith <ma...@treehousesystems.com> wrote:


Re: How to perform queries on Cassandra?

Posted by malsmith <ma...@treehousesystems.com>.

It's sort of an interesting problem - in an RDBMS one relatively simple
approach would be to calculate a rectangle that is X km by Y km with User
1's location at the center.  So the rectangle runs from (UserX-10KmX,
UserY-10KmY) to (UserX+10KmX, UserY+10KmY).

Then you could query the database for all other users where each user
considered has curUserX > UserX-10KmX and curUserX < UserX+10KmX and
curUserY > UserY-10KmY and curUserY < UserY+10KmY.
* Note that 10KmX and 10KmY are really a translation from kilometers to
degrees of latitude and longitude (which you can find with a Google search).

With the right indexes this query actually runs pretty well.   

Translating that to Cassandra seems a bit complex at first - but you
could try something like pre-calculating a grid with the right
resolution (like a square of 5 km per side) and assigning every user to a
particular grid ID.  That way you just calculate which grid ID User1 is
in, then do a direct key lookup to get a list of the users in that same
grid ID.
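
A rough sketch of the grid idea in plain Java, to make the bookkeeping concrete. Everything here is an assumption for illustration: the class and method names are made up, the maps stand in for column families keyed by grid ID, and the cell size is given in degrees (0.05 degrees is roughly 5.5 km north-south) rather than derived from an exact 5 km square.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of "one row per grid cell, holding the users in it".
final class GridIndexSketch {
    private final double cellDegrees;                               // cell edge length, in degrees
    private final Map<String, List<String>> usersByCell = new HashMap<>();
    private final Map<String, String> cellByUser = new HashMap<>(); // where each user currently is

    GridIndexSketch(double cellDegrees) {
        this.cellDegrees = cellDegrees;
    }

    // Grid ID = row and column of the cell that contains the point.
    String cellId(double lat, double lon) {
        long row = (long) Math.floor(lat / cellDegrees);
        long col = (long) Math.floor(lon / cellDegrees);
        return row + ":" + col;
    }

    // Moving one user touches at most two cells: the old one and the new one.
    void moveUser(String user, double lat, double lon) {
        String newCell = cellId(lat, lon);
        String oldCell = cellByUser.put(user, newCell);
        if (newCell.equals(oldCell)) {
            return; // still in the same cell, nothing to update
        }
        if (oldCell != null) {
            usersByCell.get(oldCell).remove(user);
        }
        usersByCell.computeIfAbsent(newCell, k -> new ArrayList<>()).add(user);
    }

    // "Direct key lookup": compute the grid ID of the point, fetch that one row.
    List<String> usersInSameCell(double lat, double lon) {
        return usersByCell.getOrDefault(cellId(lat, lon), Collections.emptyList());
    }
}
```

This also answers the worry about a million inserts per update: changing one user's coordinates rewrites that user's entry in at most two cells. One caveat with any grid is that a user close to a cell edge has near neighbours in the adjacent cell, so a real lookup would probably read the surrounding cells as well.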

A second approach would be to have two column families -- one that maps a
latitude to a list of users who are at that latitude and a second that
maps a longitude to a list of users who are at that longitude.  You could
do the same rectangle calculation as above, then do a get_slice range
lookup to get a list of users from the range of latitudes and a second
list from the range of longitudes.  You would then need to do an in-memory
nested loop to find the list of users that are in both lists.  This second
approach could cause some trouble depending on where you search and how
many users you really have -- some latitudes and longitudes have many,
many people in them.
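
To illustrate the second approach, here is a small plain-Java sketch under stated assumptions: two sorted maps stand in for the two column families, TreeMap.subMap stands in for the get_slice range lookup, and a set intersection is swapped in for the nested loop (same result, fewer comparisons). The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model: one sorted index by latitude, one by longitude.
final class LatLongIndexSketch {
    private final TreeMap<Double, Set<String>> usersByLatitude = new TreeMap<>();
    private final TreeMap<Double, Set<String>> usersByLongitude = new TreeMap<>();

    // Every user is written to both indexes.
    void addUser(String user, double lat, double lon) {
        usersByLatitude.computeIfAbsent(lat, k -> new HashSet<>()).add(user);
        usersByLongitude.computeIfAbsent(lon, k -> new HashSet<>()).add(user);
    }

    // Stand-in for a range slice: all users whose key lies in [from, to].
    private static Set<String> slice(TreeMap<Double, Set<String>> index, double from, double to) {
        Set<String> users = new HashSet<>();
        for (Set<String> bucket : index.subMap(from, true, to, true).values()) {
            users.addAll(bucket);
        }
        return users;
    }

    // Users inside the rectangle = (users in the latitude band) AND (users in the longitude band).
    Set<String> usersInBox(double minLat, double maxLat, double minLon, double maxLon) {
        Set<String> result = slice(usersByLatitude, minLat, maxLat);
        result.retainAll(slice(usersByLongitude, minLon, maxLon));
        return result;
    }
}
```

The cost problem mentioned above shows up directly in the sketch: each slice pulls in everyone in the whole latitude band or the whole longitude band, and most of them are thrown away by the intersection.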

So, it seems some version of a chunking / grid id thing would be the
better approach.   If you let people zoom in or zoom out - you could
just have different column families for each level of zoom.


I'm stuck on a stopped train so -- here is even more code:

static Decimal GetLatitudeMiles(Decimal lat)
{
    Decimal f = 0.0M;
    lat = Math.Abs(lat);
    f = 68.99M;
    if (lat >= 0.0M && lat < 10.0M) { f = 68.71M; }
    else if (lat >= 10.0M && lat < 20.0M) { f = 68.73M; }
    else if (lat >= 20.0M && lat < 30.0M) { f = 68.79M; }
    else if (lat >= 30.0M && lat < 40.0M) { f = 68.88M; }
    else if (lat >= 40.0M && lat < 50.0M) { f = 68.99M; }
    else if (lat >= 50.0M && lat < 60.0M) { f = 69.12M; }
    else if (lat >= 60.0M && lat < 70.0M) { f = 69.23M; }
    else if (lat >= 70.0M && lat < 80.0M) { f = 69.32M; }
    else if (lat >= 80.0M) { f = 69.38M; }

    return f;
}


Decimal MilesPerDegreeLatitude = GetLatitudeMiles(zList[0].Latitude);
// Math.Cos takes radians, so convert the latitude from degrees first
Decimal MilesPerDegreeLongitude = ((Decimal) Math.Abs(Math.Cos((Double)
zList[0].Latitude * Math.PI / 180.0))) * 24900.0M / 360.0M;
dRadius = 10.0M;  // ten miles
Decimal deltaLat = dRadius / MilesPerDegreeLatitude;
Decimal deltaLong = dRadius / MilesPerDegreeLongitude;

ps.TopLatitude = zList[0].Latitude - deltaLat;
ps.TopLongitude = zList[0].Longitude - deltaLong;
ps.BottomLatitude = zList[0].Latitude + deltaLat;
ps.BottomLongitude = zList[0].Longitude + deltaLong;



On Fri, 2010-04-09 at 16:30 -0700, Paul Prescod wrote: 

> 2010/4/9 Onur AKTAS <on...@live.com>:
> > ...
> > I'm trying to find out how do you perform queries with calculations on the
> > fly without inserting the data as calculated from the beginning.
> > Lets say we have latitude and longitude coordinates of all users and we have
> >  Distance(from_lat, from_long, to_lat, to_long) function which
> > gives distance between lat/longs pairs in kilometers.
> 
> I'm not an expert, but I think that it boils down to "MapReduce" and "Hadoop".
> 
> I don't think that there's any top-down tutorial on those two words,
> you'll have to research yourself starting here:
> 
>  * http://en.wikipedia.org/wiki/MapReduce
> 
>  * http://hadoop.apache.org/
> 
>  * http://wiki.apache.org/cassandra/HadoopSupport
> 
> I don't think it is all documented in any one place yet...
> 
>  Paul Prescod



Re: How to perform queries on Cassandra?

Posted by Paul Prescod <pa...@prescod.net>.
2010/4/9 Onur AKTAS <on...@live.com>:
> ...
> I'm trying to find out how do you perform queries with calculations on the
> fly without inserting the data as calculated from the beginning.
> Lets say we have latitude and longitude coordinates of all users and we have
>  Distance(from_lat, from_long, to_lat, to_long) function which
> gives distance between lat/longs pairs in kilometers.

I'm not an expert, but I think that it boils down to "MapReduce" and "Hadoop".

I don't think that there's any top-down tutorial on those two words,
you'll have to research yourself starting here:

 * http://en.wikipedia.org/wiki/MapReduce

 * http://hadoop.apache.org/

 * http://wiki.apache.org/cassandra/HadoopSupport

I don't think it is all documented in any one place yet...

 Paul Prescod
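
To give a feel for what the "map" half of such a job would compute, here is a plain-Java sketch. It is not Hadoop code and uses no Hadoop API; it only shows the per-user work of a full scan: evaluate Distance(from_lat, from_long, to_lat, to_long) for every user and keep those within the radius. The haversine formula and the 6371 km mean Earth radius are my choices for the Distance function, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical full-scan filter: the work a map step would do for each user record.
final class DistanceScanSketch {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius (assumed spherical Earth)

    static final class User {
        final String name;
        final double lat;
        final double lon;

        User(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Great-circle distance in kilometers between two lat/long pairs (haversine formula).
    static double distanceKm(double fromLat, double fromLong, double toLat, double toLong) {
        double dLat = Math.toRadians(toLat - fromLat);
        double dLon = Math.toRadians(toLong - fromLong);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(fromLat)) * Math.cos(Math.toRadians(toLat))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Scan every user and keep the names of those within radiusKm of the given point.
    static List<String> usersWithin(List<User> all, double lat, double lon, double radiusKm) {
        List<String> near = new ArrayList<>();
        for (User u : all) {
            if (distanceKm(lat, lon, u.lat, u.lon) <= radiusKm) {
                near.add(u.name);
            }
        }
        return near;
    }
}
```

Note that this reads every user on every query, which is why it fits a batch job better than an interactive "who is near me" request.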