Posted to user@cassandra.apache.org by Sasha Dolgy <sd...@gmail.com> on 2011/06/21 18:12:38 UTC

solandra or pig or....?

Folks,

Simple question: assuming my current use case is logging lots of
trivial and seemingly useless sports statistics, I want users to be
able to query and compare them. For example:

--> Show me all baseball players in Cheektowaga and Ontario,
California who have hit a grand slam on Tuesdays in a leap year.

Each baseball player is represented by a single row in a CF:

player_uuid, fullname, hometown, game1, game2, game3, game4

Games are UUIDs that reference another row in the same CF, which
provides information about that game:

location, final score, date (Unix timestamp or ISO format), and
statistics, each represented as a new column named timestamp:player_uuid.
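To make the row layout above concrete, here is a minimal Python sketch that models the CF as nested dicts (row key to columns) and answers the example question by brute force. The sample rows, the event value "grand_slam", and the function name are hypothetical. The point it illustrates is that, without an index or a precomputed CF, every player row has to be scanned and every game UUID dereferenced.

```python
from calendar import isleap
from datetime import datetime, timezone

# Hypothetical contents of one column family: row key -> {column: value}.
# Player rows reference game rows by UUID (game1, game2, ...); game rows carry
# location, score, date and one column per event named "<timestamp>:<player>".
# The event value "grand_slam" is an assumption made for this sketch.
CF = {
    "p1": {"fullname": "A. Batter", "hometown": "Cheektowaga",
           "game1": "g1", "game2": "g2"},
    "p2": {"fullname": "B. Slugger", "hometown": "Ontario, California",
           "game1": "g2"},
    "g1": {"location": "Buffalo", "final_score": "5-3",
           "date": "1330387200",  # 2012-02-28 UTC: a Tuesday in a leap year
           "1330390000:p1": "grand_slam"},
    "g2": {"location": "Ontario", "final_score": "2-1",
           "date": "1331078400",  # 2012-03-07 UTC: a Wednesday
           "1331080000:p2": "single"},
}


def naive_grand_slam_query(cf, hometowns):
    """Scan every player row and dereference each game UUID (no index)."""
    matches = []
    for key, row in cf.items():
        if "fullname" not in row or row.get("hometown") not in hometowns:
            continue  # a game row, or a player from another town
        for col, game_id in row.items():
            if not col.startswith("game"):
                continue
            game = cf[game_id]
            played = datetime.fromtimestamp(int(game["date"]), tz=timezone.utc)
            if played.weekday() != 1 or not isleap(played.year):
                continue  # weekday() == 1 means Tuesday
            if any(c.endswith(":" + key) and v == "grand_slam"
                   for c, v in game.items()):
                matches.append(row["fullname"])
                break  # one qualifying game is enough for this player
    return matches
```

With the sample data, `naive_grand_slam_query(CF, {"Cheektowaga", "Ontario, California"})` returns `["A. Batter"]`.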

As I understand it, I can use Pig to run a query that generates specific
information about specific "things" (similar to the hypothetical search
above) and populates that data back into Cassandra in another CF. As the
information is already structured, I assume Pig is the right tool for the
job, but it may not be ideal for a web application that enables ad-hoc
queries: it could take anywhere from 2 to ...? seconds for that query to
generate, populate, and return to the user.
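The precompute-then-lookup approach described above can be sketched in plain Python, standing in for Pig (this is not Pig Latin). The event fields and the choice of grouping dimensions are hypothetical. The batch step is slow but runs offline; the lookup is cheap, but it can only answer questions whose dimensions were baked into the derived row key in advance.

```python
from collections import defaultdict

# Hypothetical flattened events, e.g. the output of a batch job that has
# already joined player rows with game rows. Field names are assumptions.
EVENTS = [
    {"player": "A. Batter", "hometown": "Cheektowaga", "weekday": "Tue",
     "leap_year": True, "outcome": "grand_slam"},
    {"player": "B. Slugger", "hometown": "Ontario, California", "weekday": "Wed",
     "leap_year": True, "outcome": "single"},
    {"player": "C. Hitter", "hometown": "Cheektowaga", "weekday": "Tue",
     "leap_year": True, "outcome": "grand_slam"},
]


def precompute(events):
    """Batch step (the role Pig would play): group events by a fixed set of
    dimensions chosen in advance, producing one derived row per group."""
    derived_cf = defaultdict(set)
    for e in events:
        row_key = (e["outcome"], e["hometown"], e["weekday"], e["leap_year"])
        derived_cf[row_key].add(e["player"])
    return derived_cf


def lookup(derived_cf, outcome, hometowns, weekday, leap_year):
    """Online step: a few key lookups, but only along the baked-in dimensions."""
    players = set()
    for town in hometowns:
        players |= derived_cf.get((outcome, town, weekday, leap_year), set())
    return sorted(players)
```

A question along a dimension that is not part of the derived row key (say, inning) would need a new batch run before it could be answered.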

On the other hand, I have started to read about Solr / Solandra /
Lucandra. Can these provide similar or better functionality, or are they
more geared towards full-text search and indexing?

I don't want to get into the habit of guessing what my potential users
want to search for; I'm trying to think of ways to offload this to
them.



-- 
Sasha Dolgy
sasha.dolgy@gmail.com

Re: solandra or pig or....?

Posted by Jeremy Hanna <je...@gmail.com>.

Just wanted to mention that there is also a #solandra IRC channel on Freenode, in case people are interested.

On Jun 21, 2011, at 1:26 PM, Mark Kerzner wrote:

> Me too!
> 
> I would be interested to know how such queries are done in Solandra. I would understand it if Solandra creates a complete Lucene index of everything in Cassandra and adds text search on top; then your query goes against Lucene.
> 
> But if some data lives in Cassandra column families and some in Lucene, how does the combined query work? Are there examples of its use?
> 
> Thank you,
> Mark

Re: solandra or pig or....?

Posted by Mark Kerzner <ma...@gmail.com>.

Me too!

I would be interested to know how such queries are done in Solandra. I would
understand it if Solandra creates a complete Lucene index of everything in
Cassandra and adds text search on top; then your query goes against Lucene.

But if some data lives in Cassandra column families and some in Lucene, how
does the combined query work? Are there examples of its use?

Thank you,
Mark

On Tue, Jun 21, 2011 at 11:19 AM, Jake Luciani <ja...@gmail.com> wrote:

> Solandra can answer the question you used as an example, and it's more of a
> fit for low-latency ad-hoc reporting than Pig. Pig queries will take
> minutes, not seconds.

Re: solandra or pig or....?

Posted by Jake Luciani <ja...@gmail.com>.

Your application isn't aware of Cassandra, only Solr.

The idea of Solandra is to use Cassandra as a backend for Solr.
Solr already has a distributed search mechanism, so by making Solr
Cassandra-aware, Solandra can auto-shard and manage distributed queries for
you, with replication, failover, etc.

As for examples, the code comes with a demo app mentioned in the README.
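The scatter-gather pattern that distributed search relies on can be illustrated with a small, generic Python sketch. It is not Solr's or Solandra's code; the shard names, scores, and the `query_shard` callback are hypothetical. Each shard only needs to return its own top k hits, because the global top k must be among them.

```python
import heapq

# Hypothetical per-shard hit lists of (score, doc_id), best first.
SHARD_RESULTS = {
    "shard-a": [(9.1, "doc-17"), (4.0, "doc-3")],
    "shard-b": [(7.5, "doc-42"), (6.2, "doc-8"), (1.3, "doc-99")],
    "shard-c": [(8.8, "doc-5")],
}


def scatter_gather(shards, query_shard, k):
    """Send the query to every shard (scatter), ask each one for only its own
    top k hits, then merge them and keep the k best overall (gather)."""
    hits = []
    for shard in shards:
        hits.extend(query_shard(shard, k))
    # Tuples compare by score first, so nlargest ranks hits by score.
    return heapq.nlargest(k, hits)
```

For example, `scatter_gather(list(SHARD_RESULTS), lambda shard, k: SHARD_RESULTS[shard][:k], 3)` returns `[(9.1, "doc-17"), (8.8, "doc-5"), (7.5, "doc-42")]`.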

On Tue, Jun 21, 2011 at 2:47 PM, Sasha Dolgy <sd...@gmail.com> wrote:

> Without getting overly complicated and long-winded: are there
> practical references or examples I can review that demonstrate the
> Cassandra/Solandra benefits? I had a quick look at
> https://github.com/tjake/Solandra/wiki/Solandra-Wiki and it wasn't
> dead obvious to me.



-- 
http://twitter.com/tjake

Re: solandra or pig or....?

Posted by Santiago Basulto <sa...@gmail.com>.

Wouldn't it be useful to store your data somewhere structured
(Cassandra is obviously an option) and then use MapReduce to store
statistics?
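A single-process Python sketch of the map, shuffle, and reduce phases for one statistic (grand slams per player) shows the shape of such a job. The record format is hypothetical; a real job would run these phases in parallel over the data stored in Cassandra (for example with Hadoop) and write the results back to a CF.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input: (player, outcome) records extracted from Cassandra rows.
RECORDS = [("A. Batter", "grand_slam"), ("B. Slugger", "single"),
           ("A. Batter", "grand_slam"), ("B. Slugger", "grand_slam")]


def map_phase(record):
    """Emit (player, 1) for every grand slam; other outcomes emit nothing."""
    player, outcome = record
    if outcome == "grand_slam":
        yield player, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single statistic."""
    return key, sum(values)


def run_job(records):
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

`run_job(RECORDS)` returns `{"A. Batter": 2, "B. Slugger": 1}`.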


2011/6/22 Jake Luciani <ja...@gmail.com>:
> Well, Solandra runs Cassandra, so you can use Cassandra as you do today but index some of the data in Solr.



-- 
Santiago Basulto.-

Re: solandra or pig or....?

Posted by Jake Luciani <ja...@gmail.com>.

Well, Solandra runs Cassandra, so you can use Cassandra as you do today but index some of the data in Solr. 
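A minimal Python sketch of that split: the application keeps reading full rows from Cassandra (for example, to render a player's card) and copies only a subset of fields, plus the row key, into a search document. The field selection and function names are hypothetical, and the sketch does not show how Solandra actually stores or receives documents.

```python
# Hypothetical player row as stored in Cassandra.
PLAYER_ROW = {"player_uuid": "p1", "fullname": "A. Batter",
              "hometown": "Cheektowaga", "game1": "g1", "game2": "g2"}

# Assumption: the only fields users search on. Everything else stays
# in Cassandra only.
SEARCHABLE_FIELDS = ("fullname", "hometown")


def to_search_doc(row_key, row):
    """Copy the row key plus the searchable fields into a search document.
    Search hits carry the row key, so the full row can be fetched afterwards."""
    doc = {"id": row_key}
    doc.update({f: row[f] for f in SEARCHABLE_FIELDS if f in row})
    return doc


def render_card(row):
    """Card view: built from the full Cassandra row, with no search involved."""
    games = [v for k, v in row.items() if k.startswith("game")]
    return f"{row['fullname']} ({row['hometown']}): {len(games)} games"
```

Keeping the row key in the search document is what lets a search result lead back to the authoritative row in Cassandra.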

On Jun 22, 2011, at 3:41 AM, Sasha Dolgy <sd...@gmail.com> wrote:

> First, thanks everyone for the input; I appreciate it. The number
> crunching would already have been completed, with all per-game statistics
> defined and inserted into the appropriate CF/rows/columns.
> 
> That being said, Solandra appears to be the right way to go, except
> that this would require rewriting my current application(s) to consume
> Solandra instead of Cassandra ("Your application isn't aware of
> Cassandra, only Solr."). Or can I have the best of both worlds? Search is
> only one aspect of the consumer experience. If a consumer wanted to view
> a 'card' for a baseball player, all the information needed to build that
> card would be retrieved directly from Cassandra, and search wouldn't be
> required.
> 
> -sd

Re: solandra or pig or....?

Posted by Sasha Dolgy <sd...@gmail.com>.

First, thanks everyone for the input; I appreciate it. The number
crunching would already have been completed, with all per-game statistics
defined and inserted into the appropriate CF/rows/columns.

That being said, Solandra appears to be the right way to go, except
that this would require rewriting my current application(s) to consume
Solandra instead of Cassandra ("Your application isn't aware of
Cassandra, only Solr."). Or can I have the best of both worlds? Search is
only one aspect of the consumer experience. If a consumer wanted to view
a 'card' for a baseball player, all the information needed to build that
card would be retrieved directly from Cassandra, and search wouldn't be
required.

-sd

On Tue, Jun 21, 2011 at 9:50 PM, Jake Luciani <ja...@gmail.com> wrote:
> Right, Solr will not do anything beyond basic aggregations (facets) and
> range queries.

Re: solandra or pig or....?

Posted by Jake Luciani <ja...@gmail.com>.

Right, Solr will not do anything beyond basic aggregations (facets) and
range queries.
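To see what "basic aggregations (facets) and range queries" looks like in practice, here is a Python sketch that builds Solr request parameters combining a date range filter with a field facet. The parameter names (`q`, `fq`, `rows`, `facet`, `facet.field`, `facet.mincount`, `wt`) are standard Solr; the field names are assumptions, and `playername` is assumed to be an untokenized string field so that each facet bucket is one player. The sketch only builds the query string and sends nothing.

```python
from urllib.parse import urlencode


def facet_params(start_year, end_year):
    """Query string for a Solr request that counts grand slams per player
    within a date range, without returning the documents themselves."""
    return urlencode([
        ("q", "outcome:grand_slam"),  # field name "outcome" is an assumption
        # Range query on a date field; the field name "date" is an assumption.
        ("fq", f"date:[{start_year}-01-01T00:00:00Z TO {end_year}-12-31T23:59:59Z]"),
        ("rows", "0"),                  # only the counts are wanted
        ("facet", "true"),
        ("facet.field", "playername"),  # one count per distinct player name
        ("facet.mincount", "1"),        # skip players with zero matches
        ("wt", "json"),
    ])
```

The response then carries one count per player, which is roughly the kind of aggregation Solr handles natively.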

On Tue, Jun 21, 2011 at 3:16 PM, Dan Kuebrich <da...@gmail.com> wrote:

> Solandra is indeed distributed search, not distributed number-crunching.
> As a previous poster said, you could imagine structuring the data as a
> series of documents with fields containing playername, teamname, position,
> location, day, time, inning, at bat, outcome, etc. Then you could query to
> get a slice of the data that matches your predicate and run statistics on
> that subset.
>
> The statistics would have to come from other code (e.g. R), but Solr will
> filter the data for you. So this approach only works if the slices are
> reasonably small, but it gives you great granularity on search as long as
> you put all the info in. The users of this datastore (or you) must be
> willing to write their own simple aggregation functions ("show me only the
> unique player names returned by this Solr query", "show me the average of
> field X returned by this Solr query", ...).
>
> If the number of results is too great, MapReduce (MR) may be the way to go.


-- 
http://twitter.com/tjake

Re: solandra or pig or....?

Posted by Dan Kuebrich <da...@gmail.com>.

Solandra is indeed distributed search, not distributed number-crunching. As
a previous poster said, you could imagine structuring the data as a series
of documents with fields containing playername, teamname, position,
location, day, time, inning, at bat, outcome, etc. Then you could query to
get a slice of the data that matches your predicate and run statistics on
that subset.
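The document-per-event idea could translate into a Solr request like the one built below. This Python sketch only constructs the URL and does not send it. `q`, `fq`, `fl`, `rows`, and `wt` are standard Solr request parameters, but the endpoint (which may differ under Solandra) and the field names are assumptions. Putting each predicate in its own `fq` lets Solr cache each filter separately.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and path depend on the deployment.
SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(hometowns, outcome, weekday, leap_year, rows=100):
    """Build a Solr select URL that filters event documents.

    The field names (hometown, outcome, day, leap_year, playername) are
    assumptions about how the documents were indexed.
    """
    towns = " OR ".join(f'"{t}"' for t in hometowns)
    params = [
        ("q", "*:*"),                    # match everything...
        ("fq", f"hometown:({towns})"),   # ...then narrow with filter queries
        ("fq", f"outcome:{outcome}"),
        ("fq", f"day:{weekday}"),
        ("fq", f"leap_year:{str(leap_year).lower()}"),
        ("fl", "playername"),            # return only the fields needed
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    return SELECT_URL + "?" + urlencode(params)
```

For the example question: `build_select_url(["Cheektowaga", "Ontario, California"], "grand_slam", "Tuesday", True)`.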

The statistics would have to come from other code (e.g. R), but Solr will
filter the data for you. So this approach only works if the slices are
reasonably small, but it gives you great granularity on search as long as
you put all the info in. The users of this datastore (or you) must be
willing to write their own simple aggregation functions ("show me only the
unique player names returned by this Solr query", "show me the average of
field X returned by this Solr query", ...).
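The two example aggregation functions might look like this in Python, applied to a JSON-decoded Solr response (`wt=json` returns matching documents under `response.docs`). The sample documents and the field names `playername` and `rbi` are hypothetical. These functions only see the documents actually returned (bounded by `rows`), which is why the approach suits reasonably small slices.

```python
from statistics import mean

# Hypothetical JSON-decoded Solr response; field names are assumptions.
RESPONSE = {"response": {"numFound": 3, "start": 0, "docs": [
    {"playername": "A. Batter", "rbi": 4},
    {"playername": "C. Hitter", "rbi": 4},
    {"playername": "A. Batter", "rbi": 1},
]}}


def unique_player_names(resp):
    """Unique player names in the returned slice, in first-seen order."""
    docs = resp["response"]["docs"]
    return list(dict.fromkeys(d["playername"] for d in docs))


def average_field(resp, field):
    """Average of a numeric field over the returned documents that have it,
    or None when no document has the field."""
    values = [d[field] for d in resp["response"]["docs"] if field in d]
    return mean(values) if values else None
```

`unique_player_names(RESPONSE)` returns `["A. Batter", "C. Hitter"]`, and `average_field(RESPONSE, "rbi")` returns `3`.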

If the number of results is too great, MapReduce (MR) may be the way to go.

On Tue, Jun 21, 2011 at 3:04 PM, Victor K. <vi...@gmail.com> wrote:

> If I may ask, Sasha, what exactly are you trying to achieve using Solr
> (or Solandra; I guess it's about the same)?
> From what I understood of your problem, you need to compute statistics
> on your matches, players, etc. Or do you just want to retrieve
> information that has already been computed?
> If it's the former (data aggregation, statistics, etc.), Solr won't be
> of much use, because it is not meant to do statistics. If it's the
> latter, then Solr is just the tool for you.

Re: solandra or pig or....?

Posted by "Victor K." <vi...@gmail.com>.

If I may ask, Sasha, what exactly are you trying to achieve using Solr
(or Solandra; I guess it's about the same)?
From what I understood of your problem, you need to compute statistics
on your matches, players, etc. Or do you just want to retrieve
information that has already been computed?
If it's the former (data aggregation, statistics, etc.), Solr won't be
of much use, because it is not meant to do statistics. If it's the
latter, then Solr is just the tool for you.


On 6/21/2011 2:47 PM, Sasha Dolgy wrote:
> Without getting overly complicated and long-winded: are there
> practical references or examples I can review that demonstrate the
> Cassandra/Solandra benefits? I had a quick look at
> https://github.com/tjake/Solandra/wiki/Solandra-Wiki and it wasn't
> dead obvious to me.

Re: solandra or pig or....?

Posted by Sasha Dolgy <sd...@gmail.com>.

Without getting overly complicated and long-winded: are there
practical references or examples I can review that demonstrate the
Cassandra/Solandra benefits? I had a quick look at
https://github.com/tjake/Solandra/wiki/Solandra-Wiki and it wasn't
dead obvious to me.

On Tue, Jun 21, 2011 at 8:19 PM, Jake Luciani <ja...@gmail.com> wrote:
> Solandra can answer the question you used as an example, and it's more of a
> fit for low-latency ad-hoc reporting than Pig. Pig queries will take
> minutes, not seconds.



-- 
Sasha Dolgy
sasha.dolgy@gmail.com

Re: solandra or pig or....?

Posted by Jake Luciani <ja...@gmail.com>.

Solandra can answer the question you used as an example, and it's more of a
fit for low-latency ad-hoc reporting than Pig. Pig queries will take
minutes, not seconds.




-- 
http://twitter.com/tjake

Re: solandra or pig or....?

Posted by Victor Kabdebon <vi...@gmail.com>.

I can speak to what I know:

Pig: I have only taken a quick look, and maybe some people from Twitter can
answer better than me on that particular program. Pig is not for "on-demand"
queries: they are quite slow, and, as you said, you extract the relevant
information and append it to another CF from which you can quickly retrieve
the statistics.

Solr is purely a search engine. It is not only text-based; it can also
search by time, etc. To do statistics you need mathematical operations, and
Solr won't provide those. It can do simple things in terms of statistics,
but mostly it is a search engine.

Personally, for what you are asking, I would use Pig and store the results
in a CF, updating those CFs regularly. Simple statistics you can generate
with your favorite language or a specialized language such as R, as long as
they concern small sets.

Hope it helps,
Victor Kabdebon
