You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Michel Nadeau <ak...@gmail.com> on 2010/04/01 03:17:27 UTC

Lucene Challenge - sum, count, avg, etc.

Hi,

We're currently in the process of switching many of our screens from MySQL
to Lucene because MySQL simply dies because we have too much data and it's
becoming too long to generate the stats we need.

So here's one MySQL query that we use to find out our Top 10 Affiliates :

SELECT SUM(sale_amount) AS total_sales, affialiate_id FROM sales WHERE
sale_date>='2010-03-01' AND sale_date<='2010-03-31' GROUP BY affialiate_id
ORDER BY total_sales DESC LIMIT 10;

We currently have our "sales" index, containing all sales and all fields -
and it's one big index (over 10M records). We could fetch all documents
within the date range, loop them and add up the total_sales, but it would be
just crazy to do this all the time (we have a high volume of search).

We made several tests with Solr (Facets, and even the beta CollapseFields),
but nothing is really helping us. We could pre-generate the total_sales for
all possible date ranges... but that would be quite crazy too as the date
range possibilities quickly become endless.

So - is there any known way to efficiently do SUM(), COUNT() (and even AVG()
) using Lucene/Solr/others? I also checked Bobo Browse but it doesn't seem
to offer what I need either.

Thanks for any hints!!!

- Mike
akaris@gmail.com

Re: Lucene Challenge - sum, count, avg, etc.

Posted by Chris Lu <ch...@gmail.com>.
Hi, Michel,

You can use DBSight free version to test it out.
However, it's a whole solution since you will need to configure it 
first. Like specifying which column you want to do the counting before 
the actual search.

BTW: DBSight also support MIN and MAX, in addition to SUM,AVG.

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding!



Michel Nadeau wrote:
> @Ken: yeah we thought about it - but we have a HUGE amount of data (sales,
> affiliates, etc.) - so pre-calculating everything isn't really an option.
> Plus I don't know how we would sort.. let's say I get the totals for
> affiliate X, loop totals from day 1 to X (range), sum up, great: I can do
> this for all affiliates and have the totals, but how will I sort by that
> total?
>
> @Chris: we don't want to switch to a whole new platform - is it possible to
> use that DBSight module only? It doesn't seem to be an opensource project so
> I can't really consider it.
>
> - Mike
> akaris@gmail.com
>
>
> On Thu, Apr 1, 2010 at 5:00 AM, Chris Lu <ch...@gmail.com> wrote:
>
>   
>> Hi, Michel,
>>
>> This has already been implemented in DBSight. Check it out!
>> http://www.dbsight.net
>>
>> You can get sum, avg for Facet searches. And count is included in Facet
>> search directly.
>>
>> --
>> Chris Lu
>> -------------------------
>> Instant Scalable Full-Text Search On Any Database/Application
>> site: http://www.dbsight.net
>> demo: http://search.dbsight.com
>> Lucene Database Search in 3 minutes:
>> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
>> DBSight customer, a shopping comparison site, (anonymous per request) got
>> 2.6 Million Euro funding!
>>
>>
>>
>>
>> Michel Nadeau wrote:
>>
>>     
>>> Hi,
>>>
>>> We're currently in the process of switching many of our screens from MySQL
>>> to Lucene because MySQL simply dies because we have too much data and it's
>>> becoming too long to generate the stats we need.
>>>
>>> So here's one MySQL query that we use to find out our Top 10 Affiliates :
>>>
>>> SELECT SUM(sale_amount) AS total_sales, affialiate_id FROM sales WHERE
>>> sale_date>='2010-03-01' AND sale_date<='2010-03-31' GROUP BY affialiate_id
>>> ORDER BY total_sales DESC LIMIT 10;
>>>
>>> We currently have our "sales" index, containing all sales and all fields -
>>> and it's one big index (over 10M records). We could fetch all documents
>>> within the date range, loop them and add up the total_sales, but it would
>>> be
>>> just crazy to do this all the time (we have a high volume of search).
>>>
>>> We made several tests with Solr (Facets, and even the beta
>>> CollapseFields),
>>> but nothing is really helping us. We could pre-generate the total_sales
>>> for
>>> all possible date ranges... but that would be quite crazy too as the date
>>> range possibilities quickly become endless.
>>>
>>> So - is there any known way to efficiently do SUM(), COUNT() (and even
>>> AVG()
>>> ) using Lucene/Solr/others? I also checked Bobo Browse but it doesn't seem
>>> to offer what I need either.
>>>
>>> Thanks for any hints!!!
>>>
>>> - Mike
>>> akaris@gmail.com
>>>
>>>
>>>
>>>       
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>     
>
>   

Re: Lucene Challenge - sum, count, avg, etc.

Posted by Michel Nadeau <ak...@gmail.com>.
@Ken: yeah we thought about it - but we have a HUGE amount of data (sales,
affiliates, etc.) - so pre-calculating everything isn't really an option.
Plus I don't know how we would sort.. let's say I get the totals for
affiliate X, loop totals from day 1 to X (range), sum up, great: I can do
this for all affiliates and have the totals, but how will I sort by that
total?

@Chris: we don't want to switch to a whole new platform - is it possible to
use that DBSight module only? It doesn't seem to be an opensource project so
I can't really consider it.

- Mike
akaris@gmail.com


On Thu, Apr 1, 2010 at 5:00 AM, Chris Lu <ch...@gmail.com> wrote:

> Hi, Michel,
>
> This has already been implemented in DBSight. Check it out!
> http://www.dbsight.net
>
> You can get sum, avg for Facet searches. And count is included in Facet
> search directly.
>
> --
> Chris Lu
> -------------------------
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> DBSight customer, a shopping comparison site, (anonymous per request) got
> 2.6 Million Euro funding!
>
>
>
>
> Michel Nadeau wrote:
>
>> Hi,
>>
>> We're currently in the process of switching many of our screens from MySQL
>> to Lucene because MySQL simply dies because we have too much data and it's
>> becoming too long to generate the stats we need.
>>
>> So here's one MySQL query that we use to find out our Top 10 Affiliates :
>>
>> SELECT SUM(sale_amount) AS total_sales, affialiate_id FROM sales WHERE
>> sale_date>='2010-03-01' AND sale_date<='2010-03-31' GROUP BY affialiate_id
>> ORDER BY total_sales DESC LIMIT 10;
>>
>> We currently have our "sales" index, containing all sales and all fields -
>> and it's one big index (over 10M records). We could fetch all documents
>> within the date range, loop them and add up the total_sales, but it would
>> be
>> just crazy to do this all the time (we have a high volume of search).
>>
>> We made several tests with Solr (Facets, and even the beta
>> CollapseFields),
>> but nothing is really helping us. We could pre-generate the total_sales
>> for
>> all possible date ranges... but that would be quite crazy too as the date
>> range possibilities quickly become endless.
>>
>> So - is there any known way to efficiently do SUM(), COUNT() (and even
>> AVG()
>> ) using Lucene/Solr/others? I also checked Bobo Browse but it doesn't seem
>> to offer what I need either.
>>
>> Thanks for any hints!!!
>>
>> - Mike
>> akaris@gmail.com
>>
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Lucene Challenge - sum, count, avg, etc.

Posted by Chris Lu <ch...@gmail.com>.
Hi, Michel,

This has already been implemented in DBSight. Check it out!
http://www.dbsight.net

You can get sum, avg for Facet searches. And count is included in Facet 
search directly.

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding!



Michel Nadeau wrote:
> Hi,
>
> We're currently in the process of switching many of our screens from MySQL
> to Lucene because MySQL simply dies because we have too much data and it's
> becoming too long to generate the stats we need.
>
> So here's one MySQL query that we use to find out our Top 10 Affiliates :
>
> SELECT SUM(sale_amount) AS total_sales, affialiate_id FROM sales WHERE
> sale_date>='2010-03-01' AND sale_date<='2010-03-31' GROUP BY affialiate_id
> ORDER BY total_sales DESC LIMIT 10;
>
> We currently have our "sales" index, containing all sales and all fields -
> and it's one big index (over 10M records). We could fetch all documents
> within the date range, loop them and add up the total_sales, but it would be
> just crazy to do this all the time (we have a high volume of search).
>
> We made several tests with Solr (Facets, and even the beta CollapseFields),
> but nothing is really helping us. We could pre-generate the total_sales for
> all possible date ranges... but that would be quite crazy too as the date
> range possibilities quickly become endless.
>
> So - is there any known way to efficiently do SUM(), COUNT() (and even AVG()
> ) using Lucene/Solr/others? I also checked Bobo Browse but it doesn't seem
> to offer what I need either.
>
> Thanks for any hints!!!
>
> - Mike
> akaris@gmail.com
>
>   

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene Challenge - sum, count, avg, etc.

Posted by Grant Ingersoll <gs...@apache.org>.
Have you looked at Solr's StatsComponent?

On Mar 31, 2010, at 9:17 PM, Michel Nadeau wrote:

> Hi,
> 
> We're currently in the process of switching many of our screens from MySQL
> to Lucene because MySQL simply dies because we have too much data and it's
> becoming too long to generate the stats we need.
> 
> So here's one MySQL query that we use to find out our Top 10 Affiliates :
> 
> SELECT SUM(sale_amount) AS total_sales, affialiate_id FROM sales WHERE
> sale_date>='2010-03-01' AND sale_date<='2010-03-31' GROUP BY affialiate_id
> ORDER BY total_sales DESC LIMIT 10;
> 
> We currently have our "sales" index, containing all sales and all fields -
> and it's one big index (over 10M records). We could fetch all documents
> within the date range, loop them and add up the total_sales, but it would be
> just crazy to do this all the time (we have a high volume of search).
> 
> We made several tests with Solr (Facets, and even the beta CollapseFields),
> but nothing is really helping us. We could pre-generate the total_sales for
> all possible date ranges... but that would be quite crazy too as the date
> range possibilities quickly become endless.
> 
> So - is there any known way to efficiently do SUM(), COUNT() (and even AVG()
> ) using Lucene/Solr/others? I also checked Bobo Browse but it doesn't seem
> to offer what I need either.
> 
> Thanks for any hints!!!
> 
> - Mike
> akaris@gmail.com

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene Challenge - sum, count, avg, etc.

Posted by Will Johnson <wi...@gmail.com>.
Hi Michel,



You can do all of this with Lucene however not with a standard index/query
operators.  At Attivio we have a custom Lucene index structure + custom
query operators that support relational joins across records in an index.  You
can write the queries in our standard query language or run actual SQL.  All
of this is done without pre-computing or flattening records as that prevents
you from having query flexibility at runtime, ie what happens when you want
to know something that isn’t pre-computed/pre-flattened?



If you look at the demo at the bottom of this page
http://www.attivio.com/active-intelligence/aie-demo.html  you can see how we
index and query against both news articles and baseball statistics from a
relational database.  For example you can do something like this with the
baseball data:



select sum(RBI), teamID, yearID

from master m join batting b

on m."playerID" = b.playerID

where b.yearID > '2004'

group by yearID,teamID

order by yearID,teamID



We support min, max, avg and a number of other aggregate functions along
with true full-text search.  Another article you might check out is here:
http://www.attivio.com/blog/55-industry-insights/507-can-a-search-engine-replace-a-relational-database.html
.  So far we're getting some pretty good results competing with databases
and data warehouses on raw performance (at customer sites) even without the
full text search capabilities mixed in.  Once you start adding in 'fuzzy'
joins, relevancy, proximity and all the other boolean query logic, we start
to pull ahead even further.



If you want to learn more drop me a line.  We'll be demonstrating all this
stuff (and more) at Enterprise Search Summit (ESS) in New York this coming
May.



- will@attivio.com

Re: Lucene Challenge - sum, count, avg, etc.

Posted by prasenjit mukherjee <pr...@gmail.com>.
> Lucene is great at searching for data, but just because it is awesome in one area doesn't mean it would excel in something it wasn't designed for ;-)

I think lucene is probably one of the better data structures for
computing "conditional aggregated stats". Even for straight search
lucene has to iterate over all the matches and compute the score which
is probably much more computation intensive than doing simple
aggregations ( like SUM,COUNT,AVG,MIN,MAX etc. ). Makes sense for a
contrib project.

I have a feeling that lucene will be faster than MySQL for these kind
of conditional aggregation tasks. But would reserve my comment and
would appreciate expert advice on this.

Having said that I think you are right that lucene shouldn't be used
as a general purpose Analytical tool.

-Prasen


On Thu, Apr 1, 2010 at 8:51 AM, Darren Hartford <dh...@ghsinc.com> wrote:
> If you are going to end up either copying or moving all the data to lucene (which, when you hook up lucene even to the existing mysql data, it will still create it's own copy of the data), you might really want to look at other options:
>
> *column oriented databases (analytical databases).  If open source is a concern: LucidDB, MonetDB, mysql/InfoBright (GPL/commercial), and I think mysql/InfiniDB may be getting into it.  And, you can still use your normal SQL tools with better performance for this particular scenario.
>
> Search engine is about searching for data....trying to turn it into a (financial) analytical tool may get you more trouble particularly if you haven't looked at other options.
>
> Lucene is great at searching for data, but just because it is awesome in one area doesn't mean it would excel in something it wasn't designed for ;-)
>
> -D
>
>
> -----Original Message-----
> From: prasenjit mukherjee [mailto:prasen.bea@gmail.com]
> Sent: Thursday, April 01, 2010 8:11 AM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene Challenge - sum, count, avg, etc.
>
> Not sure what you mean by "joining" in lucene , since conceptually
> there is only 1 table ( with many field aka columns ) in lucene. A
> representative query would be good to know the use case.
>
> Again didn't get the "sorting" part.  SUM() will return only 1
> aggregated value, so what do you want to sort it on ?
>
> -Prasen
>
> On Thu, Apr 1, 2010 at 7:44 AM, Michel Nadeau <ak...@gmail.com> wrote:
>> Are you planning to be able to sort by these SUMs? A SpanQuery would work
>> great to get the integers... then you would loop and sum up... but what
>> about "joining" with your other data and sorting?
>>
>> - Mike
>> akaris@gmail.com
>>
>>
>> On Wed, Mar 31, 2010 at 9:23 PM, prasenjit mukherjee
>> <pr...@gmail.com>wrote:
>>
>>> I too am trying to achieve something.
>>>
>>> I am thinking of storing the integer values in  payloads and then
>>> using spanquery classes to compute the respective SUMs
>>>
>>> -Prasen
>>>
>>> On Thu, Apr 1, 2010 at 6:47 AM, Michel Nadeau <ak...@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > We're currently in the process of switching many of our screens from
>>> MySQL
>>> > to Lucene because MySQL simply dies because we have too much data and
>>> it's
>>> > becoming too long to generate the stats we need.
>>> >
>>> > So here's one MySQL query that we use to find out our Top 10 Affiliates :
>>> >
>>> > SELECT SUM(sale_amount) AS total_sales, affialiate_id FROM sales WHERE
>>> > sale_date>='2010-03-01' AND sale_date<='2010-03-31' GROUP BY
>>> affialiate_id
>>> > ORDER BY total_sales DESC LIMIT 10;
>>> >
>>> > We currently have our "sales" index, containing all sales and all fields
>>> -
>>> > and it's one big index (over 10M records). We could fetch all documents
>>> > within the date range, loop them and add up the total_sales, but it would
>>> be
>>> > just crazy to do this all the time (we have a high volume of search).
>>> >
>>> > We made several tests with Solr (Facets, and even the beta
>>> CollapseFields),
>>> > but nothing is really helping us. We could pre-generate the total_sales
>>> for
>>> > all possible date ranges... but that would be quite crazy too as the date
>>> > range possibilities quickly become endless.
>>> >
>>> > So - is there any known way to efficiently do SUM(), COUNT() (and even
>>> AVG()
>>> > ) using Lucene/Solr/others? I also checked Bobo Browse but it doesn't
>>> seem
>>> > to offer what I need either.
>>> >
>>> > Thanks for any hints!!!
>>> >
>>> > - Mike
>>> > akaris@gmail.com
>>> >
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Lucene Challenge - sum, count, avg, etc.

Posted by Darren Hartford <dh...@ghsinc.com>.
If you are going to end up either copying or moving all the data to lucene (which, when you hook up lucene even to the existing mysql data, it will still create it's own copy of the data), you might really want to look at other options:

*column oriented databases (analytical databases).  If open source is a concern: LucidDB, MonetDB, mysql/InfoBright (GPL/commercial), and I think mysql/InfiniDB may be getting into it.  And, you can still use your normal SQL tools with better performance for this particular scenario.

Search engine is about searching for data....trying to turn it into a (financial) analytical tool may get you more trouble particularly if you haven't looked at other options.  

Lucene is great at searching for data, but just because it is awesome in one area doesn't mean it would excel in something it wasn't designed for ;-)

-D


-----Original Message-----
From: prasenjit mukherjee [mailto:prasen.bea@gmail.com] 
Sent: Thursday, April 01, 2010 8:11 AM
To: java-user@lucene.apache.org
Subject: Re: Lucene Challenge - sum, count, avg, etc.

Not sure what you mean by "joining" in lucene , since conceptually
there is only 1 table ( with many field aka columns ) in lucene. A
representative query would be good to know the use case.

Again didn't get the "sorting" part.  SUM() will return only 1
aggregated value, so what do you want to sort it on ?

-Prasen

On Thu, Apr 1, 2010 at 7:44 AM, Michel Nadeau <ak...@gmail.com> wrote:
> Are you planning to be able to sort by these SUMs? A SpanQuery would work
> great to get the integers... then you would loop and sum up... but what
> about "joining" with your other data and sorting?
>
> - Mike
> akaris@gmail.com
>
>
> On Wed, Mar 31, 2010 at 9:23 PM, prasenjit mukherjee
> <pr...@gmail.com>wrote:
>
>> I too am trying to achieve something.
>>
>> I am thinking of storing the integer values in  payloads and then
>> using spanquery classes to compute the respective SUMs
>>
>> -Prasen
>>
>> On Thu, Apr 1, 2010 at 6:47 AM, Michel Nadeau <ak...@gmail.com> wrote:
>> > Hi,
>> >
>> > We're currently in the process of switching many of our screens from
>> MySQL
>> > to Lucene because MySQL simply dies because we have too much data and
>> it's
>> > becoming too long to generate the stats we need.
>> >
>> > So here's one MySQL query that we use to find out our Top 10 Affiliates :
>> >
>> > SELECT SUM(sale_amount) AS total_sales, affialiate_id FROM sales WHERE
>> > sale_date>='2010-03-01' AND sale_date<='2010-03-31' GROUP BY
>> affialiate_id
>> > ORDER BY total_sales DESC LIMIT 10;
>> >
>> > We currently have our "sales" index, containing all sales and all fields
>> -
>> > and it's one big index (over 10M records). We could fetch all documents
>> > within the date range, loop them and add up the total_sales, but it would
>> be
>> > just crazy to do this all the time (we have a high volume of search).
>> >
>> > We made several tests with Solr (Facets, and even the beta
>> CollapseFields),
>> > but nothing is really helping us. We could pre-generate the total_sales
>> for
>> > all possible date ranges... but that would be quite crazy too as the date
>> > range possibilities quickly become endless.
>> >
>> > So - is there any known way to efficiently do SUM(), COUNT() (and even
>> AVG()
>> > ) using Lucene/Solr/others? I also checked Bobo Browse but it doesn't
>> seem
>> > to offer what I need either.
>> >
>> > Thanks for any hints!!!
>> >
>> > - Mike
>> > akaris@gmail.com
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene Challenge - sum, count, avg, etc.

Posted by Grant Ingersoll <gs...@apache.org>.
On Apr 1, 2010, at 11:13 PM, Michel Nadeau wrote:

> My big question is how do you loop 1M records, sum up field(s), and then
> sort on that field... all in memory (could use too much ram) ?  In a
> temporary index (could take a while to re-write a lot of documents in a new
> index) ?
> 

You're going to need to do some combination of Collector, FieldCache and the existing Sort code.  Doesn't Solr's StatsComponent work for this?


-Grant
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene Challenge - sum, count, avg, etc.

Posted by Michel Nadeau <ak...@gmail.com>.
My big question is how do you loop 1M records, sum up field(s), and then
sort on that field... all in memory (could use too much ram) ?  In a
temporary index (could take a while to re-write a lot of documents in a new
index) ?

- Mike
akaris@gmail.com


On Thu, Apr 1, 2010 at 5:31 PM, Chris Lu <ch...@gmail.com> wrote:

> Thanks. Not really trying to sell DBSight here since most people here are
> Lucene experts.
> Just to confirm that this "challenge" has been done via Lucene for quite a
> while.
>
> The technique for it is very similar to how facet search is done, which has
> several ways also.
> Million's of rows are not really "that" big when everything is properly
> warmed up.
>
>
> --
> Chris Lu
> -------------------------
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> DBSight customer, a shopping comparison site, (anonymous per request) got
> 2.6 Million Euro funding!
>
>
>
> Michel Nadeau wrote:
>
>> I'm sure the DBSight feature is great, but we already have a system in
>> place
>> and we're not throwing it away -- it's closely integrated with our whole
>> platform. We're way past the point to switch our solution to DBSight.
>>  We'd
>> be more than happy to use the DBSight feature if it would be opensource
>> but
>> unfortunately it's not - so we won't even consider it.
>>
>> Chris: are you a developer at DBSight? Can you tell us more about how it
>> works?  Because I don't really see how it can be "fast" when dealing with
>> millions of records... as it has to loop through them, compute, store
>> everything (in a temp index? memory?) and then re-sort.
>>
>> - Mike
>> akaris@gmail.com
>>
>>
>> On Thu, Apr 1, 2010 at 5:02 PM, Chris Lu <ch...@gmail.com> wrote:
>>
>>
>>
>>> For DBSight, the aggregated values are computed during run time.
>>> And the sorting on the computed aggregated values are done when
>>> displaying
>>> the results.
>>>
>>> The idea is, after the aggregation, the number of aggregated values are
>>> much much smaller.
>>>
>>>
>>> --
>>> Chris Lu
>>> -------------------------
>>> Instant Scalable Full-Text Search On Any Database/Application
>>> site: http://www.dbsight.net
>>> demo: http://search.dbsight.com
>>> Lucene Database Search in 3 minutes:
>>>
>>> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
>>> DBSight customer, a shopping comparison site, (anonymous per request) got
>>> 2.6 Million Euro funding!
>>>
>>>
>>> prasenjit mukherjee wrote:
>>>
>>>
>>>
>>>> On Fri, Apr 2, 2010 at 12:54 AM, Chris Lu <ch...@gmail.com> wrote:
>>>>
>>>>
>>>>
>>>>
>>>>> No need for Hadoop. It's even more slower. Lucene can do it easily.
>>>>>
>>>>> This has been implemented in DBSight.
>>>>> The implementation is very similar to Facet search. Just need a way to
>>>>> load
>>>>> the field quickly, like put it in memory or some data structure, and
>>>>> count
>>>>> the sum/min/max during searching.
>>>>>
>>>>>
>>>>>
>>>>>
>>>> This will ONLY compute the aggregated value ( sum,count,min,max etc.
>>>> ). I guess what Mike wants is use the aggregated value to sort the
>>>> entries. Dynamically maintaining a sorted list while searching could
>>>> be extremely expensive.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> --
>>>>> Chris Lu
>>>>> -------------------------
>>>>> Instant Scalable Full-Text Search On Any Database/Application
>>>>> site: http://www.dbsight.net
>>>>> demo: http://search.dbsight.com
>>>>> Lucene Database Search in 3 minutes:
>>>>>
>>>>>
>>>>> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
>>>>> DBSight customer, a shopping comparison site, (anonymous per request)
>>>>> got
>>>>> 2.6 Million Euro funding!
>>>>>
>>>>>
>>>>> prasenjit mukherjee wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> This looks like a use case more suited  for Pig ( over Hadoop ).
>>>>>>
>>>>>> It could be difficult for lucene to do sort and sum simultaneously as
>>>>>> sorting itself depends upon summed value.
>>>>>>
>>>>>> On Thu, Apr 1, 2010 at 11:47 PM, Michel Nadeau <ak...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Well that's my problem: we have a lot of records of all types
>>>>>>> (afiiliates,
>>>>>>> sales) so looping tons of records each time isn't possible.
>>>>>>>
>>>>>>> - Mike
>>>>>>> akaris@gmail.com
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Apr 1, 2010 at 2:11 PM, prasenjit mukherjee
>>>>>>> <pr...@gmail.com>wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>

Re: Lucene Challenge - sum, count, avg, etc.

Posted by Chris Lu <ch...@gmail.com>.
Thanks. Not really trying to sell DBSight here since most people here 
are Lucene experts.
Just to confirm that this "challenge" has been done via Lucene for quite 
a while.

The technique for it is very similar to how facet search is done, which 
has several ways also.
Million's of rows are not really "that" big when everything is properly 
warmed up.

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding!



Michel Nadeau wrote:
> I'm sure the DBSight feature is great, but we already have a system in place
> and we're not throwing it away -- it's closely integrated with our whole
> platform. We're way past the point to switch our solution to DBSight.  We'd
> be more than happy to use the DBSight feature if it would be opensource but
> unfortunately it's not - so we won't even consider it.
>
> Chris: are you a developer at DBSight? Can you tell us more about how it
> works?  Because I don't really see how it can be "fast" when dealing with
> millions of records... as it has to loop through them, compute, store
> everything (in a temp index? memory?) and then re-sort.
>
> - Mike
> akaris@gmail.com
>
>
> On Thu, Apr 1, 2010 at 5:02 PM, Chris Lu <ch...@gmail.com> wrote:
>
>   
>> For DBSight, the aggregated values are computed during run time.
>> And the sorting on the computed aggregated values are done when displaying
>> the results.
>>
>> The idea is, after the aggregation, the number of aggregated values are
>> much much smaller.
>>
>>
>> --
>> Chris Lu
>> -------------------------
>> Instant Scalable Full-Text Search On Any Database/Application
>> site: http://www.dbsight.net
>> demo: http://search.dbsight.com
>> Lucene Database Search in 3 minutes:
>> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
>> DBSight customer, a shopping comparison site, (anonymous per request) got
>> 2.6 Million Euro funding!
>>
>>
>> prasenjit mukherjee wrote:
>>
>>     
>>> On Fri, Apr 2, 2010 at 12:54 AM, Chris Lu <ch...@gmail.com> wrote:
>>>
>>>
>>>       
>>>> No need for Hadoop. It's even more slower. Lucene can do it easily.
>>>>
>>>> This has been implemented in DBSight.
>>>> The implementation is very similar to Facet search. Just need a way to
>>>> load
>>>> the field quickly, like put it in memory or some data structure, and
>>>> count
>>>> the sum/min/max during searching.
>>>>
>>>>
>>>>         
>>> This will ONLY compute the aggregated value ( sum,count,min,max etc.
>>> ). I guess what Mike wants is use the aggregated value to sort the
>>> entries. Dynamically maintaining a sorted list while searching could
>>> be extremely expensive.
>>>
>>>
>>>
>>>
>>>       
>>>> --
>>>> Chris Lu
>>>> -------------------------
>>>> Instant Scalable Full-Text Search On Any Database/Application
>>>> site: http://www.dbsight.net
>>>> demo: http://search.dbsight.com
>>>> Lucene Database Search in 3 minutes:
>>>>
>>>> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
>>>> DBSight customer, a shopping comparison site, (anonymous per request) got
>>>> 2.6 Million Euro funding!
>>>>
>>>>
>>>> prasenjit mukherjee wrote:
>>>>
>>>>
>>>>         
>>>>> This looks like a use case more suited  for Pig ( over Hadoop ).
>>>>>
>>>>> It could be difficult for lucene to do sort and sum simultaneously as
>>>>> sorting itself depends upon summed value.
>>>>>
>>>>> On Thu, Apr 1, 2010 at 11:47 PM, Michel Nadeau <ak...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> Well that's my problem: we have a lot of records of all types
>>>>>> (afiiliates,
>>>>>> sales) so looping tons of records each time isn't possible.
>>>>>>
>>>>>> - Mike
>>>>>> akaris@gmail.com
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 1, 2010 at 2:11 PM, prasenjit mukherjee
>>>>>> <pr...@gmail.com>wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>           
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>>       
>
>   

Re: Lucene Challenge - sum, count, avg, etc.

Posted by Michel Nadeau <ak...@gmail.com>.
I'm sure the DBSight feature is great, but we already have a system in place
and we're not throwing it away -- it's closely integrated with our whole
platform. We're way past the point to switch our solution to DBSight.  We'd
be more than happy to use the DBSight feature if it would be opensource but
unfortunately it's not - so we won't even consider it.

Chris: are you a developer at DBSight? Can you tell us more about how it
works?  Because I don't really see how it can be "fast" when dealing with
millions of records... as it has to loop through them, compute, store
everything (in a temp index? memory?) and then re-sort.

- Mike
akaris@gmail.com


On Thu, Apr 1, 2010 at 5:02 PM, Chris Lu <ch...@gmail.com> wrote:

> For DBSight, the aggregated values are computed during run time.
> And the sorting on the computed aggregated values are done when displaying
> the results.
>
> The idea is, after the aggregation, the number of aggregated values are
> much much smaller.
>
>
> --
> Chris Lu
> -------------------------
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> DBSight customer, a shopping comparison site, (anonymous per request) got
> 2.6 Million Euro funding!
>
>
> prasenjit mukherjee wrote:
>
>> On Fri, Apr 2, 2010 at 12:54 AM, Chris Lu <ch...@gmail.com> wrote:
>>
>>
>>> No need for Hadoop. It's even more slower. Lucene can do it easily.
>>>
>>> This has been implemented in DBSight.
>>> The implementation is very similar to Facet search. Just need a way to
>>> load
>>> the field quickly, like put it in memory or some data structure, and
>>> count
>>> the sum/min/max during searching.
>>>
>>>
>>
>> This will ONLY compute the aggregated value ( sum,count,min,max etc.
>> ). I guess what Mike wants is use the aggregated value to sort the
>> entries. Dynamically maintaining a sorted list while searching could
>> be extremely expensive.
>>
>>
>>
>>
>>> --
>>> Chris Lu
>>> -------------------------
>>> Instant Scalable Full-Text Search On Any Database/Application
>>> site: http://www.dbsight.net
>>> demo: http://search.dbsight.com
>>> Lucene Database Search in 3 minutes:
>>>
>>> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
>>> DBSight customer, a shopping comparison site, (anonymous per request) got
>>> 2.6 Million Euro funding!
>>>
>>>
>>> prasenjit mukherjee wrote:
>>>
>>>
>>>> This looks like a use case more suited  for Pig ( over Hadoop ).
>>>>
>>>> It could be difficult for lucene to do sort and sum simultaneously as
>>>> sorting itself depends upon summed value.
>>>>
>>>> On Thu, Apr 1, 2010 at 11:47 PM, Michel Nadeau <ak...@gmail.com>
>>>> wrote:
>>>>
>>>>
>>>>
>>>>> Well that's my problem: we have a lot of records of all types
>>>>> (afiiliates,
>>>>> sales) so looping tons of records each time isn't possible.
>>>>>
>>>>> - Mike
>>>>> akaris@gmail.com
>>>>>
>>>>>
>>>>> On Thu, Apr 1, 2010 at 2:11 PM, prasenjit mukherjee
>>>>> <pr...@gmail.com>wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>
>>>>
>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>

Re: Lucene Challenge - sum, count, avg, etc.

Posted by Chris Lu <ch...@gmail.com>.
For DBSight, the aggregated values are computed during run time.
And the sorting on the computed aggregated values are done when 
displaying the results.

The idea is, after the aggregation, the number of aggregated values are 
much much smaller.

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding!


prasenjit mukherjee wrote:
> On Fri, Apr 2, 2010 at 12:54 AM, Chris Lu <ch...@gmail.com> wrote:
>   
>> No need for Hadoop. It's even more slower. Lucene can do it easily.
>>
>> This has been implemented in DBSight.
>> The implementation is very similar to Facet search. Just need a way to load
>> the field quickly, like put it in memory or some data structure, and count
>> the sum/min/max during searching.
>>     
>
> This will ONLY compute the aggregated value ( sum,count,min,max etc.
> ). I guess what Mike wants is use the aggregated value to sort the
> entries. Dynamically maintaining a sorted list while searching could
> be extremely expensive.
>
>
>   
>> --
>> Chris Lu
>> -------------------------
>> Instant Scalable Full-Text Search On Any Database/Application
>> site: http://www.dbsight.net
>> demo: http://search.dbsight.com
>> Lucene Database Search in 3 minutes:
>> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
>> DBSight customer, a shopping comparison site, (anonymous per request) got
>> 2.6 Million Euro funding!
>>
>>
>> prasenjit mukherjee wrote:
>>     
>>> This looks like a use case more suited  for Pig ( over Hadoop ).
>>>
>>> It could be difficult for lucene to do sort and sum simultaneously as
>>> sorting itself depends upon summed value.
>>>
>>> On Thu, Apr 1, 2010 at 11:47 PM, Michel Nadeau <ak...@gmail.com> wrote:
>>>
>>>       
>>>> Well that's my problem: we have a lot of records of all types
>>>> (afiiliates,
>>>> sales) so looping tons of records each time isn't possible.
>>>>
>>>> - Mike
>>>> akaris@gmail.com
>>>>
>>>>
>>>> On Thu, Apr 1, 2010 at 2:11 PM, prasenjit mukherjee
>>>> <pr...@gmail.com>wrote:
>>>>
>>>>
>>>>         
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>       
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>   

Re: Lucene Challenge - sum, count, avg, etc.

Posted by prasenjit mukherjee <pr...@gmail.com>.
On Fri, Apr 2, 2010 at 12:54 AM, Chris Lu <ch...@gmail.com> wrote:
> No need for Hadoop. It's even more slower. Lucene can do it easily.
>
> This has been implemented in DBSight.
> The implementation is very similar to Facet search. Just need a way to load
> the field quickly, like put it in memory or some data structure, and count
> the sum/min/max during searching.

This will ONLY compute the aggregated value ( sum,count,min,max etc.
). I guess what Mike wants is use the aggregated value to sort the
entries. Dynamically maintaining a sorted list while searching could
be extremely expensive.


>
> --
> Chris Lu
> -------------------------
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> DBSight customer, a shopping comparison site, (anonymous per request) got
> 2.6 Million Euro funding!
>
>
> prasenjit mukherjee wrote:
>>
>> This looks like a use case more suited  for Pig ( over Hadoop ).
>>
>> It could be difficult for lucene to do sort and sum simultaneously as
>> sorting itself depends upon summed value.
>>
>> On Thu, Apr 1, 2010 at 11:47 PM, Michel Nadeau <ak...@gmail.com> wrote:
>>
>>>
>>> Well that's my problem: we have a lot of records of all types
>>> (afiiliates,
>>> sales) so looping tons of records each time isn't possible.
>>>
>>> - Mike
>>> akaris@gmail.com
>>>
>>>
>>> On Thu, Apr 1, 2010 at 2:11 PM, prasenjit mukherjee
>>> <pr...@gmail.com>wrote:
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene Challenge - sum, count, avg, etc.

Posted by Chris Lu <ch...@gmail.com>.
No need for Hadoop. It's even more slower. Lucene can do it easily.

This has been implemented in DBSight.
The implementation is very similar to Facet search. Just need a way to 
load the field quickly, like put it in memory or some data structure, 
and count the sum/min/max during searching.

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding!


prasenjit mukherjee wrote:
> This looks like a use case more suited  for Pig ( over Hadoop ).
>
> It could be difficult for lucene to do sort and sum simultaneously as
> sorting itself depends upon summed value.
>
> On Thu, Apr 1, 2010 at 11:47 PM, Michel Nadeau <ak...@gmail.com> wrote:
>   
>> Well that's my problem: we have a lot of records of all types (afiiliates,
>> sales) so looping tons of records each time isn't possible.
>>
>> - Mike
>> akaris@gmail.com
>>
>>
>> On Thu, Apr 1, 2010 at 2:11 PM, prasenjit mukherjee <pr...@gmail.com>wrote:
>>
>>     
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>   

Re: Lucene Challenge - sum, count, avg, etc.

Posted by prasenjit mukherjee <pr...@gmail.com>.
This looks like a use case more suited  for Pig ( over Hadoop ).

It could be difficult for lucene to do sort and sum simultaneously as
sorting itself depends upon summed value.

On Thu, Apr 1, 2010 at 11:47 PM, Michel Nadeau <ak...@gmail.com> wrote:
> Well that's my problem: we have a lot of records of all types (afiiliates,
> sales) so looping tons of records each time isn't possible.
>
> - Mike
> akaris@gmail.com
>
>
> On Thu, Apr 1, 2010 at 2:11 PM, prasenjit mukherjee <pr...@gmail.com>wrote:
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene Challenge - sum, count, avg, etc.

Posted by Michel Nadeau <ak...@gmail.com>.
Well that's my problem: we have a lot of records of all types (afiiliates,
sales) so looping tons of records each time isn't possible.

- Mike
akaris@gmail.com


On Thu, Apr 1, 2010 at 2:11 PM, prasenjit mukherjee <pr...@gmail.com>wrote:

> If the number of documents ( in this case "Affiliates" )  aren't huge,
> sorting can probably be done as a post-process.
>
> Still dont see any need of joins here.
>
>
> On Thu, Apr 1, 2010 at 7:16 PM, Michel Nadeau <ak...@gmail.com> wrote:
> > Hi,
> >
> > Here's an example of raw data that would be in my Sales index:
> >
> > *Affiliate / SaleDate / SaleAmount*
> > * mike / 2010-03-01 / 10.00
> > * john / 2010-03-01 / 10.00
> > * mike / 2010-03-02 / 15.00
> > * john / 2010-03-02 /  5.00
> > * mike / 2010-03-03 / 20.00
> > * john / 2010-03-03 /  1.00
> > * mike / 2010-03-04 / 10.00
> > * john / 2010-03-04 / 10.00
> > * mike / 2010-03-05 / 15.00
> > * john / 2010-03-05 /  5.00
> > * mike / 2010-03-06 / 20.00
> > * john / 2010-03-06 /  1.00
> >
> > So our 2 affiliates mike and john made 6 sales each between 2010-03-01
> and
> > 2010-03-06. My ultimate query should return this for a query between
> > 2010-03-01 and 2010-03-06 :
> >
> > *Affiliate / TotalSales*
> > * mike / 90.00
> > * john / 32.00
> >
> > So it's exactly like -
> >
> > SELECT Affiliate, sum(SaleAmount) as TotalSales FROM Sales
> >  WHERE SaleDate >= '2010-03-01' AND SaleDate <= '2010-03-06'
> >  GROUP BY Affiliate
> >  ORDER BY TotalSales DESC;
> >
> > - Mike
> > akaris@gmail.com
> >
> >
> > On Thu, Apr 1, 2010 at 8:11 AM, prasenjit mukherjee <
> prasen.bea@gmail.com>wrote:
> >
> >> Not sure what you mean by "joining" in lucene , since conceptually
> >> there is only 1 table ( with many field aka columns ) in lucene. A
> >> representative query would be good to know the use case.
> >>
> >> Again didn't get the "sorting" part.  SUM() will return only 1
> >> aggregated value, so what do you want to sort it on ?
> >>
> >> -Prasen
> >>
> >> On Thu, Apr 1, 2010 at 7:44 AM, Michel Nadeau <ak...@gmail.com> wrote:
> >> > Are you planning to be able to sort by these SUMs? A SpanQuery would
> work
> >> > great to get the integers... then you would loop and sum up... but
> what
> >> > about "joining" with your other data and sorting?
> >> >
> >> > - Mike
> >> > akaris@gmail.com
> >> >
> >> >
> >> > On Wed, Mar 31, 2010 at 9:23 PM, prasenjit mukherjee
> >> > <pr...@gmail.com>wrote:
> >> >
> >> >> I too am trying to achieve something.
> >> >>
> >> >> I am thinking of storing the integer values in  payloads and then
> >> >> using spanquery classes to compute the respective SUMs
> >> >>
> >> >> -Prasen
> >> >>
> >> >> On Thu, Apr 1, 2010 at 6:47 AM, Michel Nadeau <ak...@gmail.com>
> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > We're currently in the process of switching many of our screens
> from
> >> >> MySQL
> >> >> > to Lucene because MySQL simply dies because we have too much data
> and
> >> >> it's
> >> >> > becoming too long to generate the stats we need.
> >> >> >
> >> >> > So here's one MySQL query that we use to find out our Top 10
> >> Affiliates :
> >> >> >
> >> >> > SELECT SUM(sale_amount) AS total_sales, affialiate_id FROM sales
> WHERE
> >> >> > sale_date>='2010-03-01' AND sale_date<='2010-03-31' GROUP BY
> >> >> affialiate_id
> >> >> > ORDER BY total_sales DESC LIMIT 10;
> >> >> >
> >> >> > We currently have our "sales" index, containing all sales and all
> >> fields
> >> >> -
> >> >> > and it's one big index (over 10M records). We could fetch all
> >> documents
> >> >> > within the date range, loop them and add up the total_sales, but it
> >> would
> >> >> be
> >> >> > just crazy to do this all the time (we have a high volume of
> search).
> >> >> >
> >> >> > We made several tests with Solr (Facets, and even the beta
> >> >> CollapseFields),
> >> >> > but nothing is really helping us. We could pre-generate the
> >> total_sales
> >> >> for
> >> >> > all possible date ranges... but that would be quite crazy too as
> the
> >> date
> >> >> > range possibilities quickly become endless.
> >> >> >
> >> >> > So - is there any known way to efficiently do SUM(), COUNT() (and
> even
> >> >> AVG()
> >> >> > ) using Lucene/Solr/others? I also checked Bobo Browse but it
> doesn't
> >> >> seem
> >> >> > to offer what I need either.
> >> >> >
> >> >> > Thanks for any hints!!!
> >> >> >
> >> >> > - Mike
> >> >> > akaris@gmail.com
> >> >> >
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >> >>
> >> >>
> >> >
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Lucene Challenge - sum, count, avg, etc.

Posted by prasenjit mukherjee <pr...@gmail.com>.
If the number of documents ( in this case "Affiliates" )  aren't huge,
sorting can probably be done as a post-process.

Still dont see any need of joins here.


On Thu, Apr 1, 2010 at 7:16 PM, Michel Nadeau <ak...@gmail.com> wrote:
> Hi,
>
> Here's an example of raw data that would be in my Sales index:
>
> *Affiliate / SaleDate / SaleAmount*
> * mike / 2010-03-01 / 10.00
> * john / 2010-03-01 / 10.00
> * mike / 2010-03-02 / 15.00
> * john / 2010-03-02 /  5.00
> * mike / 2010-03-03 / 20.00
> * john / 2010-03-03 /  1.00
> * mike / 2010-03-04 / 10.00
> * john / 2010-03-04 / 10.00
> * mike / 2010-03-05 / 15.00
> * john / 2010-03-05 /  5.00
> * mike / 2010-03-06 / 20.00
> * john / 2010-03-06 /  1.00
>
> So our 2 affiliates mike and john made 6 sales each between 2010-03-01 and
> 2010-03-06. My ultimate query should return this for a query between
> 2010-03-01 and 2010-03-06 :
>
> *Affiliate / TotalSales*
> * mike / 90.00
> * john / 32.00
>
> So it's exactly like -
>
> SELECT Affiliate, sum(SaleAmount) as TotalSales FROM Sales
>  WHERE SaleDate >= '2010-03-01' AND SaleDate <= '2010-03-06'
>  GROUP BY Affiliate
>  ORDER BY TotalSales DESC;
>
> - Mike
> akaris@gmail.com
>
>
> On Thu, Apr 1, 2010 at 8:11 AM, prasenjit mukherjee <pr...@gmail.com>wrote:
>
>> Not sure what you mean by "joining" in lucene , since conceptually
>> there is only 1 table ( with many field aka columns ) in lucene. A
>> representative query would be good to know the use case.
>>
>> Again didn't get the "sorting" part.  SUM() will return only 1
>> aggregated value, so what do you want to sort it on ?
>>
>> -Prasen
>>
>> On Thu, Apr 1, 2010 at 7:44 AM, Michel Nadeau <ak...@gmail.com> wrote:
>> > Are you planning to be able to sort by these SUMs? A SpanQuery would work
>> > great to get the integers... then you would loop and sum up... but what
>> > about "joining" with your other data and sorting?
>> >
>> > - Mike
>> > akaris@gmail.com
>> >
>> >
>> > On Wed, Mar 31, 2010 at 9:23 PM, prasenjit mukherjee
>> > <pr...@gmail.com>wrote:
>> >
>> >> I too am trying to achieve something.
>> >>
>> >> I am thinking of storing the integer values in  payloads and then
>> >> using spanquery classes to compute the respective SUMs
>> >>
>> >> -Prasen
>> >>
>> >> On Thu, Apr 1, 2010 at 6:47 AM, Michel Nadeau <ak...@gmail.com> wrote:
>> >> > Hi,
>> >> >
>> >> > We're currently in the process of switching many of our screens from
>> >> MySQL
>> >> > to Lucene because MySQL simply dies because we have too much data and
>> >> it's
>> >> > becoming too long to generate the stats we need.
>> >> >
>> >> > So here's one MySQL query that we use to find out our Top 10
>> Affiliates :
>> >> >
>> >> > SELECT SUM(sale_amount) AS total_sales, affialiate_id FROM sales WHERE
>> >> > sale_date>='2010-03-01' AND sale_date<='2010-03-31' GROUP BY
>> >> affialiate_id
>> >> > ORDER BY total_sales DESC LIMIT 10;
>> >> >
>> >> > We currently have our "sales" index, containing all sales and all
>> fields
>> >> -
>> >> > and it's one big index (over 10M records). We could fetch all
>> documents
>> >> > within the date range, loop them and add up the total_sales, but it
>> would
>> >> be
>> >> > just crazy to do this all the time (we have a high volume of search).
>> >> >
>> >> > We made several tests with Solr (Facets, and even the beta
>> >> CollapseFields),
>> >> > but nothing is really helping us. We could pre-generate the
>> total_sales
>> >> for
>> >> > all possible date ranges... but that would be quite crazy too as the
>> date
>> >> > range possibilities quickly become endless.
>> >> >
>> >> > So - is there any known way to efficiently do SUM(), COUNT() (and even
>> >> AVG()
>> >> > ) using Lucene/Solr/others? I also checked Bobo Browse but it doesn't
>> >> seem
>> >> > to offer what I need either.
>> >> >
>> >> > Thanks for any hints!!!
>> >> >
>> >> > - Mike
>> >> > akaris@gmail.com
>> >> >
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>
>> >>
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene Challenge - sum, count, avg, etc.

Posted by Michel Nadeau <ak...@gmail.com>.
Hi,

Here's an example of raw data that would be in my Sales index:

*Affiliate / SaleDate / SaleAmount*
* mike / 2010-03-01 / 10.00
* john / 2010-03-01 / 10.00
* mike / 2010-03-02 / 15.00
* john / 2010-03-02 /  5.00
* mike / 2010-03-03 / 20.00
* john / 2010-03-03 /  1.00
* mike / 2010-03-04 / 10.00
* john / 2010-03-04 / 10.00
* mike / 2010-03-05 / 15.00
* john / 2010-03-05 /  5.00
* mike / 2010-03-06 / 20.00
* john / 2010-03-06 /  1.00

So our 2 affiliates mike and john made 6 sales each between 2010-03-01 and
2010-03-06. My ultimate query should return this for a query between
2010-03-01 and 2010-03-06 :

*Affiliate / TotalSales*
* mike / 90.00
* john / 32.00

So it's exactly like -

SELECT Affiliate, sum(SaleAmount) as TotalSales FROM Sales
  WHERE SaleDate >= '2010-03-01' AND SaleDate <= '2010-03-06'
  GROUP BY Affiliate
  ORDER BY TotalSales DESC;

- Mike
akaris@gmail.com


On Thu, Apr 1, 2010 at 8:11 AM, prasenjit mukherjee <pr...@gmail.com>wrote:

> Not sure what you mean by "joining" in lucene , since conceptually
> there is only 1 table ( with many field aka columns ) in lucene. A
> representative query would be good to know the use case.
>
> Again didn't get the "sorting" part.  SUM() will return only 1
> aggregated value, so what do you want to sort it on ?
>
> -Prasen
>
> On Thu, Apr 1, 2010 at 7:44 AM, Michel Nadeau <ak...@gmail.com> wrote:
> > Are you planning to be able to sort by these SUMs? A SpanQuery would work
> > great to get the integers... then you would loop and sum up... but what
> > about "joining" with your other data and sorting?
> >
> > - Mike
> > akaris@gmail.com
> >
> >
> > On Wed, Mar 31, 2010 at 9:23 PM, prasenjit mukherjee
> > <pr...@gmail.com>wrote:
> >
> >> I too am trying to achieve something.
> >>
> >> I am thinking of storing the integer values in  payloads and then
> >> using spanquery classes to compute the respective SUMs
> >>
> >> -Prasen
> >>
> >> On Thu, Apr 1, 2010 at 6:47 AM, Michel Nadeau <ak...@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > We're currently in the process of switching many of our screens from
> >> MySQL
> >> > to Lucene because MySQL simply dies because we have too much data and
> >> it's
> >> > becoming too long to generate the stats we need.
> >> >
> >> > So here's one MySQL query that we use to find out our Top 10
> Affiliates :
> >> >
> >> > SELECT SUM(sale_amount) AS total_sales, affialiate_id FROM sales WHERE
> >> > sale_date>='2010-03-01' AND sale_date<='2010-03-31' GROUP BY
> >> affialiate_id
> >> > ORDER BY total_sales DESC LIMIT 10;
> >> >
> >> > We currently have our "sales" index, containing all sales and all
> fields
> >> -
> >> > and it's one big index (over 10M records). We could fetch all
> documents
> >> > within the date range, loop them and add up the total_sales, but it
> would
> >> be
> >> > just crazy to do this all the time (we have a high volume of search).
> >> >
> >> > We made several tests with Solr (Facets, and even the beta
> >> CollapseFields),
> >> > but nothing is really helping us. We could pre-generate the
> total_sales
> >> for
> >> > all possible date ranges... but that would be quite crazy too as the
> date
> >> > range possibilities quickly become endless.
> >> >
> >> > So - is there any known way to efficiently do SUM(), COUNT() (and even
> >> AVG()
> >> > ) using Lucene/Solr/others? I also checked Bobo Browse but it doesn't
> >> seem
> >> > to offer what I need either.
> >> >
> >> > Thanks for any hints!!!
> >> >
> >> > - Mike
> >> > akaris@gmail.com
> >> >
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Lucene Challenge - sum, count, avg, etc.

Posted by prasenjit mukherjee <pr...@gmail.com>.
Not sure what you mean by "joining" in lucene , since conceptually
there is only 1 table ( with many field aka columns ) in lucene. A
representative query would be good to know the use case.

Again didn't get the "sorting" part.  SUM() will return only 1
aggregated value, so what do you want to sort it on ?

-Prasen

On Thu, Apr 1, 2010 at 7:44 AM, Michel Nadeau <ak...@gmail.com> wrote:
> Are you planning to be able to sort by these SUMs? A SpanQuery would work
> great to get the integers... then you would loop and sum up... but what
> about "joining" with your other data and sorting?
>
> - Mike
> akaris@gmail.com
>
>
> On Wed, Mar 31, 2010 at 9:23 PM, prasenjit mukherjee
> <pr...@gmail.com>wrote:
>
>> I too am trying to achieve something.
>>
>> I am thinking of storing the integer values in  payloads and then
>> using spanquery classes to compute the respective SUMs
>>
>> -Prasen
>>
>> On Thu, Apr 1, 2010 at 6:47 AM, Michel Nadeau <ak...@gmail.com> wrote:
>> > Hi,
>> >
>> > We're currently in the process of switching many of our screens from
>> MySQL
>> > to Lucene because MySQL simply dies because we have too much data and
>> it's
>> > becoming too long to generate the stats we need.
>> >
>> > So here's one MySQL query that we use to find out our Top 10 Affiliates :
>> >
>> > SELECT SUM(sale_amount) AS total_sales, affialiate_id FROM sales WHERE
>> > sale_date>='2010-03-01' AND sale_date<='2010-03-31' GROUP BY
>> affialiate_id
>> > ORDER BY total_sales DESC LIMIT 10;
>> >
>> > We currently have our "sales" index, containing all sales and all fields
>> -
>> > and it's one big index (over 10M records). We could fetch all documents
>> > within the date range, loop them and add up the total_sales, but it would
>> be
>> > just crazy to do this all the time (we have a high volume of search).
>> >
>> > We made several tests with Solr (Facets, and even the beta
>> CollapseFields),
>> > but nothing is really helping us. We could pre-generate the total_sales
>> for
>> > all possible date ranges... but that would be quite crazy too as the date
>> > range possibilities quickly become endless.
>> >
>> > So - is there any known way to efficiently do SUM(), COUNT() (and even
>> AVG()
>> > ) using Lucene/Solr/others? I also checked Bobo Browse but it doesn't
>> seem
>> > to offer what I need either.
>> >
>> > Thanks for any hints!!!
>> >
>> > - Mike
>> > akaris@gmail.com
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene Challenge - sum, count, avg, etc.

Posted by Michel Nadeau <ak...@gmail.com>.
Are you planning to be able to sort by these SUMs? A SpanQuery would work
great to get the integers... then you would loop and sum up... but what
about "joining" with your other data and sorting?

- Mike
akaris@gmail.com


On Wed, Mar 31, 2010 at 9:23 PM, prasenjit mukherjee
<pr...@gmail.com>wrote:

> I too am trying to achieve something.
>
> I am thinking of storing the integer values in  payloads and then
> using spanquery classes to compute the respective SUMs
>
> -Prasen
>
> On Thu, Apr 1, 2010 at 6:47 AM, Michel Nadeau <ak...@gmail.com> wrote:
> > Hi,
> >
> > We're currently in the process of switching many of our screens from
> MySQL
> > to Lucene because MySQL simply dies because we have too much data and
> it's
> > becoming too long to generate the stats we need.
> >
> > So here's one MySQL query that we use to find out our Top 10 Affiliates :
> >
> > SELECT SUM(sale_amount) AS total_sales, affialiate_id FROM sales WHERE
> > sale_date>='2010-03-01' AND sale_date<='2010-03-31' GROUP BY
> affialiate_id
> > ORDER BY total_sales DESC LIMIT 10;
> >
> > We currently have our "sales" index, containing all sales and all fields
> -
> > and it's one big index (over 10M records). We could fetch all documents
> > within the date range, loop them and add up the total_sales, but it would
> be
> > just crazy to do this all the time (we have a high volume of search).
> >
> > We made several tests with Solr (Facets, and even the beta
> CollapseFields),
> > but nothing is really helping us. We could pre-generate the total_sales
> for
> > all possible date ranges... but that would be quite crazy too as the date
> > range possibilities quickly become endless.
> >
> > So - is there any known way to efficiently do SUM(), COUNT() (and even
> AVG()
> > ) using Lucene/Solr/others? I also checked Bobo Browse but it doesn't
> seem
> > to offer what I need either.
> >
> > Thanks for any hints!!!
> >
> > - Mike
> > akaris@gmail.com
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Lucene Challenge - sum, count, avg, etc.

Posted by prasenjit mukherjee <pr...@gmail.com>.
I too am trying to achieve something.

I am thinking of storing the integer values in  payloads and then
using spanquery classes to compute the respective SUMs

-Prasen

On Thu, Apr 1, 2010 at 6:47 AM, Michel Nadeau <ak...@gmail.com> wrote:
> Hi,
>
> We're currently in the process of switching many of our screens from MySQL
> to Lucene because MySQL simply dies because we have too much data and it's
> becoming too long to generate the stats we need.
>
> So here's one MySQL query that we use to find out our Top 10 Affiliates :
>
> SELECT SUM(sale_amount) AS total_sales, affialiate_id FROM sales WHERE
> sale_date>='2010-03-01' AND sale_date<='2010-03-31' GROUP BY affialiate_id
> ORDER BY total_sales DESC LIMIT 10;
>
> We currently have our "sales" index, containing all sales and all fields -
> and it's one big index (over 10M records). We could fetch all documents
> within the date range, loop them and add up the total_sales, but it would be
> just crazy to do this all the time (we have a high volume of search).
>
> We made several tests with Solr (Facets, and even the beta CollapseFields),
> but nothing is really helping us. We could pre-generate the total_sales for
> all possible date ranges... but that would be quite crazy too as the date
> range possibilities quickly become endless.
>
> So - is there any known way to efficiently do SUM(), COUNT() (and even AVG()
> ) using Lucene/Solr/others? I also checked Bobo Browse but it doesn't seem
> to offer what I need either.
>
> Thanks for any hints!!!
>
> - Mike
> akaris@gmail.com
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: query: order of search

Posted by Karl Wettin <ka...@gmail.com>.
1 apr 2010 kl. 11.21 skrev <su...@zapak.co.in> <suman.holani@zapak.co.in 
 >:

> its written "to do a "search within search", so that the second  
> search is
> constrained by the results of the first query"

If I understand your needs you could while collecting search results  
populate a new filter with all matching documents and use that filter  
during "search within search(results?)".


	karl

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: query: order of search

Posted by Erick Erickson <er...@gmail.com>.
I'm pretty sure that order doesn't matter. Again, though, don't
worry about this level of trick until you can demonstrate
performance issues, your time is usually best spent in other
places....

Best
Erick

On Thu, Apr 1, 2010 at 11:54 PM, <su...@zapak.co.in> wrote:

> Hello Erick,
>
> I was trying to optimise the searching.
> Basically my data is like field1 has less no of docs matching compared to
> field2, which has larget sets.
>
> So if search goes by order to order, then i can make field1 to be search
> first, (by making  order of boolean query such  )and from thr the docs
> result set could be searched for field2.
> So number of docs to be matched will be less.
> I am not using filters or any caching since the search is unique everytime
> acc to parameter, and its using boolean query
>
>
> or is the searching takes place like searches each doc for  field1="abc"
> and field2="def" then get resultset
>
> I hope I am clear in explaining my query.
>
> thanks,
> Suman
>
>
>
> On Thu, 1 Apr 2010 08:42:16 -0700, Erick Erickson
> <er...@gmail.com>
> wrote:
>  > Why do you care? By that I mean "what problem are you trying to solve"
> > (See "The XY problem at http://people.apache.org/~hossman/). The reason
> > I'm asking here is that very often, when people ask this kind of
> question
> > without providing background, they're trying the wrong approach to solve
> > a problem.
> >
> > Unless you're asking for theoretical reasons, I almost guarantee that
> > trying
> > to solve any problem in Lucene by changing the order of the clauses
> isn't
> > going to do you much good in the long run, there'll *always* be variant
> > n+1...
> >
> > Best
> > Erick
> >
> > On Thu, Apr 1, 2010 at 2:21 AM, <su...@zapak.co.in> wrote:
> >
> >>
> >> Query I
> >>
> >> its written "to do a "search within search", so that the second search
> is
> >> constrained by the results of the first query"
> >> we can use boolean query.
> >> So doesn't it mean the order of query ll be preserved
> >>
> >> give me an simple example of how the docs get searched in lucene.
> >> 10 docs with 3 fields to b searched
> >> so ll it search for field1 for all 10 docs..and the result set docs ll
> b
> >> then searched for field2....and so far...
> >> or is it something else..
> >>
> >>
> >>
> >> thanks,
> >> Suman
> >>
> >> On Thu, 1 Apr 2010 10:05:01 +0100, Ian Lea <ia...@gmail.com> wrote:
> >> >> Query I
> >> >> Does the order of query play role in searching
> >> >> example:doc has fields
> >> >> rollno(pk), name, marks
> >> >>
> >> >> Query : marks=90&rollno=2&name=abc
> >> >>
> >> >> Query :rollno=2&name=abc&marks=90
> >> >>
> >> >> which query processing will be more efficient.
> >> >> is it work like search doc field by field , it will look for doc
> >> >> having
> >> >> rollno=2 then from thos will look for name abc and from the selected
> >> docs
> >> >> will again look for marks param .
> >> >
> >> > I don't think it makes any difference.
> >> >
> >> >
> >> >> Query II
> >> >>  which  is more faster .
> >> >> a)  adding one more multivalued field into current index(large)
> >> >> b)  or creating a separate index for the field and searching it as a
> >> >> subindex .
> >> >> My main apprehensions is in regard to large size of index
> >> >
> >> > Probably your first option.
> >> >
> >> >
> >> > --
> >> > Ian.
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: query: order of search

Posted by su...@zapak.co.in.
Hello Erick,

I was trying to optimise the searching.
Basically my data is like field1 has less no of docs matching compared to
field2, which has larget sets.

So if search goes by order to order, then i can make field1 to be search
first, (by making  order of boolean query such  )and from thr the docs
result set could be searched for field2. 
So number of docs to be matched will be less.
I am not using filters or any caching since the search is unique everytime
acc to parameter, and its using boolean query


or is the searching takes place like searches each doc for  field1="abc"
and field2="def" then get resultset

I hope I am clear in explaining my query. 

thanks,
Suman



On Thu, 1 Apr 2010 08:42:16 -0700, Erick Erickson
<er...@gmail.com>
wrote:
> Why do you care? By that I mean "what problem are you trying to solve"
> (See "The XY problem at http://people.apache.org/~hossman/). The reason
> I'm asking here is that very often, when people ask this kind of
question
> without providing background, they're trying the wrong approach to solve
> a problem.
> 
> Unless you're asking for theoretical reasons, I almost guarantee that
> trying
> to solve any problem in Lucene by changing the order of the clauses
isn't
> going to do you much good in the long run, there'll *always* be variant
> n+1...
> 
> Best
> Erick
> 
> On Thu, Apr 1, 2010 at 2:21 AM, <su...@zapak.co.in> wrote:
> 
>>
>> Query I
>>
>> its written "to do a "search within search", so that the second search
is
>> constrained by the results of the first query"
>> we can use boolean query.
>> So doesn't it mean the order of query ll be preserved
>>
>> give me an simple example of how the docs get searched in lucene.
>> 10 docs with 3 fields to b searched
>> so ll it search for field1 for all 10 docs..and the result set docs ll
b
>> then searched for field2....and so far...
>> or is it something else..
>>
>>
>>
>> thanks,
>> Suman
>>
>> On Thu, 1 Apr 2010 10:05:01 +0100, Ian Lea <ia...@gmail.com> wrote:
>> >> Query I
>> >> Does the order of query play role in searching
>> >> example:doc has fields
>> >> rollno(pk), name, marks
>> >>
>> >> Query : marks=90&rollno=2&name=abc
>> >>
>> >> Query :rollno=2&name=abc&marks=90
>> >>
>> >> which query processing will be more efficient.
>> >> is it work like search doc field by field , it will look for doc
>> >> having
>> >> rollno=2 then from thos will look for name abc and from the selected
>> docs
>> >> will again look for marks param .
>> >
>> > I don't think it makes any difference.
>> >
>> >
>> >> Query II
>> >>  which  is more faster .
>> >> a)  adding one more multivalued field into current index(large)
>> >> b)  or creating a separate index for the field and searching it as a
>> >> subindex .
>> >> My main apprehensions is in regard to large size of index
>> >
>> > Probably your first option.
>> >
>> >
>> > --
>> > Ian.
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: query: order of search

Posted by Erick Erickson <er...@gmail.com>.
Why do you care? By that I mean "what problem are you trying to solve"
(See "The XY problem at http://people.apache.org/~hossman/). The reason
I'm asking here is that very often, when people ask this kind of question
without providing background, they're trying the wrong approach to solve
a problem.

Unless you're asking for theoretical reasons, I almost guarantee that trying
to solve any problem in Lucene by changing the order of the clauses isn't
going to do you much good in the long run, there'll *always* be variant
n+1...

Best
Erick

On Thu, Apr 1, 2010 at 2:21 AM, <su...@zapak.co.in> wrote:

>
> Query I
>
> its written "to do a "search within search", so that the second search is
> constrained by the results of the first query"
> we can use boolean query.
> So doesn't it mean the order of query ll be preserved
>
> give me an simple example of how the docs get searched in lucene.
> 10 docs with 3 fields to b searched
> so ll it search for field1 for all 10 docs..and the result set docs ll b
> then searched for field2....and so far...
> or is it something else..
>
>
>
> thanks,
> Suman
>
> On Thu, 1 Apr 2010 10:05:01 +0100, Ian Lea <ia...@gmail.com> wrote:
> >> Query I
> >> Does the order of query play role in searching
> >> example:doc has fields
> >> rollno(pk), name, marks
> >>
> >> Query : marks=90&rollno=2&name=abc
> >>
> >> Query :rollno=2&name=abc&marks=90
> >>
> >> which query processing will be more efficient.
> >> is it work like search doc field by field , it will look for doc having
> >> rollno=2 then from thos will look for name abc and from the selected
> docs
> >> will again look for marks param .
> >
> > I don't think it makes any difference.
> >
> >
> >> Query II
> >>  which  is more faster .
> >> a)  adding one more multivalued field into current index(large)
> >> b)  or creating a separate index for the field and searching it as a
> >> subindex .
> >> My main apprehensions is in regard to large size of index
> >
> > Probably your first option.
> >
> >
> > --
> > Ian.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: query: order of search

Posted by su...@zapak.co.in.
Query I

its written "to do a "search within search", so that the second search is
constrained by the results of the first query"
we can use boolean query.
So doesn't it mean the order of query ll be preserved

give me an simple example of how the docs get searched in lucene.
10 docs with 3 fields to b searched
so ll it search for field1 for all 10 docs..and the result set docs ll b
then searched for field2....and so far...
or is it something else..



thanks,
Suman

On Thu, 1 Apr 2010 10:05:01 +0100, Ian Lea <ia...@gmail.com> wrote:
>> Query I
>> Does the order of query play role in searching
>> example:doc has fields
>> rollno(pk), name, marks
>>
>> Query : marks=90&rollno=2&name=abc
>>
>> Query :rollno=2&name=abc&marks=90
>>
>> which query processing will be more efficient.
>> is it work like search doc field by field , it will look for doc having
>> rollno=2 then from thos will look for name abc and from the selected
docs
>> will again look for marks param .
> 
> I don't think it makes any difference.
> 
> 
>> Query II
>>  which  is more faster .
>> a)  adding one more multivalued field into current index(large)
>> b)  or creating a separate index for the field and searching it as a
>> subindex .
>> My main apprehensions is in regard to large size of index
> 
> Probably your first option.
> 
> 
> --
> Ian.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: query: order of search

Posted by Ian Lea <ia...@gmail.com>.
> Query I
> Does the order of query play role in searching
> example:doc has fields
> rollno(pk), name, marks
>
> Query : marks=90&rollno=2&name=abc
>
> Query :rollno=2&name=abc&marks=90
>
> which query processing will be more efficient.
> is it work like search doc field by field , it will look for doc having
> rollno=2 then from thos will look for name abc and from the selected docs
> will again look for marks param .

I don't think it makes any difference.


> Query II
>  which  is more faster .
> a)  adding one more multivalued field into current index(large)
> b)  or creating a separate index for the field and searching it as a
> subindex .
> My main apprehensions is in regard to large size of index

Probably your first option.


--
Ian.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: query: order of search

Posted by Chris Hostetter <ho...@fucit.org>.
: Subject: query: order of search
: In-Reply-To: <8D...@transpac.com>
: References: <j2...@mail.gmail.com>
:     <8D...@transpac.com>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking


-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


query: order of search

Posted by su...@zapak.co.in.
Hello


Query I
Does the order of query play role in searching 
example:doc has fields
rollno(pk), name, marks

Query : marks=90&rollno=2&name=abc

Query :rollno=2&name=abc&marks=90

which query processing will be more efficient.
is it work like search doc field by field , it will look for doc having
rollno=2 then from thos will look for name abc and from the selected docs
will again look for marks param .


Query II
 which  is more faster .
a)  adding one more multivalued field into current index(large) 
b)  or creating a separate index for the field and searching it as a
subindex .
My main apprehensions is in regard to large size of index




thanks,
Suman 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene Challenge - sum, count, avg, etc.

Posted by Ken Krugler <kk...@transpac.com>.
Hi Mike,

I'm sure there are better options, but one thing you could do is per- 
compute totals for different date resolutions. Depending on the number  
of unique affiliate IDs, this might work.

E.g. pre-calculate sums by day & by week (and maybe by month) for each  
affiliate id, and then turn the query into a set of affiliate_id x  
date range queries. Something like:

affiliate_id:<value> and (day:59 or day:60 or day:61 or week:10 or  
week:11 or week:12 or day:86 or day:87...)

-- Ken


On Mar 31, 2010, at 6:17pm, Michel Nadeau wrote:

> Hi,
>
> We're currently in the process of switching many of our screens from  
> MySQL
> to Lucene because MySQL simply dies because we have too much data  
> and it's
> becoming too long to generate the stats we need.
>
> So here's one MySQL query that we use to find out our Top 10  
> Affiliates :
>
> SELECT SUM(sale_amount) AS total_sales, affialiate_id FROM sales WHERE
> sale_date>='2010-03-01' AND sale_date<='2010-03-31' GROUP BY  
> affialiate_id
> ORDER BY total_sales DESC LIMIT 10;
>
> We currently have our "sales" index, containing all sales and all  
> fields -
> and it's one big index (over 10M records). We could fetch all  
> documents
> within the date range, loop them and add up the total_sales, but it  
> would be
> just crazy to do this all the time (we have a high volume of search).
>
> We made several tests with Solr (Facets, and even the beta  
> CollapseFields),
> but nothing is really helping us. We could pre-generate the  
> total_sales for
> all possible date ranges... but that would be quite crazy too as the  
> date
> range possibilities quickly become endless.
>
> So - is there any known way to efficiently do SUM(), COUNT() (and  
> even AVG()
> ) using Lucene/Solr/others? I also checked Bobo Browse but it  
> doesn't seem
> to offer what I need either.
>
> Thanks for any hints!!!
>
> - Mike
> akaris@gmail.com

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org