Posted to java-user@lucene.apache.org by Ophir Cohen <op...@liveperson.com> on 2011/02/01 08:59:18 UTC

Payloads API and support

Hi Guys,

I've been using Lucene for more than 5 years and it is a great tool - 
great job! Thanks for everything...


Lately I encountered the new payloads support and it looks like a great 
solution for my project.


*The problem:*

The use case is as follows:

I need to support a way to calculate statistics on web pages.

Each page has a few metrics that come with it (how many users saw it, what 
the average time on page was, etc.).


The requirement is to support queries such as:

How many users saw pages containing the tokens 'house' and 'white'?

Or

What was the average time on pages containing the tokens 'horse' and 'pony'?


*First solution:*

Add pages to Lucene, index the words and store the metrics.

*The problem: performance.*

Unlike a regular search, I need to provide results for all matched 
documents, so I need to iterate over all results and load the 
document data.
This method takes too much time.


*Better solution:*

Store the metrics as payloads and calculate the needed data without 
access to the storage - a huge performance boost.
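
A minimal sketch of what "metrics as payloads" could look like at the byte level, assuming two metrics per page (a view count and an average time in seconds). A payload is just a byte array, so the metrics have to be packed and unpacked by hand. The class `MetricsPayload` and its method names are hypothetical and are not part of Lucene; the code uses only the JDK.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not a Lucene class): packs two page metrics into a
// fixed 8-byte array, the kind of byte[] a payload can carry.
final class MetricsPayload {
    static final int SIZE = 8; // 4 bytes for views + 4 bytes for average time

    // Encode the view count and the average time on page (in seconds).
    static byte[] encode(int views, float avgSeconds) {
        return ByteBuffer.allocate(SIZE).putInt(views).putFloat(avgSeconds).array();
    }

    // Read the view count back from bytes 0..3.
    static int views(byte[] payload) {
        return ByteBuffer.wrap(payload).getInt(0);
    }

    // Read the average time back from bytes 4..7.
    static float avgSeconds(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat(4);
    }

    public static void main(String[] args) {
        byte[] p = encode(1200, 37.5f);
        System.out.println(views(p) + " views, " + avgSeconds(p) + " s average");
    }
}
```

A fixed layout keeps decoding cheap, at the cost of having to re-index if a metric is added later.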


The problem is (unless I am missing something) that I can't get the payloads 
from anything except TermPositions, and that isn't good enough as I want to 
use complex queries.

Is there any other way to access them?

One option could be to get the payload by document id in the collector.


Any ideas/comments/suggestions?

-- 
Thanks in advance,
Ophir Cohen


Re: Payloads API and support

Posted by Ophir Cohen <op...@gmail.com>.
Hi Grant,
Thanks for the answer - it wasn't a question of patience, I just accidentally
sent the same message more than once...
Sorry for that.

Anyway,
I'm now checking the option of holding the metrics in an in-memory array
(for all docs) and retrieving the metrics from that array rather than from
Lucene.
It looks much the same as using the FieldCache - but I'll try that as well.
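
A rough sketch of the in-memory array idea, assuming the metrics fit in memory and are indexed by document id. The class `ArrayMetrics`, its method names, and the use of a `BitSet` to stand for the matching documents are illustrative assumptions, not Lucene API. The average here is an unweighted mean of the per-page averages, which is also an assumption about what the requirement means.

```java
import java.util.BitSet;

// Plain-Java sketch: metrics live in arrays indexed by document id, so
// aggregating over the matches never loads stored document data.
final class ArrayMetrics {
    private final int[] views;        // views[docId]
    private final float[] avgSeconds; // avgSeconds[docId]

    ArrayMetrics(int[] views, float[] avgSeconds) {
        if (views.length != avgSeconds.length) {
            throw new IllegalArgumentException("arrays must have one entry per document");
        }
        this.views = views;
        this.avgSeconds = avgSeconds;
    }

    // Sum of views over all matching documents.
    long totalViews(BitSet matches) {
        long sum = 0;
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            sum += views[doc];
        }
        return sum;
    }

    // Unweighted mean of the per-page average time; NaN when nothing matched.
    double meanAvgSeconds(BitSet matches) {
        double sum = 0;
        int count = 0;
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            sum += avgSeconds[doc];
            count++;
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        ArrayMetrics m = new ArrayMetrics(
                new int[] {100, 250, 40, 900},
                new float[] {12.0f, 30.0f, 5.5f, 60.0f});
        BitSet matches = new BitSet();
        matches.set(1); // pretend docs 1 and 3 matched the query
        matches.set(3);
        System.out.println(m.totalViews(matches) + " views, "
                + m.meanAvgSeconds(matches) + " s mean");
    }
}
```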
I'll let you know the results,
Thanks again,
Ophir


On Wed, Feb 2, 2011 at 6:07 PM, Grant Ingersoll <gs...@apache.org> wrote:

>
> On Feb 1, 2011, at 2:59 AM, Ophir Cohen wrote:
>
> > Hi Guys,
> >
> > I've been using Lucene for more than 5 years and it is a great tool -
> great job! Thanks for everything...
>
> Thanks.
>
> Just so you know going forward, please be patient in expecting answers,
> especially for complex questions like this that involve fairly expert usages
> of Lucene.  From what I can tell, you have sent the same question 3 times in
> a matter of less than a day.  Sending more than once in a 2-3 day period is
> just going to make it less likely that you will get help, not more likely.
>
> Some suggestions inline below.
>
> >
> >
> > Lately I encountered the new payloads support and it looks like a great
> solution for my project.
> >
> >
> > *The problem:*
> >
> > The use case is as follows:
> >
> > I need to support a way to calculate statistics on web pages.
> >
> > Each page has a few metrics that come with it (how many users saw it, what
> the average time on page was, etc.).
> >
> >
> > The requirement is to support queries such as:
> >
> > How many users saw pages containing the tokens 'house' and 'white'?
> >
> > Or
> >
> > What was the average time on pages containing the tokens 'horse' and 'pony'?
> >
> >
> > *First solution:*
> >
> > Add pages to Lucene, index the words and store the metrics.
> >
> > *The problem: performance.*
> >
> > Unlike a regular search, I need to provide results for all matched
> documents, so I need to iterate over all results and load the document
> data.
> > This method takes too much time.
> >
> >
> > *Better solution:*
> >
> > Store the metrics as payloads and calculate the needed data without
> access to the storage - a huge performance boost.
> >
>
> I think the better solution is to use the first approach, but to use the
> FieldCache on your metrics instead of stored documents and combine that w/ a
> custom Collector.
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Payloads API and support

Posted by Grant Ingersoll <gs...@apache.org>.
On Feb 1, 2011, at 2:59 AM, Ophir Cohen wrote:

> Hi Guys,
> 
> I've been using Lucene for more than 5 years and it is a great tool - great job! Thanks for everything...

Thanks.

Just so you know going forward, please be patient in expecting answers, especially for complex questions like this that involve fairly expert usages of Lucene.  From what I can tell, you have sent the same question 3 times in a matter of less than a day.  Sending more than once in a 2-3 day period is just going to make it less likely that you will get help, not more likely.

Some suggestions inline below.

> 
> 
> Lately I encountered the new payloads support and it looks like a great solution for my project.
> 
> 
> *The problem:*
> 
> The use case is as follows:
> 
> I need to support a way to calculate statistics on web pages.
> 
> Each page has a few metrics that come with it (how many users saw it, what the average time on page was, etc.).
> 
> 
> The requirement is to support queries such as:
> 
> How many users saw pages containing the tokens 'house' and 'white'?
> 
> Or
> 
> What was the average time on pages containing the tokens 'horse' and 'pony'?
> 
> 
> *First solution:*
> 
> Add pages to Lucene, index the words and store the metrics.
> 
> *The problem: performance.*
> 
> Unlike a regular search, I need to provide results for all matched documents, so I need to iterate over all results and load the document data.
> This method takes too much time.
> 
> 
> *Better solution:*
> 
> Store the metrics as payloads and calculate the needed data without access to the storage - a huge performance boost.
> 

I think the better solution is to use the first approach, but to use the FieldCache on your metrics instead of stored documents and combine that w/ a custom Collector.
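
The shape of that suggestion can be sketched in plain Java, under some assumptions. The interface `MetricCollector` below is a hypothetical stand-in and not a Lucene class: it only mimics the idea of a collector that is told when a new index segment starts and is then handed segment-relative document ids. In Lucene 3.x the corresponding hooks would, as far as I know, be `Collector.setNextReader` and `Collector.collect`, with the per-segment array coming from something like `FieldCache.DEFAULT.getInts`; none of that is used here, so treat this as a model of the data flow only.

```java
// Hypothetical stand-in for a collector callback; not the Lucene class.
interface MetricCollector {
    // Called once per index segment with that segment's cached metric values.
    void setNextSegment(int[] segmentViews);

    // Called for every matching document; doc is relative to the current segment.
    void collect(int doc);
}

// Sums a cached "views" metric over every hit without loading stored fields.
final class ViewsSumCollector implements MetricCollector {
    private int[] current = new int[0]; // metric array of the segment being searched
    private long total;
    private int hits;

    @Override
    public void setNextSegment(int[] segmentViews) {
        this.current = segmentViews;
    }

    @Override
    public void collect(int doc) {
        total += current[doc]; // array lookup only, no document loading
        hits++;
    }

    long total() {
        return total;
    }

    int hits() {
        return hits;
    }

    public static void main(String[] args) {
        ViewsSumCollector c = new ViewsSumCollector();
        c.setNextSegment(new int[] {100, 250, 40}); // first segment
        c.collect(0);
        c.collect(2);
        c.setNextSegment(new int[] {900, 10});      // second segment
        c.collect(0);
        System.out.println(c.total() + " views over " + c.hits() + " hits");
    }
}
```

Because the search drives the callbacks, this works with any query type, which is the limitation raised about reading payloads through TermPositions.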

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

