Posted to java-user@lucene.apache.org by Nicola Buso <nb...@ebi.ac.uk> on 2013/01/24 18:22:32 UTC

Faceted search in OR

Hi all,

I'm introducing Lucene faceted search in our project and I need some
hints on how to achieve the following:
- I want facet filtering in OR. How can I do that?
  - How can I obtain facets for the filtered results and also for the
    unfiltered ones? For example, I have facet A with values A/V1, A/V2
    and A/V3, and these values are mutually exclusive: a document whose
    field has value V1 can't also have value V2, and so on. I would like
    to let the user select several of these facet values in OR. How can I
    still accumulate all the facet values while filtering by the facet
    selection? Should it work in a way similar to
    ComplementCountingAggregator?
  - Can I use the DrillDown class to obtain the OR facet filtering, or do
    I have to write a similar class that uses a BooleanQuery in OR? This
    is not clear to me from the following comment in the API:
Wraps a given Query as a drill-down query over the given categories,
assuming all are required (e.g. AND). You can construct a query with
different modes (such as OR or AND of ORs) by creating a BooleanQuery
and call this method several times. Make sure to wrap the query in that
case by ConstantScoreQuery and set the boost to 0.0f, so that it doesn't
affect scoring. 


Do you have any examples of doing this?

Regards

Nicola.





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Faceted search in OR

Posted by Nicola Buso <nb...@ebi.ac.uk>.
Hi Michael,

I'm looking into implementing a solution.

On Fri, 2013-01-25 at 16:23 -0500, Michael McCandless wrote:
> On Fri, Jan 25, 2013 at 3:48 PM, Nicola Buso <nb...@ebi.ac.uk> wrote:
> 
> > if you have experiences in this use case can you share solutions? What
> > is reusable from Lucene 4.x implementation?
> 
> Sorry, no experience doing drill sideways w/ Lucene facets ... just in
> a prior life (another search engine).
> 
> One conceptual way to get the counts is to do "hold one out" query for
> each dimension you need the sideways counts on.  EG if user searched
> for "cameras", then drilled down on "Manufacturer = Sony" and drilled
> down again on "FormFactor = SLR", you could do 3 queries:
> 
>   * cameras AND Manufacturer=Sony --> sideways counts for "FormFactor"
> 
>   * cameras AND FormFactor=SLR -> sideways counts for "Manufacturer"
> 
>   * cameras AND FormFactor=SLR AND Manufacturer=Sony -> query results
> 
> I believe that will work?  But it's sort of costly ... you could "save
> the facet counts from the last query" to save on one of these query
> executions.
I agree this will retrieve the facets, but it is too costly: it takes
Fn+1 queries, where Fn is the number of facets. With 4-6 facets to show
to the user, that is really too much.

> 
> Conceptually, the query divides all documents into 3 sets: document
> matches (add to drill-down counts), document is a near miss (would
> match except for exactly one of drill-down constraints), document
> doesn't match.  The sideways counts amounts to tallying up the near
> miss hits against the dimension that was the near miss.
> 
> Like if you could run a query for cameras AND (Manufacturer=Sony OR
> FormFactor=SLR minShouldMatch=N-1 (1 in this case)), which would match
> the "matches" and the near misses, and then during collection
> determine whether it was a hit or a near miss (hmm this isn't so hard:
> use Scorer.getChildren()), and collect and/or tally up the appropriate
> counts (drill down or sideways), then you'd get the right counts I
> think?
Sorry, I'm quite new to Lucene, so I have some questions...

Are you suggesting that I can use a single query to obtain all the
information? Am I dreaming? :-)

Does BooleanQuery.setMinimumNumberShouldMatch(int min) let a document
match while skipping some of the clauses? If so, how can I make sure it
skips a particular facet clause?

Scorer.getChildren() returns the child scorers of a scorer. How are the
scorers organized hierarchically in this class?

Do you think that by writing a Collector that distinguishes between
Scorer.score() and Scorer.getChildren()...score() I could collect the
different sets of results needed for facet counting?

Sorry for all these questions; they are just to help me understand
Lucene better.


Nicola.

> 
> I think this could be a reasonable way to do drill sideways!
> 
> > Reading some books I just noticed that the expected behaviour for an
> > user filtering by facets is:
> > - facet values in the same facet group (category in lucene) are added in
> > OR
> > - facet values from different facet groups are added in AND
> 
> Right.
> 
> > - another interesting aspect is how the selection affect the counting in
> > other facets
> 
> This is why you have to do the N-1 queries I think.
> 
> > Should be interesting to evaluate if some of these facet navigation
> > patterns can be implemented or supported with some utilities in Lucene
> > 4.0
> 
> I think drill sideways is possible!  Just not implemented yet ...
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com





Re: Faceted search in OR

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Fri, Jan 25, 2013 at 3:48 PM, Nicola Buso <nb...@ebi.ac.uk> wrote:

> if you have experiences in this use case can you share solutions? What
> is reusable from Lucene 4.x implementation?

Sorry, no experience doing drill sideways w/ Lucene facets ... just in
a prior life (another search engine).

One conceptual way to get the counts is to do "hold one out" query for
each dimension you need the sideways counts on.  EG if user searched
for "cameras", then drilled down on "Manufacturer = Sony" and drilled
down again on "FormFactor = SLR", you could do 3 queries:

  * cameras AND Manufacturer=Sony --> sideways counts for "FormFactor"

  * cameras AND FormFactor=SLR -> sideways counts for "Manufacturer"

  * cameras AND FormFactor=SLR AND Manufacturer=Sony -> query results

I believe that will work?  But it's sort of costly ... you could "save
the facet counts from the last query" to save on one of these query
executions.
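
The three hold-one-out queries above can be sketched outside Lucene. What follows is only a minimal illustration in plain Java, assuming the documents that already match the base query ("cameras") are modelled as in-memory maps from dimension to value. The names HoldOneOutCounts, countValues, doc and sampleDocs are hypothetical and are not part of the Lucene facet API, and the sample documents are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of "hold one out" counting; not Lucene API.
final class HoldOneOutCounts {

    // Builds one document as a dimension -> value map.
    static Map<String, String> doc(String manufacturer, String formFactor) {
        Map<String, String> d = new HashMap<>();
        d.put("Manufacturer", manufacturer);
        d.put("FormFactor", formFactor);
        return d;
    }

    // Made-up documents, all assumed to match the base query "cameras".
    static List<Map<String, String>> sampleDocs() {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc("Sony", "SLR"));
        docs.add(doc("Sony", "Compact"));
        docs.add(doc("Canon", "SLR"));
        docs.add(doc("Canon", "Compact"));
        docs.add(doc("Sony", "SLR"));
        return docs;
    }

    // Counts the values of heldOutDim over the documents that satisfy every
    // drill-down constraint except the one on heldOutDim.
    static Map<String, Integer> countValues(List<Map<String, String>> docs,
                                            Map<String, String> drillDowns,
                                            String heldOutDim) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> d : docs) {
            boolean matches = true;
            for (Map.Entry<String, String> dd : drillDowns.entrySet()) {
                if (dd.getKey().equals(heldOutDim)) {
                    continue; // this constraint is held out
                }
                if (!dd.getValue().equals(d.get(dd.getKey()))) {
                    matches = false;
                    break;
                }
            }
            String value = d.get(heldOutDim);
            if (matches && value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> drillDowns = new LinkedHashMap<>();
        drillDowns.put("Manufacturer", "Sony");
        drillDowns.put("FormFactor", "SLR");
        List<Map<String, String>> docs = sampleDocs();
        // One pass per drilled-down dimension, in addition to the pass that
        // produces the actual query results.
        for (String dim : drillDowns.keySet()) {
            System.out.println(dim + " sideways: " + countValues(docs, drillDowns, dim));
        }
    }
}
```

With the sample data, holding out Manufacturer leaves the three SLR documents (two Sony, one Canon), and holding out FormFactor leaves the three Sony documents (two SLR, one Compact). The cost concern is visible in main: every drilled-down dimension needs its own pass over the documents.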

Conceptually, the query divides all documents into 3 sets: document
matches (add to drill-down counts), document is a near miss (would
match except for exactly one of drill-down constraints), document
doesn't match.  The sideways counts amounts to tallying up the near
miss hits against the dimension that was the near miss.

Like if you could run a query for cameras AND (Manufacturer=Sony OR
FormFactor=SLR minShouldMatch=N-1 (1 in this case)), which would match
the "matches" and the near misses, and then during collection
determine whether it was a hit or a near miss (hmm this isn't so hard:
use Scorer.getChildren()), and collect and/or tally up the appropriate
counts (drill down or sideways), then you'd get the right counts I
think?

I think this could be a reasonable way to do drill sideways!
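
The hit / near-miss split described above can be sketched as a single pass. This is only an illustration in plain Java over in-memory maps, not Lucene code: it does not use BooleanQuery, minShouldMatch or Scorer.getChildren(), and the names NearMissCounts, collect and tally are hypothetical. It also assumes that full matches are folded into each dimension's tally, so that the per-dimension counts come out the same as the hold-one-out queries would give.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-pass sketch of the hit / near-miss split; not Lucene API.
final class NearMissCounts {
    // Documents that satisfy every drill-down constraint.
    final List<Map<String, String>> hits = new ArrayList<>();
    // Per dimension: value -> count, as if that dimension's constraint were dropped.
    final Map<String, Map<String, Integer>> sideways = new TreeMap<>();

    static NearMissCounts collect(List<Map<String, String>> docs,
                                  Map<String, String> drillDowns) {
        NearMissCounts result = new NearMissCounts();
        for (String dim : drillDowns.keySet()) {
            result.sideways.put(dim, new TreeMap<>());
        }
        for (Map<String, String> d : docs) {
            int failures = 0;
            String failedDim = null;
            for (Map.Entry<String, String> dd : drillDowns.entrySet()) {
                if (!dd.getValue().equals(d.get(dd.getKey()))) {
                    failures++;
                    failedDim = dd.getKey();
                }
            }
            if (failures == 0) {
                // Full match: a result, and it counts in every dimension.
                result.hits.add(d);
                for (String dim : drillDowns.keySet()) {
                    result.tally(dim, d.get(dim));
                }
            } else if (failures == 1) {
                // Near miss: counts only against the dimension it failed on.
                result.tally(failedDim, d.get(failedDim));
            }
            // Two or more failures: the document contributes nothing.
        }
        return result;
    }

    private void tally(String dim, String value) {
        if (value != null) {
            sideways.get(dim).merge(value, 1, Integer::sum);
        }
    }

    // Builds one document as a dimension -> value map.
    static Map<String, String> doc(String manufacturer, String formFactor) {
        Map<String, String> d = new HashMap<>();
        d.put("Manufacturer", manufacturer);
        d.put("FormFactor", formFactor);
        return d;
    }

    // Made-up documents, all assumed to match the base query "cameras".
    static List<Map<String, String>> sampleDocs() {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc("Sony", "SLR"));
        docs.add(doc("Sony", "Compact"));
        docs.add(doc("Canon", "SLR"));
        docs.add(doc("Canon", "Compact"));
        docs.add(doc("Sony", "SLR"));
        return docs;
    }

    public static void main(String[] args) {
        Map<String, String> drillDowns = new LinkedHashMap<>();
        drillDowns.put("Manufacturer", "Sony");
        drillDowns.put("FormFactor", "SLR");
        NearMissCounts r = collect(sampleDocs(), drillDowns);
        System.out.println("hits: " + r.hits.size());
        System.out.println("sideways: " + r.sideways);
    }
}
```

In the sample data the Canon/Compact document fails both constraints and is ignored, the Canon/SLR document is a near miss on Manufacturer, and the Sony/Compact document is a near miss on FormFactor. Whether the same bookkeeping can be done efficiently inside a Lucene Collector is exactly the open question in this thread.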

> Reading some books I just noticed that the expected behaviour for an
> user filtering by facets is:
> - facet values in the same facet group (category in lucene) are added in
> OR
> - facet values from different facet groups are added in AND

Right.
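
The rule quoted above (OR within a facet group, AND across groups) can be illustrated with a small predicate. This is a sketch in plain Java over in-memory maps, assuming each document has at most one value per facet group. FacetSelectionFilter and its methods are hypothetical names, not Lucene classes; in Lucene the same shape would be built from nested BooleanQuery clauses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "OR inside a facet group, AND across groups".
final class FacetSelectionFilter {

    // selections maps a facet group to the set of values the user ticked.
    static boolean matches(Map<String, String> doc,
                           Map<String, Set<String>> selections) {
        for (Map.Entry<String, Set<String>> sel : selections.entrySet()) {
            if (sel.getValue().isEmpty()) {
                continue; // nothing selected in this group: no constraint
            }
            String value = doc.get(sel.getKey());
            // OR: any selected value of this group is enough...
            if (value == null || !sel.getValue().contains(value)) {
                return false; // ...AND: but every constrained group must pass
            }
        }
        return true;
    }

    static List<Map<String, String>> filter(List<Map<String, String>> docs,
                                            Map<String, Set<String>> selections) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> d : docs) {
            if (matches(d, selections)) {
                kept.add(d);
            }
        }
        return kept;
    }

    // Builds one document as a facet group -> value map.
    static Map<String, String> doc(String manufacturer, String formFactor) {
        Map<String, String> d = new HashMap<>();
        d.put("Manufacturer", manufacturer);
        d.put("FormFactor", formFactor);
        return d;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> selections = new HashMap<>();
        selections.put("Manufacturer", new HashSet<>(Arrays.asList("Sony", "Canon")));
        selections.put("FormFactor", new HashSet<>(Arrays.asList("SLR")));

        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc("Sony", "SLR"));     // kept: Sony is selected, SLR is selected
        docs.add(doc("Canon", "SLR"));    // kept: Canon is selected, SLR is selected
        docs.add(doc("Nikon", "SLR"));    // dropped: Nikon is not selected
        docs.add(doc("Sony", "Compact")); // dropped: Compact is not selected
        System.out.println(filter(docs, selections).size() + " documents kept");
    }
}
```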

> - another interesting aspect is how the selection affect the counting in
> other facets

This is why you have to do the N-1 queries I think.

> Should be interesting to evaluate if some of these facet navigation
> patterns can be implemented or supported with some utilities in Lucene
> 4.0

I think drill sideways is possible!  Just not implemented yet ...

Mike McCandless

http://blog.mikemccandless.com



Re: Faceted search in OR

Posted by Nicola Buso <nb...@ebi.ac.uk>.
Hi Mike,

If you have experience with this use case, can you share solutions? What
is reusable from the Lucene 4.x implementation?
Reading some books, I just noticed that the expected behaviour for a
user filtering by facets is:
- facet values in the same facet group (category in Lucene) are combined
  in OR
- facet values from different facet groups are combined in AND
- another interesting aspect is how the selection affects the counts in
  other facets

It would be interesting to evaluate whether some of these facet
navigation patterns can be implemented or supported with some utilities
in Lucene 4.0.


Nicola.



On Fri, 2013-01-25 at 08:37 -0500, Michael McCandless wrote:
> I think that was supposed to be A/1 and A/3 in the last sentence below?
> 
> But, anyway, I think the question (and it's a good one!) is how, after
> having drilled down on one of these, eg A/1, would you then still show
> the counts for the other A/N categories?
> 
> Ie the counts would show how many hits the user would see if they
> changed A/1 drilldown to A/N instead.
> 
> I call this "drill sideways"...
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> 





Re: Faceted search in OR

Posted by Michael McCandless <lu...@mikemccandless.com>.
I think that was supposed to be A/1 and A/3 in the last sentence below?

But, anyway, I think the question (and it's a good one!) is how, after
having drilled down on one of these, eg A/1, would you then still show
the counts for the other A/N categories?

Ie the counts would show how many hits the user would see if they
changed A/1 drilldown to A/N instead.

I call this "drill sideways"...

Mike McCandless

http://blog.mikemccandless.com

On Fri, Jan 25, 2013 at 7:29 AM, Shai Erera <se...@gmail.com> wrote:
> Ooops, I just realized that at some point java-user was removed from the CC
> :).
> Fixing that.
>
> Shai
>
>
> On Fri, Jan 25, 2013 at 2:27 PM, Shai Erera <se...@gmail.com> wrote:
>
>> Hi Nicola,
>>
>> Indeed, if it's a URL with parameters, it's not a UI trick :). I think
>> that you can do what you want with the package, but before I explain what I
>> think you should do, I'd like to use a concrete example, to better
>> understand:
>>
>> Suppose that you have facets A/1, A/2 ... A/6 associated with documents. A
>> document is associated with exactly one "A" facet, but the same facet may
>> be associated with many documents.
>> You query for X and it matches some documents that are collectively
>> associated with facets A/1, A/2, A/3 and A/4. So A/5 and A/6 are associated
>> with documents that do not match your query.
>> However, your FacetRequest sets its numResults (what we call top-K) to 2,
>> so you only get back A/1 and A/3, since they have the highest counts.
>>
>> So what we have now are:
>> * Facets A/1, A/3 returned to the user, since they belong to the result
>> set and have the highest counts
>> * Facets A/2, A/4 are not returned to the user, even though they belong to
>> the result set, but did not make it to the top-K
>> * Facets A/5, A/6 are not returned because they don't belong to the result
>> set at all.
>>
>> If this makes sense to you, and is similar to the scenario that you have,
>> which of these facets would you like to show in addition to A/1 and A/2?
>>
>> Shai
>>
>>
>> On Fri, Jan 25, 2013 at 11:39 AM, Nicola Buso <nb...@ebi.ac.uk> wrote:
>>
>>> Hi Shai,
>>>
>>> thanks, again you are helping me a lot introducing faceted search.
>>>
>>> I'm not sure it's a UI trick. Suppose you have a URL with query params
>>> that lead you to:
>>> - the electronic department
>>> - query on "hi-fi"
>>> - brand facet selection on "A"
>>>
>>> Which trick should the UI use? As tricks, I can imagine:
>>> - don't filter on the facet with Lucene but do it in the UI (then it
>>> is tricky to do the facet counting without Lucene)
>>> - execute 2 queries, one filtered and one not; pick the selected facets
>>> from the filtered query and the others from the unfiltered one
>>> (filtered = filtered by facet selection, we can argue here)
>>>
>>> Note also that I have some services that should return the results
>>> together with the facets if needed.
>>>
>>>
>>>
>>> Nicola.
>>>
>>> On Thu, 2013-01-24 at 22:47 +0200, Shai Erera wrote:
>>> > That sounds more like a UI trick to me. When I do that, I don't
>>> > modify the brand facet (in the UI). I.e., continue to display it with
>>> > the original counts, and if the user now wants to filter by A + D, then
>>> > your UI somehow allows that (maybe checkboxes). Or if the user wants
>>> > to quickly switch from brand A to D, they can do so with a single click,
>>> > without running the original query again.
>>> >
>>> >
>>> > Shai
>>> >
>>> >
>>> >
>>> > On Thu, Jan 24, 2013 at 10:28 PM, Nicola Buso <nb...@ebi.ac.uk> wrote:
>>> >         Hi Shai,
>>> >
>>> >         the use case is simple. Suppose you want to buy a hi-fi in an
>>> >         online shop. You go to the Electronics department of the website
>>> >         and type "hi-fi" in the search box; the interface returns lots of
>>> >         results and a facet on brands (10 brand values).
>>> >         You select brand A and the results are filtered accordingly.
>>> >         Suppose you now want to add brand D to the results: you can't,
>>> >         because the results filtered by A don't contain value D for the
>>> >         brand facet.
>>> >
>>> >         So how can I also retrieve the facets for the unfiltered results?
>>> >         I think it's a common use case when you allow the user to filter
>>> >         in OR by facets.
>>> >
>>> >
>>> >         Nicola.
>>> >
>>> >         On Thu, 2013-01-24 at 19:36 +0200, Shai Erera wrote:
>>> >         > Hi Nicola,
>>> >         >
>>> >         >
>>> >         > Regarding the OR drill-down, yes you can construct your own
>>> >         > BooleanQuery, passing Occur.SHOULD instead of MUST. Currently
>>> >         > DrillDown does not help you do that, so you can copy the code
>>> >         > from DrillDown.query and change MUST to SHOULD. I opened
>>> >         > LUCENE-4716 to add this support to DrillDown.
>>> >         >
>>> >         >
>>> >         >
>>> >         > Not sure that I understand your second question. If you want to
>>> >         > retrieve counts for all descendants of A, then set your
>>> >         > FR.setNumResults to Integer.MAX_VALUE. But note, it's going to be
>>> >         > costly, i.e. you'd get a FacetResultNode per child of A, so
>>> >         > depending on how "wide" A is, this may have some impact on RAM
>>> >         > consumption.
>>> >         >
>>> >         > If that's not what you meant, could you please clarify?
>>> >         >
>>> >         >
>>> >         > Shai
>>> >         >
>>> >         >
>>> >         >
>>> >         > On Thu, Jan 24, 2013 at 7:22 PM, Nicola Buso <nb...@ebi.ac.uk> wrote:
>>> >         >         Hi all,
>>> >         >
>>> >         >         I'm introducing Lucene faceted search in our project and I
>>> >         >         need some hints to achieve some functionalities:
>>> >         >         - I want facet filtering in OR, how to?
>>> >         >           - obtain facets for the filtered results but also for the
>>> >         >         non filtered one. i.e. I have facet A with values A/V1, A/V2,
>>> >         >         A/V3 and these values are disjunct each other, than a document
>>> >         >         having field with value V1 can't have also value V2 and so on;
>>> >         >         I would like to let the user select more of these facet values
>>> >         >         in OR; how can I accumulate all the facets values also
>>> >         >         filtering by facet selection? Should it work in a way similar
>>> >         >         to ComplementCountingAggregator?
>>> >         >           - Can I use DrillDown class to obtain the OR facet filtering
>>> >         >         or have I to rewrite a similar class using the BooleanQuery in
>>> >         >         OR. It's not clear to me by this comment in the API:
>>> >         >         Wraps a given Query as a drill-down query over the given
>>> >         >         categories, assuming all are required (e.g. AND). You can
>>> >         >         construct a query with different modes (such as OR or AND of
>>> >         >         ORs) by creating a BooleanQuery and call this method several
>>> >         >         times. Make sure to wrap the query in that case by
>>> >         >         ConstantScoreQuery and set the boost to 0.0f, so that it
>>> >         >         doesn't affect scoring.
>>> >         >
>>> >         >
>>> >         >         Do you have any examples doing this?
>>> >         >
>>> >         >         Regards
>>> >         >
>>> >         >         Nicola.
>>> >         >
>>> >         >
>>> >         >
>>> >         >
>>> >         >
>>> >         >
>>> >
>>> >         >
>>> >         >
>>> >         >
>>> >
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>



Re: Faceted search in OR

Posted by Shai Erera <se...@gmail.com>.
Ooops, I just realized that at some point java-user was removed from the CC
:).
Fixing that.

Shai


On Fri, Jan 25, 2013 at 2:27 PM, Shai Erera <se...@gmail.com> wrote:

> Hi Nicola,
>
> Indeed, if it's a URL with parameters, it's not a UI trick :). I think
> that you can do what you want with the package, but before I explain what I
> think you should do, I'd like to use a concrete example, to better
> understand:
>
> Suppose that you have facets A/1, A/2 ... A/6 associated with documents. A
> document is associated with exactly one "A" facet, but the same facet may
> be associated with many documents.
> You query for X and it matches some documents that are collectively
> associated with facets A/1, A/2, A/3 and A/4. So A/5 and A/6 are associated
> with documents that do not match your query.
> However, your FacetRequest sets its numResults (what we call top-K) to 2,
> so you only get back A/1 and A/3, since they have the highest counts.
>
> So what we have now are:
> * Facets A/1, A/3 returned to the user, since they belong to the result
> set and have the highest counts
> * Facets A/2, A/4 are not returned to the user, even though they belong to
> the result set, but did not make it to the top-K
> * Facets A/5, A/6 are not returned because they don't belong to the result
> set at all.
>
> If this makes sense to you, and is similar to the scenario that you have,
> which of these facets would you like to show in addition to A/1 and A/2?
>
> Shai
>
>
> On Fri, Jan 25, 2013 at 11:39 AM, Nicola Buso <nb...@ebi.ac.uk> wrote:
>
>> Hi Shai,
>>
>> thanks, again you are helping me a lot introducing faceted search.
>>
>> I'm not sure it's a UI trick. Suppose you have a URL with query params
>> that lead you to:
>> - the electronic department
>> - query on "hi-fi"
>> - brand facet selection on "A"
>>
>> Which trick should the UI use? As tricks, I can imagine:
>> - don't filter on the facet with Lucene but do it in the UI (then it
>> is tricky to do the facet counting without Lucene)
>> - execute 2 queries, one filtered and one not; pick the selected facets
>> from the filtered query and the others from the unfiltered one
>> (filtered = filtered by facet selection, we can argue here)
>>
>> Note also that I have some services that should return the results
>> together with the facets if needed.
>>
>>
>>
>> Nicola.
>>
>> On Thu, 2013-01-24 at 22:47 +0200, Shai Erera wrote:
>> > That sounds more like a UI trick to me. When I do that, I don't
>> > modify the brand facet (in the UI). I.e., continue to display it with
>> > the original counts, and if the user now wants to filter by A + D, then
>> > your UI somehow allows that (maybe checkboxes). Or if the user wants
>> > to quickly switch from brand A to D, they can do so with a single click,
>> > without running the original query again.
>> >
>> >
>> > Shai
>> >
>> >
>> >
>> > On Thu, Jan 24, 2013 at 10:28 PM, Nicola Buso <nb...@ebi.ac.uk> wrote:
>> >         Hi Shai,
>> >
>> >         the use case is simple. Suppose you want to buy a hi-fi in an
>> >         online shop. You go to the Electronics department of the website
>> >         and type "hi-fi" in the search box; the interface returns lots of
>> >         results and a facet on brands (10 brand values).
>> >         You select brand A and the results are filtered accordingly.
>> >         Suppose you now want to add brand D to the results: you can't,
>> >         because the results filtered by A don't contain value D for the
>> >         brand facet.
>> >
>> >         So how can I also retrieve the facets for the unfiltered results?
>> >         I think it's a common use case when you allow the user to filter
>> >         in OR by facets.
>> >
>> >
>> >         Nicola.
>> >
>> >         On Thu, 2013-01-24 at 19:36 +0200, Shai Erera wrote:
>> >         > Hi Nicola,
>> >         >
>> >         >
>> >         > Regarding the OR drill-down, yes you can construct your own
>> >         > BooleanQuery, passing Occur.SHOULD instead of MUST. Currently
>> >         > DrillDown does not help you do that, so you can copy the code
>> >         > from DrillDown.query and change MUST to SHOULD. I opened
>> >         > LUCENE-4716 to add this support to DrillDown.
>> >         >
>> >         >
>> >         >
>> >         > Not sure that I understand your second question. If you want to
>> >         > retrieve counts for all descendants of A, then set your
>> >         > FR.setNumResults to Integer.MAX_VALUE. But note, it's going to be
>> >         > costly, i.e. you'd get a FacetResultNode per child of A, so
>> >         > depending on how "wide" A is, this may have some impact on RAM
>> >         > consumption.
>> >         >
>> >         > If that's not what you meant, could you please clarify?
>> >         >
>> >         >
>> >         > Shai
>> >         >
>> >         >
>> >         >
>> >         > On Thu, Jan 24, 2013 at 7:22 PM, Nicola Buso <nb...@ebi.ac.uk> wrote:
>> >         >         Hi all,
>> >         >
>> >         >         I'm introducing Lucene faceted search in our project
>> >         >         and I need some hints on how to achieve the
>> >         >         following:
>> >         >         - I want facet filtering in OR; how do I do that?
>> >         >           - How do I obtain facets for the filtered results
>> >         >         and also for the unfiltered ones? E.g. I have facet
>> >         >         A with values A/V1, A/V2, A/V3, and these values are
>> >         >         mutually exclusive, so a document whose field has
>> >         >         value V1 can't also have value V2, and so on. I would
>> >         >         like to let the user select several of these facet
>> >         >         values in OR; how can I accumulate all the facet
>> >         >         values while also filtering by facet selection?
>> >         >         Should it work in a way similar to
>> >         >         ComplementCountingAggregator?
>> >         >           - Can I use the DrillDown class to obtain the OR
>> >         >         facet filtering, or do I have to write a similar
>> >         >         class using a BooleanQuery in OR? It's not clear to
>> >         >         me from this comment in the API:
>> >         >         Wraps a given Query as a drill-down query over the
>> >         >         given categories, assuming all are required (e.g.
>> >         >         AND). You can construct a query with different modes
>> >         >         (such as OR or AND of ORs) by creating a BooleanQuery
>> >         >         and call this method several times. Make sure to wrap
>> >         >         the query in that case by ConstantScoreQuery and set
>> >         >         the boost to 0.0f, so that it doesn't affect scoring.
>> >         >
>> >         >
>> >         >         Do you have any examples doing this?
>> >         >
>> >         >         Regards
>> >         >
>> >         >         Nicola.
>> >
>> >
>> >
>> >
>>
>>
>>
>

Re: Faceted search in OR

Posted by Nicola Buso <nb...@ebi.ac.uk>.
Hi Shai,

the use case is simple. Suppose you want to buy a hi-fi from an online
shop. You go to the Electronics department of the website and type "hi-fi"
in the search box; the interface returns lots of results and a facet
on brands (10 brand values).
You select brand A and the results are filtered accordingly. Suppose you
now want to add brand D to the results: you can't, because the results
filtered by A don't contain the value D for the brand facet.

So how can I also retrieve the facets for the unfiltered results?
I think it's a common use case when you allow the user to filter in
OR by facets.
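
The use case above can be sketched in plain Java, without any Lucene
classes, to show the two separate computations it seems to need: the hits
come from the base results filtered by the selected brands in OR, while the
brand counts come from the unfiltered base results, so brand D stays
selectable after brand A is chosen. The class, method and field names below
are hypothetical and only model the idea; this is not how the Lucene facet
module implements it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Plain-Java model of multi-select (OR) faceting; no Lucene classes are used. */
final class MultiSelectFacetSketch {

    /** One hit of the base query ("hi-fi"); each document carries exactly one brand. */
    static final class Doc {
        final String name;
        final String brand;

        Doc(String name, String brand) {
            this.name = name;
            this.brand = brand;
        }
    }

    /** Hits to display: base results whose brand is any of the selected brands (OR). */
    static List<Doc> filterByBrands(List<Doc> baseResults, Set<String> selectedBrands) {
        if (selectedBrands.isEmpty()) {
            return new ArrayList<>(baseResults); // nothing selected: no filtering
        }
        List<Doc> hits = new ArrayList<>();
        for (Doc doc : baseResults) {
            if (selectedBrands.contains(doc.brand)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** Brand counts over the unfiltered base results, so unselected brands stay visible. */
    static Map<String, Integer> brandCounts(List<Doc> baseResults) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc doc : baseResults) {
            counts.merge(doc.brand, 1, Integer::sum);
        }
        return counts;
    }
}
```

With several facet dimensions, the counts for one dimension would
presumably be computed with the filters of the other dimensions still
applied, and only that dimension's own selection left out.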


Nicola.

On Thu, 2013-01-24 at 19:36 +0200, Shai Erera wrote:
> Hi Nicola,
> 
> 
> Regarding the OR drill-down, yes you can construct your own
> BooleanQuery, passing Occur.SHOULD instead of MUST. Currently
> DrillDown does not help you do that, so you can copy the code from
> DrillDown.query and change MUST to SHOULD. I opened LUCENE-4716 to add
> this support to DrillDown.
> 
> 
> 
> Not sure that I understand your second question. If you want to
> retrieve counts for all descendants of A, then set your
> FR.setNumResults to Integer.MAX_VALUE. But note, it's going to be
> costly, i.e. you'd get a FacetResultNode per child of A, so depending
> on how "wide" A is, this may have some impact on RAM consumption.
> 
> If that's not what you meant, could you please clarify?
> 
> 
> Shai
> 
> 
> 
> On Thu, Jan 24, 2013 at 7:22 PM, Nicola Buso <nb...@ebi.ac.uk> wrote:
>         Hi all,
>         
>         I'm introducing Lucene faceted search in our project and I
>         need some hints on how to achieve the following:
>         - I want facet filtering in OR; how do I do that?
>           - How do I obtain facets for the filtered results and also
>         for the unfiltered ones? E.g. I have facet A with values A/V1,
>         A/V2, A/V3, and these values are mutually exclusive, so a
>         document whose field has value V1 can't also have value V2,
>         and so on. I would like to let the user select several of
>         these facet values in OR; how can I accumulate all the facet
>         values while also filtering by facet selection? Should it work
>         in a way similar to ComplementCountingAggregator?
>           - Can I use the DrillDown class to obtain the OR facet
>         filtering, or do I have to write a similar class using a
>         BooleanQuery in OR? It's not clear to me from this comment in
>         the API:
>         Wraps a given Query as a drill-down query over the given
>         categories, assuming all are required (e.g. AND). You can
>         construct a query with different modes (such as OR or AND of
>         ORs) by creating a BooleanQuery and call this method several
>         times. Make sure to wrap the query in that case by
>         ConstantScoreQuery and set the boost to 0.0f, so that it
>         doesn't affect scoring.
>         
>         
>         Do you have any examples doing this?
>         
>         Regards
>         
>         Nicola.
> 
> 





Re: Faceted search in OR

Posted by Shai Erera <se...@gmail.com>.
Hi Nicola,

Regarding the OR drill-down, yes you can construct your own BooleanQuery,
passing Occur.SHOULD instead of MUST. Currently DrillDown does not help you
do that, so you can copy the code from DrillDown.query and change MUST to
SHOULD. I opened LUCENE-4716 to add this support to DrillDown.
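
To make the MUST versus SHOULD difference concrete, here is a minimal
plain-Java sketch that uses sets of document ids in place of real queries.
It does not call any Lucene API; the class and method names are
hypothetical. MUST on every clause behaves like an intersection, and SHOULD
on every clause (with no other clause types present) behaves like a union.
Because the values A/V1 and A/V2 never occur on the same document,
requiring both can match nothing.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Plain-Java model of combining drill-down clauses; doc ids stand in for matches. */
final class OccurSketch {

    /** MUST on every clause: a document has to match all of them (intersection). */
    static Set<Integer> allOf(List<Set<Integer>> clauses) {
        if (clauses.isEmpty()) {
            return new TreeSet<>();
        }
        Set<Integer> result = new TreeSet<>(clauses.get(0));
        for (Set<Integer> clause : clauses) {
            result.retainAll(clause);
        }
        return result;
    }

    /** SHOULD on every clause: a document has to match at least one (union). */
    static Set<Integer> anyOf(List<Set<Integer>> clauses) {
        Set<Integer> result = new TreeSet<>();
        for (Set<Integer> clause : clauses) {
            result.addAll(clause);
        }
        return result;
    }
}
```

In a real BooleanQuery, SHOULD clauses become optional as soon as a MUST
clause sits next to them, so the OR group would probably need to be its own
nested BooleanQuery that is then added to the base query as a required
clause.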

Not sure that I understand your second question. If you want to retrieve
counts for all descendants of A, then set your FR.setNumResults to
Integer.MAX_VALUE. But note, it's going to be costly, i.e. you'd get a
FacetResultNode per child of A, so depending on how "wide" A is, this may have
some impact on RAM consumption.

If that's not what you meant, could you please clarify?

Shai


On Thu, Jan 24, 2013 at 7:22 PM, Nicola Buso <nb...@ebi.ac.uk> wrote:

> Hi all,
>
> I'm introducing Lucene faceted search in our project and I need some
> hints on how to achieve the following:
> - I want facet filtering in OR; how do I do that?
>   - How do I obtain facets for the filtered results and also for the
> unfiltered ones? E.g. I have facet A with values A/V1, A/V2, A/V3, and
> these values are mutually exclusive, so a document whose field has value
> V1 can't also have value V2, and so on. I would like to let the user
> select several of these facet values in OR; how can I accumulate all the
> facet values while also filtering by facet selection? Should it work in
> a way similar to ComplementCountingAggregator?
>   - Can I use the DrillDown class to obtain the OR facet filtering, or do
> I have to write a similar class using a BooleanQuery in OR? It's not
> clear to me from this comment in the API:
> Wraps a given Query as a drill-down query over the given categories,
> assuming all are required (e.g. AND). You can construct a query with
> different modes (such as OR or AND of ORs) by creating a BooleanQuery
> and call this method several times. Make sure to wrap the query in that
> case by ConstantScoreQuery and set the boost to 0.0f, so that it doesn't
> affect scoring.
>
>
> Do you have any examples doing this?
>
> Regards
>
> Nicola.