Posted to java-user@lucene.apache.org by Mihai Caraman <ca...@gmail.com> on 2011/09/21 18:02:30 UTC

FacetedSearch DrillDown

Hello gurus,
Cutting to the chase, I index this: CategoryPath(lvl1,lvl2,lvl3)

I want to group things as deep as lvl3.
Which should be more efficient:
*search for categoryPath(lvl1) to get lvl2 results, then
search once per lvl2 value for categoryPath(lvl1,lvl2) to get lvl3 results*
?
or
*search drilldown categoryPath(lvl1)* // I can't test this because it doesn't
give me the results I expect in the SimpleSearch example (maybe I didn't
understand them and they can't even be compared to what I need)

Thank you,
Mihai

Re: FacetedSearch DrillDown

Posted by Em <ma...@yahoo.de>.

Hi Mihai,

what about having an extra field per level?

doc1: [day:monday], [hour:11pm], [minute:22], [second:00], [year:2011],
[month:October], [calendar day:11]...

This way you do not need to hack and you can easily extend your format
if you want to add new dimensions in the future.

I have not worked with pivot facets, but I think this is what they were made
for: http://wiki.apache.org/solr/HierarchicalFaceting

However, if you want to drill down the full path, this will be a huge
performance bottleneck.

If I were in your shoes, I would try to find a useful balance between what
you want and what your users need.

If your users are searching for a specific document with a specific
key phrase and they are able to specify year, month and day just by
clicking on it, wouldn't this be enough for 95% of all queries?
Why kill the overall performance for just 5% of your queries?

Think about whether it would be better, in terms of both performance and
usability, to refine the results only when a user decides
that he/she needs to add a new date detail to the query.

Hope this helps,
Em


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: FacetedSearch DrillDown

Posted by Shai Erera <se...@gmail.com>.

Hi

The link to the facet javadocs was left out of the release artifacts by
mistake; however, the javadocs do exist at this URL:
http://lucene.apache.org/java/3_4_0/api/contrib-facet/index.html.

I've already fixed it, so it should be OK in the next release.

Shai

On Thu, Sep 22, 2011 at 11:10 AM, Em <ma...@yahoo.de> wrote:

> Hi,
>
> I just saw that this is about Lucene, not Solr. So I am sorry for giving
> a Solr-advice on a Lucene-topic.
>
> Shai, I just found the Facet-Contribution's API via Google. Where are
> references to that API? I can not find them in Lucene's Wiki or at the
> Lucene-page.
>
> I'd like to read a little bit more about this contribution to compare it
> with existing approaches in Solr.
>
> Thanks!
> Em

Re: FacetedSearch DrillDown

Posted by Em <ma...@yahoo.de>.

Hi,

I just saw that this is about Lucene, not Solr. So I am sorry for giving
Solr advice on a Lucene topic.

Shai, I just found the facet contribution's API via Google. Where is
that API referenced? I cannot find it in Lucene's wiki or on the
Lucene page.

I'd like to read a little bit more about this contribution to compare it
with existing approaches in Solr.

Thanks!
Em

On 22.09.2011 09:08, Shai Erera wrote:
> Hi Mihai,
>
> thanks for clarifying the question. The facet module supports that quite
> easily actually. [...]

Re: FacetedSearch DrillDown

Posted by Shai Erera <se...@gmail.com>.

Hi Mihai,

thanks for clarifying the question. The facet module actually supports that
quite easily. I've included some sample code with a description:

// Assumes an already-open IndexSearcher 'searcher' and TaxonomyReader 'taxoReader'.
FacetSearchParams fsp = new FacetSearchParams(); // (1)
CountFacetRequest facetRequest =
    new CountFacetRequest(new CategoryPath("monday"), 10); // (2)
facetRequest.setDepth(3); // (3)
fsp.addFacetRequest(facetRequest); // (4)
FacetsCollector col =
    new FacetsCollector(fsp, searcher.getIndexReader(), taxoReader); // (5)
searcher.search(new MatchAllDocsQuery(), col); // (6)
System.out.println(col.getFacetResults().get(0)); // (7)

Explanation:
(1) -- Create FacetSearchParams with the default FacetIndexingParams. This
is the common case.
(2) -- Create a CountFacetRequest for the 'monday' node (which is the
top-level node in your example), and specify that the top-10 counted
categories should be returned.
(3) -- Specify depth=3, which means that the top-K (10 in this example)
should be computed among all nodes up to depth 3.
(4) -- Add the FacetRequest to the search params.
(5) -- Create the FacetsCollector.
(6) -- Issue the search.
(7) -- Print the result. In this case only one FacetResult exists, because
only one dimension (FacetRequest) was requested.

This prints the following:

Request: monday nRes=10 nLbl=10
Num valid Descendants (up to specified depth): 5
    Facet Result Node with 5 sub result nodes.
    Name: monday
    Value: 3.0
    Residue: 0.0

    Subresult #0
        Facet Result Node with 0 sub result nodes.
        Name: monday/1pm
        Value: 2.0
        Residue: 0.0

    Subresult #1
        Facet Result Node with 0 sub result nodes.
        Name: monday/2pm/3min
        Value: 1.0
        Residue: 0.0

    Subresult #2
        Facet Result Node with 0 sub result nodes.
        Name: monday/2pm
        Value: 1.0
        Residue: 0.0

    Subresult #3
        Facet Result Node with 0 sub result nodes.
        Name: monday/1pm/4min
        Value: 1.0
        Residue: 0.0

    Subresult #4
        Facet Result Node with 0 sub result nodes.
        Name: monday/1pm/3min
        Value: 1.0
        Residue: 0.0

I believe that's what you were looking for?
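
The result list above is flat: every descendant of 'monday' is reported as a
direct sub result with its full path. As a rough illustration of how such
path/count pairs could be regrouped into the nested view asked for earlier in
the thread, here is a small plain-Java sketch. It does not use the facet
module at all; the class FacetTreePrinter, its Node type and its methods are
hypothetical names made up for this example, and it assumes that '/' never
occurs inside a category component.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: it only post-processes
// "path -> count" pairs such as the ones printed above.
class FacetTreePrinter {

    // One category node: its own count plus its children, sorted by name.
    static final class Node {
        int count;
        final Map<String, Node> children = new TreeMap<>();
    }

    // Builds a tree from flat "a/b/c" -> count pairs, in any input order.
    static Node build(Map<String, Integer> flatCounts) {
        Node root = new Node();
        for (Map.Entry<String, Integer> e : flatCounts.entrySet()) {
            Node node = root;
            for (String component : e.getKey().split("/")) {
                node = node.children.computeIfAbsent(component, k -> new Node());
            }
            node.count = e.getValue();
        }
        return root;
    }

    // Appends one "name:count" line per node, indented two spaces per level.
    static void render(Node node, int depth, StringBuilder out) {
        for (Map.Entry<String, Node> child : node.children.entrySet()) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(child.getKey()).append(':')
               .append(child.getValue().count).append('\n');
            render(child.getValue(), depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // Same paths and counts as in the printed result, in the same order.
        Map<String, Integer> flat = new LinkedHashMap<>();
        flat.put("monday", 3);
        flat.put("monday/1pm", 2);
        flat.put("monday/2pm/3min", 1);
        flat.put("monday/2pm", 1);
        flat.put("monday/1pm/4min", 1);
        flat.put("monday/1pm/3min", 1);

        StringBuilder out = new StringBuilder();
        render(build(flat), 0, out);
        System.out.print(out);
    }
}
```

With the six pairs above this prints monday:3 at the top, 1pm:2 and 2pm:1
below it, and the minute nodes under their respective hours.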

The DrillDown class provides helper utility methods for drilling down on a
selected facet. I.e., if you return the above results to the user and he
clicks on "monday/1pm", you want to constrain the search to this category
only. The DrillDown class helps you create a Query out of the user's
selection.
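
To make the idea of drilling down concrete without relying on any particular
Lucene signature, here is an in-memory model of it in plain Java. It is only
a sketch of the concept: the real DrillDown class builds a Query that the
searcher evaluates, whereas the hypothetical DrillDownSketch.drillDown method
below just filters a list of category paths by prefix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model, not the Lucene DrillDown class.
class DrillDownSketch {

    // Keeps only the documents whose category path starts with 'selected'.
    static List<List<String>> drillDown(List<List<String>> docPaths,
                                        List<String> selected) {
        List<List<String>> matches = new ArrayList<>();
        for (List<String> path : docPaths) {
            if (path.size() >= selected.size()
                    && path.subList(0, selected.size()).equals(selected)) {
                matches.add(path);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // The three example documents from this thread.
        List<List<String>> docs = List.of(
                List.of("monday", "1pm", "3min"),
                List.of("monday", "1pm", "4min"),
                List.of("monday", "2pm", "3min"));

        // The user clicked on monday/1pm: only the first two documents remain.
        System.out.println(drillDown(docs, List.of("monday", "1pm")));
    }
}
```

Counting facets again over the remaining documents would then give the
next level (3min and 4min) for the selected branch only.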

We wrote a very extensive user guide, which unfortunately didn't make it into
the release. I've attached its PDF version to this issue:
https://issues.apache.org/jira/browse/LUCENE-3261. I intend to make an HTML
version out of it, so that it will be included with future releases.
Apologies for the delay.

Shai

On Wed, Sep 21, 2011 at 10:44 PM, Mihai Caraman <ca...@gmail.com> wrote:

> monday, 1pm,  3min

Re: FacetedSearch DrillDown

Posted by Mihai Caraman <ca...@gmail.com>.

2011/9/21 Shai Erera <se...@gmail.com>

> What do you mean "up to lvl3"?
>
"as *deep *as lvl3" :P
In this example, let's look at these lvls as a tree (like an n-ary tree) with
its root at a unique value at the top, lvl1.

> ..one with category [l1, l2, l3] and one with [l1, l2],

All documents have the same depth (of categories), like so:
      lvl1    lvl2  lvl3
doc1: monday, 1pm,  3min
doc2: monday, 1pm,  4min
doc3: monday, 2pm,  3min

> and you ask to count "l1", you will get 2.

I'm looking to get (by drilling all the way down), with *:number* being
the value (= number of results):

monday:3{
  -1pm:2{
    -3min:1
    -4min:1
   }
  -2pm:1{
    -3min:1
   }
}

I've managed to do this with repeated searches:
I search for monday, get 1pm,2pm
    Then I search for monday/1pm, get 3min,4min
        And then I search for monday/1pm/3min... and so forth for every
branch in this *categoryTree*

The question being: is there a faster way? Isn't DrillDown.query(...) meant
for this?

Where can I find more documentation on this kind of search? I'm interested
in occupied space and computing time, because I imagine it's not meant for
huge depths or lots of categories.

Again, thanks for the reply, and I very much appreciate this fantastic
feature!
Mihai

Re: FacetedSearch DrillDown

Posted by Shai Erera <se...@gmail.com>.

Can you please clarify the question? What do you mean "up to lvl3"?

Let me try with an example: if you index two documents, one with category
[l1, l2, l3] and one with [l1, l2], and you ask to count "l1", you will get
2. If you ask to count [l1, l2, l3] you will get 1, as only one document is
associated with that node. If you index a third document with category [l1,
l2, l3, l4] and ask to count [l1, l2, l3], you will get 2. Does that answer
your question?
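
The counting rule in this example (a document counts toward its own category
and toward every ancestor of it) can be sketched in a few lines of plain
Java. This is only a model of the semantics described above, not the facet
module's implementation; PrefixCounts and countAll are hypothetical names,
and each document is assumed to carry exactly one category path.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class, not part of the Lucene facet module.
class PrefixCounts {

    // For every category node, counts the documents whose path passes
    // through it. Each inner list is one document's path, e.g. [l1, l2, l3].
    static Map<String, Integer> countAll(List<List<String>> docPaths) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> path : docPaths) {
            StringBuilder node = new StringBuilder();
            for (String component : path) {
                if (node.length() > 0) {
                    node.append('/');
                }
                node.append(component);
                // The document contributes 1 to this prefix of its path.
                counts.merge(node.toString(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> docs = new ArrayList<>();
        docs.add(List.of("l1", "l2", "l3"));
        docs.add(List.of("l1", "l2"));
        Map<String, Integer> counts = countAll(docs);
        System.out.println(counts.get("l1"));        // 2
        System.out.println(counts.get("l1/l2/l3"));  // 1

        // A third, deeper document also counts toward [l1, l2, l3].
        docs.add(List.of("l1", "l2", "l3", "l4"));
        System.out.println(countAll(docs).get("l1/l2/l3")); // 2
    }
}
```

A single pass over the documents yields the counts for all levels at once,
which is the same information a series of per-branch searches would collect.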

Perhaps I didn't understand your question, so if you have a concrete example
with a few docs, a query and the expected results, I'll be able to provide a
better answer.

Shai
