Posted to java-user@lucene.apache.org by Mihai Caraman <ca...@gmail.com> on 2011/09/21 18:02:30 UTC
FacetedSearch DrillDown
Hello gurus,
Cutting to the chase, I index this: CategoryPath(lvl1,lvl2,lvl3)
I want to group things as deep as lvl3.
Which should be more efficient:
*search for categoryPath(lvl1) to get lvl2 results:
search lvl2 number of times for categoryPath(lvl1,lvl2) to get lvl3 results*
?
or
*search drilldown categoryPath(lvl1)* // I can't test this because it doesn't
give me the results I expect in the SimpleSearch example (maybe I didn't
understand them, and they can't even be compared to what I need)
Thank you,
Mihai
Re: FacetedSearch DrillDown
Posted by Em <ma...@yahoo.de>.
Hi Mihai,
what about having an extra field per level?
doc1: [day:monday], [hour:11pm], [minute:22], [second:00], [year:2011],
[month:October], [calendar day:11]...
This way you do not need to hack and you can easily extend your format
if you want to add new dimensions in future.
I did not work with Pivot-Facets but I think this is what they were made
for. http://wiki.apache.org/solr/HierarchicalFaceting
However, if you want to drilldown the full path this will be a huge
performance-bottleneck.
If I were in your shoes, I would try to find a useful balance between what
you want and what your users need.
If your users are searching for a special document with a special
keyphrase and they are able to specify year, month and day just by
clicking on it, wouldn't this be enough for 95% of all queries?
Why kill the overall performance for just 5% of your queries?
Think about whether it would be better, in terms of both performance and
usability, to refine your results only once a user decides
that he/she needs to add a new date detail to the query.
Hope this helps,
Em
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: FacetedSearch DrillDown
Posted by Shai Erera <se...@gmail.com>.
Hi
The link to the facet javadocs was left out of the release artifacts by
mistake; however, the javadocs are available at this URL:
http://lucene.apache.org/java/3_4_0/api/contrib-facet/index.html.
I've already fixed it, so the next release should be OK.
Shai
Re: FacetedSearch DrillDown
Posted by Em <ma...@yahoo.de>.
Hi,
I just saw that this is about Lucene, not Solr. So I am sorry for giving
Solr advice on a Lucene topic.
Shai, I just found the facet contribution's API via Google. Where is
that API referenced? I cannot find it in Lucene's wiki or on the
Lucene page.
I'd like to read a little bit more about this contribution to compare it
with existing approaches in Solr.
Thanks!
Em
Re: FacetedSearch DrillDown
Posted by Shai Erera <se...@gmail.com>.
Hi Mihai,
thanks for clarifying the question. The facet module actually supports that quite
easily. I've included some sample code with a description:
(1) FacetSearchParams fsp = new FacetSearchParams();
(2) CountFacetRequest facetRequest =
        new CountFacetRequest(new CategoryPath("monday"), 10);
(3) facetRequest.setDepth(3);
(4) fsp.addFacetRequest(facetRequest);
(5) FacetsCollector col =
        new FacetsCollector(fsp, searcher.getIndexReader(), taxoReader);
(6) searcher.search(new MatchAllDocsQuery(), col);
(7) System.out.println(col.getFacetResults().get(0));
Explanation:
(1) -- create FacetSearchParams with the default FacetIndexingParams. This
is the common case.
(2) -- Create CountFacetRequest, for the 'monday' node (which is the
top-level node in your example), and specify that the top-10 counted
categories should be returned.
(3) -- Specify depth=3, which means that the top-K (10 in this example)
should be computed among all nodes up to depth '3'.
(4) -- add the FacetRequest to the search params.
(5) -- Create the FacetsCollector
(6) -- Issue the search
(7) -- Print the result. In this case only one FacetResult exists because
only one dimension (FacetRequest) was requested.
This prints the following:
Request: monday nRes=10 nLbl=10
Num valid Descendants (up to specified depth): 5
Facet Result Node with 5 sub result nodes.
Name: monday
Value: 3.0
Residue: 0.0
Subresult #0
Facet Result Node with 0 sub result nodes.
Name: monday/1pm
Value: 2.0
Residue: 0.0
Subresult #1
Facet Result Node with 0 sub result nodes.
Name: monday/2pm/3min
Value: 1.0
Residue: 0.0
Subresult #2
Facet Result Node with 0 sub result nodes.
Name: monday/2pm
Value: 1.0
Residue: 0.0
Subresult #3
Facet Result Node with 0 sub result nodes.
Name: monday/1pm/4min
Value: 1.0
Residue: 0.0
Subresult #4
Facet Result Node with 0 sub result nodes.
Name: monday/1pm/3min
Value: 1.0
Residue: 0.0
I believe that's what you were looking for?
The DrillDown class provides helper utility methods for drilling down on a
selected facet. I.e., if you return the above results to the user, and he
clicks on "Monday/1pm", you want to constrain the search to this category
only. The DrillDown class helps you create a Query out of the user's
selection.
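To make the idea of constraining a search to a selected category concrete, here is a minimal plain-Java sketch. It does not use the Lucene facet module or the DrillDown class; the class and method names (DrillDownSketch, isUnder, drillDown) are hypothetical, and it assumes each document carries exactly one category path, as in the example in this thread. It only models the effect of a drill-down: keeping the documents that sit under the selected node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; hypothetical names, no Lucene API involved.
class DrillDownSketch {

    // True if the selected path is a prefix of the document's category path.
    static boolean isUnder(String[] docPath, String[] selected) {
        if (selected.length > docPath.length) {
            return false;
        }
        for (int i = 0; i < selected.length; i++) {
            if (!docPath[i].equals(selected[i])) {
                return false;
            }
        }
        return true;
    }

    // Keeps only the documents categorized under the selected node.
    static List<String[]> drillDown(List<String[]> docPaths, String... selected) {
        List<String[]> matches = new ArrayList<>();
        for (String[] path : docPaths) {
            if (isUnder(path, selected)) {
                matches.add(path);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String[]> docs = Arrays.asList(
                new String[] {"monday", "1pm", "3min"},
                new String[] {"monday", "1pm", "4min"},
                new String[] {"monday", "2pm", "3min"});
        // The user clicked on monday/1pm: only doc1 and doc2 remain.
        for (String[] path : drillDown(docs, "monday", "1pm")) {
            System.out.println(String.join("/", path));
        }
    }
}
```

In a real index the restriction would be expressed as a Query built from the selected category rather than as a scan over the documents; the scan here is only meant to show which documents survive the selection.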
We wrote a very extensive user guide which unfortunately didn't make it into
the release. I've attached its PDF version to this issue:
https://issues.apache.org/jira/browse/LUCENE-3261. I intend to make an HTML
version out of it, so that it will be included with future releases.
Apologies for the delay.
Shai
Re: FacetedSearch DrillDown
Posted by Mihai Caraman <ca...@gmail.com>.
2011/9/21 Shai Erera <se...@gmail.com>
> What do you mean "up to lvl3"?
>
"as *deep *as lvl3" :P
In this example, let's look at these lvls as a tree (like an n-ary tree) with
its root in a unique value at the top, lvl1.
> ..one with category [l1, l2, l3] and one with [l1, l2],
All documents have the same depth (of categories), like so:
      lvl1    lvl2  lvl3
doc1: monday, 1pm,  3min
doc2: monday, 1pm,  4min
doc3: monday, 2pm,  3min
> and you ask to count "l1", you will get 2.
I'm looking to get (by drilling all the way down), with *:number* being
the value (= number of results):
monday:3{
  -1pm:2{
    -3min:1
    -4min:1
  }
  -2pm:1{
    -3min:1
  }
}
I've managed to do this with repetitive searches:
I search for monday and get 1pm, 2pm.
Then I search for monday/1pm and get 3min, 4min.
And then I search for monday/1pm/3min... and so forth for every
branch in this *categoryTree*.
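For comparison with the repeated searches described above, here is a minimal plain-Java sketch of how the same nested counts can be accumulated in a single pass over the documents. It is not the Lucene facet module; the names (CategoryTreeSketch, Node, build, print) are hypothetical, and it assumes one category path per document, as in doc1 to doc3.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; hypothetical names, no Lucene API involved.
class CategoryTreeSketch {

    static final class Node {
        int count;                                          // documents under this node
        final Map<String, Node> children = new TreeMap<>(); // sorted by component name
    }

    // Walks each document's path once and increments every node along it.
    static Node build(List<String[]> docPaths) {
        Node root = new Node();
        for (String[] path : docPaths) {
            root.count++;
            Node current = root;
            for (String component : path) {
                current = current.children.computeIfAbsent(component, k -> new Node());
                current.count++;
            }
        }
        return root;
    }

    // Renders the tree as indented "name:count" lines.
    static void print(Node node, String indent, StringBuilder out) {
        for (Map.Entry<String, Node> entry : node.children.entrySet()) {
            out.append(indent).append(entry.getKey())
               .append(':').append(entry.getValue().count).append('\n');
            print(entry.getValue(), indent + "  ", out);
        }
    }

    public static void main(String[] args) {
        List<String[]> docs = Arrays.asList(
                new String[] {"monday", "1pm", "3min"},
                new String[] {"monday", "1pm", "4min"},
                new String[] {"monday", "2pm", "3min"});
        StringBuilder out = new StringBuilder();
        print(build(docs), "", out);
        System.out.print(out);
    }
}
```

For the three documents above this prints monday:3, then 1pm:2 with 3min:1 and 4min:1 beneath it, then 2pm:1 with 3min:1 beneath it. Every document is visited once, however many branches the tree has.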
The question is: is there a faster way? Isn't DrillDown.query(...) meant
for this?
Where can I find more documentation on this kind of search? I'm interested
in the space it occupies and the computing time, because I imagine it's not
meant for huge depths or lots of categories.
Thanks again for the reply; I very much appreciate this fantastic
feature!
Mihai
Re: FacetedSearch DrillDown
Posted by Shai Erera <se...@gmail.com>.
Can you please clarify the question? What do you mean "up to lvl3"?
Let me try with an example: if you index two documents, one with category
[l1, l2, l3] and one with [l1, l2], and you ask to count "l1", you will get
2. If you ask to count [l1, l2, l3] you will get 1, as only one document is
associated with that node. If you index a third document with category [l1,
l2, l3, l4] and ask to count [l1, l2, l3], you will get 2. Does that answer
your question?
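The counting rule in this example can be sketched in plain Java as follows. This is only an illustration of the semantics, not the facet module's implementation; the names (PrefixCountSketch, countPrefixes) are hypothetical, and the sketch assumes each document has a single category path.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; hypothetical names, no Lucene API involved.
class PrefixCountSketch {

    // For every prefix of every document's category path, counts how many
    // documents fall under that prefix. Keys look like "l1/l2/l3".
    static Map<String, Integer> countPrefixes(List<String[]> docPaths) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String[] path : docPaths) {
            StringBuilder prefix = new StringBuilder();
            for (String component : path) {
                if (prefix.length() > 0) {
                    prefix.append('/');
                }
                prefix.append(component);
                counts.merge(prefix.toString(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> docs = new ArrayList<>();
        docs.add(new String[] {"l1", "l2", "l3"});
        docs.add(new String[] {"l1", "l2"});
        System.out.println(countPrefixes(docs));

        // A third, deeper document also counts toward all of its ancestors.
        docs.add(new String[] {"l1", "l2", "l3", "l4"});
        System.out.println(countPrefixes(docs));
    }
}
```

With the first two documents, "l1" counts 2 and "l1/l2/l3" counts 1; after the third document is added, "l1/l2/l3" counts 2, which matches the numbers given above.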
Perhaps I didn't understand your question, so if you have a concrete example
with a few docs, a query and the expected results, I'll be able to provide a
better answer.
Shai