Posted to java-user@lucene.apache.org by Raf <r....@gmail.com> on 2014/06/11 09:51:44 UTC

Faceted Search User's Guide for Lucene 4.8.1

Hi,
I have found this useful guide to the "*Lucene Faceted Search*":
http://lucene.apache.org/core/4_4_0/facet/org/apache/lucene/facet/doc-files/userguide.html

The problem is that it refers to Lucene version 4.4, while I am using the
latest available release (4.8.1) and I cannot find some classes (e.g.
FacetSearchParams or CountFacetRequest).

Is there an updated version of that guide?
I tried this http://lucene.apache.org/core/*4_8_1*/facet/org/apache/lucene/facet/doc-files/userguide.html
but it does not work :|

Thank you for any help you can provide.

Regards,
*Raf*

Re: Facet migration 4.6.1 to > 4.7.0

Posted by Nicola Buso <nb...@ebi.ac.uk>.
Hi,

On Tue, 2014-06-17 at 17:51 +0300, Shai Erera wrote:
>         - we are extending FacetResultsHandler to change the order of
>         the facet
>         results (i.e. date facets ordered by date instead of count).
>         How can I
>         achieve this now?
> 
> 
> Now everything is a Facets. In your case, since you use the taxonomy,
> it's TaxonomyFacets. You can check the class-hierarchy, where you have
> IntTaxoFacets (to deal w/ integers) and then TaxoFacetCounts and
> FastTaxoFacetCounts. I think you want to extend either IntTaxoFacets,
> or just TaxonomyFacets. Then if you ask for the 'date' dimension,
> delegate to the one that sorts by the date value, otherwise to the
> default one?
> 
> 
> When you say you sort by date, do you count the topN and then sort
> them by date, or you sort by date the entire dimension and then return
> topN? If the latter, does it mean you resolve each ordinal to its Date
> value to sort by? It might be a bit expensive to resolve that ... I
> wonder if you could do that w/ a NumericDocValues too ... e.g. add
> Year, Month, Day numeric DV fields, then aggregate by their value
> instead of resolving them to ordinals ... it's probably more involved
> than that, i.e. counting 2013/March is more complicated, but there's
> got to be a solution, like maybe ask to count March, but filter the
> query by year:2013 ... need to think about that.

I had an abstract implementation of FacetResultsHandler that allowed
subclasses to provide their own PriorityQueue, which in my case ordered
by label instead of by value. The previous API worked with an instance of
PriorityQueue<FacetResultNode>, and FacetResultNode was a better container
of information than OrdAndValue (at least for my case). I will probably
need to reimplement this part.
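
As an illustration of the approach described above, here is a minimal,
self-contained sketch of a top-N selection whose ordering is supplied by the
caller. It does not use the Lucene API: LabelCount and TopNSelector are
hypothetical stand-ins (for a label/value pair and a results handler), and
java.util.PriorityQueue is used in place of Lucene's own PriorityQueue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a facet label and its count; not a Lucene class.
final class LabelCount {
    final String label;
    final int count;

    LabelCount(String label, int count) {
        this.label = label;
        this.count = count;
    }
}

// Hypothetical helper: keeps the best n entries under a caller-supplied order.
final class TopNSelector {
    // Orders entries alphabetically by label.
    static final Comparator<LabelCount> BY_LABEL =
        Comparator.comparing((LabelCount lc) -> lc.label);
    // Orders entries by count, highest first.
    static final Comparator<LabelCount> BY_COUNT_DESC =
        (a, b) -> Integer.compare(b.count, a.count);

    // 'order' must put the most wanted entry first.
    static List<LabelCount> topN(List<LabelCount> all, int n, Comparator<LabelCount> order) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Heap whose head is the least wanted entry kept so far.
        PriorityQueue<LabelCount> heap = new PriorityQueue<>(n + 1, order.reversed());
        for (LabelCount lc : all) {
            heap.add(lc);
            if (heap.size() > n) {
                heap.poll(); // evict the least wanted entry
            }
        }
        List<LabelCount> result = new ArrayList<>(heap);
        Collections.sort(result, order); // return the kept entries in the requested order
        return result;
    }
}
```

With labels in a sortable form such as "2013-03", calling
TopNSelector.topN(all, 10, TopNSelector.BY_LABEL) keeps the ten entries with
the smallest labels, which corresponds to sorting the whole dimension by
label and then returning the top N.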

> 
>         - we have usual IndexReaders opened in groups with
>         MultiReader, than we're
>         merging in RAM the TaxonomyReaders to obtain a correspondence
>         of the
>         MultiReader for the taxonomies. Do you think I can still do
>         this?
> 
> The taxonomy in general hasn't changed. Besides CategoryPath which was
> replaced by String[], it's more or less the same.

OK, I will try to adapt this part.
> 
>         - at some point you removed the residue information from
>         facets and we
>         calculated it differently; am I right I can now calculate it
>         as
>         FacetResult.childCount - FacetResult.labelValues.length?
> 
> 
> If the residue is the number of children that had counts>0 but are not
> in the topN, then yes, the above computation seems right.
> FR.childCount denotes how many child labels were encountered, while
> FR.labelValues.length is <= N, where N is topN that you ask to count.

Yes, your assumption is right; I have already sorted out this part.

> 
> 
>         - we are extending TaxonomyFacetsAccumulator to provide:
>           - specific FacetResultsHandler(s) depeding on the facet
>           - add facet other than the topk if the user selected some
>         facet values
>         from the "residue".
>         where does the API permit to extends the behavior to achieve
>         this?
> 
> 
> FacetsCollector hasn't changed much and returns a List<MatchingDocs>.
> The entire additional chain (Accumulator, ResultHandler etc.) is now a
> Facets. So you basically either need to extend Facets (or
> TaxonomyFacets), or write your own class which just processes the
> List<MatchingDocs>.
> 
> There's no "right way" to do it, it depends on what you want to
> achieve. If its e.g. the different sort-order (date vs other), I would
> try to extend one of the existing classes (IntTaxoFacets). If it's
> something completely different, e.g. RangeFacetCounts, you should be
> able to just extend Facets. And if it's not a "Facets" thing at all,
> i.e. you don't need its API, just write your own interface to process
> the list of MatchingDocs.
> 
> Hope that helps
> 
> 
> Shai

Nicola.

-- 
Nicola Buso
Software Engineer - Web Production Team

European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory

Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

URL: http://www.ebi.ac.uk


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Facet migration 4.6.1 to > 4.7.0

Posted by Shai Erera <se...@gmail.com>.
>
> - we are extending FacetResultsHandler to change the order of the facet
> results (i.e. date facets ordered by date instead of count). How can I
> achieve this now?
>

Now everything is a Facets. In your case, since you use the taxonomy, it's
TaxonomyFacets. You can check the class-hierarchy, where you have
IntTaxoFacets (to deal w/ integers) and then TaxoFacetCounts and
FastTaxoFacetCounts. I think you want to extend either IntTaxoFacets, or
just TaxonomyFacets. Then if you ask for the 'date' dimension, delegate to
the one that sorts by the date value, otherwise to the default one?

When you say you sort by date, do you count the topN and then sort them by
date, or you sort by date the entire dimension and then return topN? If the
latter, does it mean you resolve each ordinal to its Date value to sort by?
It might be a bit expensive to resolve that ... I wonder if you could do
that w/ a NumericDocValues too ... e.g. add Year, Month, Day numeric DV
fields, then aggregate by their value instead of resolving them to ordinals
... it's probably more involved than that, i.e. counting 2013/March is more
complicated, but there's got to be a solution, like maybe ask to count
March, but filter the query by year:2013 ... need to think about that.
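
The numeric-field idea above could look roughly like this outside of Lucene.
It is only a sketch under stated assumptions: MonthCounts and countByMonth
are hypothetical names, a plain list of java.time.LocalDate values stands in
for the per-document Year/Month/Day fields, and the year argument plays the
role of the year:2013 filter on the query.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: each LocalDate stands in for one matching document's date.
final class MonthCounts {
    // Counts documents per month number, keeping only those from the given year.
    static Map<Integer, Integer> countByMonth(List<LocalDate> docDates, int year) {
        Map<Integer, Integer> counts = new TreeMap<>(); // keys stay sorted by month
        for (LocalDate d : docDates) {
            if (d.getYear() != year) {
                continue; // plays the role of the year filter on the query
            }
            counts.merge(d.getMonthValue(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<LocalDate> dates = Arrays.asList(
            LocalDate.of(2013, 3, 1), LocalDate.of(2013, 3, 15),
            LocalDate.of(2013, 7, 4), LocalDate.of(2012, 3, 9));
        System.out.println(countByMonth(dates, 2013)); // prints {3=2, 7=1}
    }
}
```

Because the aggregation key is the numeric month itself, the result is
already in date order and no label has to be resolved in order to sort.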

> - we have usual IndexReaders opened in groups with MultiReader, then we're
> merging in RAM the TaxonomyReaders to obtain a correspondence of the
> MultiReader for the taxonomies. Do you think I can still do this?
>

The taxonomy in general hasn't changed. Besides CategoryPath which was
replaced by String[], it's more or less the same.

> - at some point you removed the residue information from facets and we
> calculated it differently; am I right I can now calculate it as
> FacetResult.childCount - FacetResult.labelValues.length?
>

If the residue is the number of children that had counts>0 but are not in
the topN, then yes, the above computation seems right. FR.childCount
denotes how many child labels were encountered, while FR.labelValues.length
is <= N, where N is topN that you ask to count.
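
The computation confirmed above amounts to a single subtraction. The sketch
below is not Lucene code: Residue is a hypothetical helper, and its two
arguments stand for FacetResult.childCount and
FacetResult.labelValues.length respectively.

```java
// Hypothetical helper; the arguments mirror the two values discussed above.
final class Residue {
    // childCount: how many child labels were encountered for the dimension.
    // returnedLabels: how many labels were actually returned (at most topN).
    static int residue(int childCount, int returnedLabels) {
        if (returnedLabels < 0 || returnedLabels > childCount) {
            throw new IllegalArgumentException("returnedLabels must be in [0, childCount]");
        }
        return childCount - returnedLabels;
    }

    public static void main(String[] args) {
        // 25 children were encountered, topN = 10 were returned.
        System.out.println(residue(25, 10)); // prints 15
    }
}
```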

> - we are extending TaxonomyFacetsAccumulator to provide:
>   - specific FacetResultsHandler(s) depending on the facet
>   - add facets other than the topk if the user selected some facet values
> from the "residue".
> Where does the API allow extending the behavior to achieve this?
>

FacetsCollector hasn't changed much and returns a List<MatchingDocs>. The
entire additional chain (Accumulator, ResultHandler etc.) is now a Facets.
So you basically either need to extend Facets (or TaxonomyFacets), or write
your own class which just processes the List<MatchingDocs>.

There's no "right way" to do it; it depends on what you want to achieve. If
it's e.g. the different sort-order (date vs other), I would try to extend
one of the existing classes (IntTaxoFacets). If it's something completely
different, e.g. RangeFacetCounts, you should be able to just extend Facets.
And if it's not a "Facets" thing at all, i.e. you don't need its API, just
write your own interface to process the list of MatchingDocs.
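
The "delegate per dimension" idea can be sketched without any Lucene types.
Everything here is an assumption made for illustration: DimensionOrdering is
a hypothetical class, and a label-to-count map stands in for whatever counts
a Facets implementation would have computed from the List<MatchingDocs>.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical dispatcher: picks a sort order per dimension, with a default.
final class DimensionOrdering {
    private final Map<String, Comparator<Map.Entry<String, Integer>>> custom = new HashMap<>();

    // Default order: highest count first.
    private final Comparator<Map.Entry<String, Integer>> byCountDesc =
        (a, b) -> Integer.compare(b.getValue(), a.getValue());

    // Registers a special order for one dimension, e.g. by label for "date".
    void register(String dimension, Comparator<Map.Entry<String, Integer>> order) {
        custom.put(dimension, order);
    }

    // Returns the label/count pairs sorted by the order chosen for the dimension.
    List<Map.Entry<String, Integer>> ordered(String dimension, Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(custom.getOrDefault(dimension, byCountDesc));
        return entries;
    }
}
```

A caller could register Map.Entry.comparingByKey() for the "date" dimension
and leave every other dimension on the default count order.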

Hope that helps

Shai


On Tue, Jun 17, 2014 at 5:30 PM, Nicola Buso <nb...@ebi.ac.uk> wrote:

> Hi,
>
> I'm migrating from Lucene 4.6.1 to 4.8.1 and I noticed some Facet API
> changes that happened in 4.7.0, probably mostly related to this ticket:
> http://issues.apache.org/jira/browse/LUCENE-5339
>
> Here are a few questions about some customizations/extensions we made that
> seem not to have a direct counterpart/extension point in the new API;
> can someone help with these questions?
>
> - we are extending FacetResultsHandler to change the order of the facet
> results (i.e. date facets ordered by date instead of count). How can I
> achieve this now?
>
> - we have the usual IndexReaders opened in groups with MultiReader, then we're
> merging the TaxonomyReaders in RAM to obtain a counterpart of the
> MultiReader for the taxonomies. Do you think I can still do this?
>
> - at some point you removed the residue information from facets and we
> calculated it differently; am I right that I can now calculate it as
> FacetResult.childCount - FacetResult.labelValues.length?
>
> - we are extending TaxonomyFacetsAccumulator to provide:
>   - specific FacetResultsHandler(s) depending on the facet
>   - add facets other than the topk if the user selected some facet values
> from the "residue".
> Where does the API allow extending the behavior to achieve this?
>
>
> Any help will be really appreciated,
>
>
>
> Nicola.

Facet migration 4.6.1 to > 4.7.0

Posted by Nicola Buso <nb...@ebi.ac.uk>.
Hi,

I'm migrating from Lucene 4.6.1 to 4.8.1 and I noticed some Facet API
changes that happened in 4.7.0, probably mostly related to this ticket:
http://issues.apache.org/jira/browse/LUCENE-5339

Here are a few questions about some customizations/extensions we made that
seem not to have a direct counterpart/extension point in the new API;
can someone help with these questions?

- we are extending FacetResultsHandler to change the order of the facet
results (i.e. date facets ordered by date instead of count). How can I
achieve this now?

- we have the usual IndexReaders opened in groups with MultiReader, then we're
merging the TaxonomyReaders in RAM to obtain a counterpart of the
MultiReader for the taxonomies. Do you think I can still do this?

- at some point you removed the residue information from facets and we
calculated it differently; am I right that I can now calculate it as
FacetResult.childCount - FacetResult.labelValues.length?

- we are extending TaxonomyFacetsAccumulator to provide:
  - specific FacetResultsHandler(s) depending on the facet
  - add facets other than the topk if the user selected some facet values
from the "residue".
Where does the API allow extending the behavior to achieve this?


Any help will be really appreciated,



Nicola.





Re: Faceted Search User's Guide for Lucene 4.8.1

Posted by Shai Erera <se...@gmail.com>.
I understand, but since the facet module was (and still is) experimental, we
were also experimenting w/ its APIs and ways to simplify them. The
userguide was a mess - while valuable to newcomers, it was impossible to
keep it up to date with the API changes ever since it was contributed to
Lucene. The facet module has undergone two major overhauls since 3.4,
both for good reasons (performance and API simplification). The guide was
way outdated and users complained about that too.

Given that, we've decided to remove it and make do with blog posts (some of
which are already outdated too) and, best of all, the demo code.

If you have specific questions on how to migrate to the new API, I would be
happy to assist you. But I don't think we'll write another userguide.

Also, if you can point to missing jdocs or places where they could be
improved, please open a JIRA issue and I'll help as much as I can with that
too.

Shai


On Mon, Jun 16, 2014 at 7:15 PM, Nicola Buso <nb...@ebi.ac.uk> wrote:

> Hi Shai,
>
> I'm going to update from 4.6.1 to 4.8.1 :-(
>
> On Wed, 2014-06-11 at 14:05 +0300, Shai Erera wrote:
> > Hi
> >
> > We removed the userguide a long time ago, and replaced it with better
> > documentation on the classes and package.html, as well as demo code that
> > you can find here:
> >
> > https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_8/lucene/demo/src/java/org/apache/lucene/demo/facet/
>
> Where is the documentation on packages?
>
> http://lucene.apache.org/core/4_8_1/facet/org/apache/lucene/facet/package-summary.html
>
> http://lucene.apache.org/core/4_8_1/facet/org/apache/lucene/facet/range/package-summary.html
>
> http://lucene.apache.org/core/4_8_1/facet/org/apache/lucene/facet/sortedset/package-summary.html
>
> http://lucene.apache.org/core/4_8_1/facet/org/apache/lucene/facet/taxonomy/package-summary.html
>
> http://lucene.apache.org/core/4_8_1/facet/org/apache/lucene/facet/taxonomy/directory/package-summary.html
>
> This is nothing compared to:
>
> http://lucene.apache.org/core/4_4_0/facet/org/apache/lucene/facet/doc-files/userguide.html
>
> I know, I can still learn from the demo code... but if I'm not starting
> from scratch it's not that straightforward to migrate.
>
> If I'm right most of the changes are because of:
> https://issues.apache.org/jira/browse/LUCENE-5339
>
> that's a long thread of discussions on what happened but a bit confusing
> to recap. Is there any migration guide on this?
>
>
>
> Nicola.
>
> > You can also look up some blog posts that I wrote a while ago on facets,
> > that explain how they work and some internals, even though the code
> > examples are not up-to-date w/ latest API changes:
> >
> > http://shaierera.blogspot.com/2012/11/lucene-facets-part-1.html
> > http://shaierera.blogspot.com/2012/11/lucene-facets-part-2.html
> > http://shaierera.blogspot.com/2012/12/lucene-facets-under-hood.html
> > http://shaierera.blogspot.com/2013/01/facet-associations.html
> >
> > Shai
> >
> >
> > On Wed, Jun 11, 2014 at 10:51 AM, Raf <r....@gmail.com> wrote:
> >
> > > Hi,
> > > I have found this useful guide to the "*Lucene Faceted Search*":
> > >
> > > http://lucene.apache.org/core/4_4_0/facet/org/apache/lucene/facet/doc-files/userguide.html
> > >
> > > The problem is that it refers to Lucene version 4.4, while I am using the
> > > latest available release (4.8.1) and I cannot find some classes (e.g.
> > > FacetSearchParams or CountFacetRequest).
> > >
> > > Is there an updated version of that guide?
> > > I tried this
> > > http://lucene.apache.org/core/*4_8_1*/facet/org/apache/lucene/facet/doc-files/userguide.html
> > > but it does not work :|
> > >
> > > Thank you for any help you can provide.
> > >
> > > Regards,
> > > *Raf*
> > >

Re: Faceted Search User's Guide for Lucene 4.8.1

Posted by Nicola Buso <nb...@ebi.ac.uk>.
Hi Shai,

I'm going to update from 4.6.1 to 4.8.1 :-(

On Wed, 2014-06-11 at 14:05 +0300, Shai Erera wrote:
> Hi
> 
> We removed the userguide a long time ago, and replaced it with better
> documentation on the classes and package.html, as well as demo code that
> you can find here:
> https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_8/lucene/demo/src/java/org/apache/lucene/demo/facet/

Where is the documentation on packages?
http://lucene.apache.org/core/4_8_1/facet/org/apache/lucene/facet/package-summary.html
http://lucene.apache.org/core/4_8_1/facet/org/apache/lucene/facet/range/package-summary.html
http://lucene.apache.org/core/4_8_1/facet/org/apache/lucene/facet/sortedset/package-summary.html
http://lucene.apache.org/core/4_8_1/facet/org/apache/lucene/facet/taxonomy/package-summary.html
http://lucene.apache.org/core/4_8_1/facet/org/apache/lucene/facet/taxonomy/directory/package-summary.html

This is nothing compared to:
http://lucene.apache.org/core/4_4_0/facet/org/apache/lucene/facet/doc-files/userguide.html

I know, I can still learn from the demo code... but if I'm not starting
from scratch it's not that straightforward to migrate.

If I'm right most of the changes are because of:
https://issues.apache.org/jira/browse/LUCENE-5339

that's a long thread of discussions on what happened but a bit confusing
to recap. Is there any migration guide on this?



Nicola.

> You can also look up some blog posts that I wrote a while ago on facets,
> that explain how they work and some internals, even though the code
> examples are not up-to-date w/ latest API changes:
> 
> http://shaierera.blogspot.com/2012/11/lucene-facets-part-1.html
> http://shaierera.blogspot.com/2012/11/lucene-facets-part-2.html
> http://shaierera.blogspot.com/2012/12/lucene-facets-under-hood.html
> http://shaierera.blogspot.com/2013/01/facet-associations.html
> 
> Shai
> 
> 
> On Wed, Jun 11, 2014 at 10:51 AM, Raf <r....@gmail.com> wrote:
> 
> > Hi,
> > I have found this useful guide to the "*Lucene Faceted Search*":
> >
> > http://lucene.apache.org/core/4_4_0/facet/org/apache/lucene/facet/doc-files/userguide.html
> >
> > The problem is that it refers to Lucene version 4.4, while I am using the
> > latest available release (4.8.1) and I cannot find some classes (e.g.
> > FacetSearchParams
> > or CountFacetRequest).
> >
> > Is there an updated version of that guide?
> > I tried this
> > http://lucene.apache.org/core/*4_8_1*/facet/org/apache/lucene/facet/doc-files/userguide.html
> > but it does not work :|
> >
> > Thank you for any help you can provide.
> >
> > Regards,
> > *Raf*
> >


Re: Faceted Search User's Guide for Lucene 4.8.1

Posted by Raf <r....@gmail.com>.
Hi Shai,

On Wed, Jun 11, 2014 at 1:05 PM, Shai Erera <se...@gmail.com> wrote:

> Hi
>
> We removed the userguide a long time ago, and replaced it with better
> documentation on the classes and package.html, as well as demo code that
> you can find here:
>
> https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_8/lucene/demo/src/java/org/apache/lucene/demo/facet/
>

Great, this is very useful!


>
> You can also look up some blog posts that I wrote a while ago on facets,
> that explain how they work and some internals, even though the code
> examples are not up-to-date w/ latest API changes:
>
> http://shaierera.blogspot.com/2012/11/lucene-facets-part-1.html
> http://shaierera.blogspot.com/2012/11/lucene-facets-part-2.html
> http://shaierera.blogspot.com/2012/12/lucene-facets-under-hood.html
> http://shaierera.blogspot.com/2013/01/facet-associations.html
>

Yes, I started exactly from here :)
I read these posts yesterday and I found them very useful to understand the
basics.
But today, when I tried to write some experiments using lucene 4.8.1, I
couldn't find some of the classes used by the code examples.

Thank you for your response and the useful link to the demo package.

Bye
*Raf*

Re: Faceted Search User's Guide for Lucene 4.8.1

Posted by Shai Erera <se...@gmail.com>.
Hi

We removed the userguide a long time ago, and replaced it with better
documentation on the classes and package.html, as well as demo code that
you can find here:
https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_8/lucene/demo/src/java/org/apache/lucene/demo/facet/

You can also look up some blog posts that I wrote a while ago on facets,
that explain how they work and some internals, even though the code
examples are not up-to-date w/ latest API changes:

http://shaierera.blogspot.com/2012/11/lucene-facets-part-1.html
http://shaierera.blogspot.com/2012/11/lucene-facets-part-2.html
http://shaierera.blogspot.com/2012/12/lucene-facets-under-hood.html
http://shaierera.blogspot.com/2013/01/facet-associations.html

Shai


On Wed, Jun 11, 2014 at 10:51 AM, Raf <r....@gmail.com> wrote:

> Hi,
> I have found this useful guide to the "*Lucene Faceted Search*":
>
> http://lucene.apache.org/core/4_4_0/facet/org/apache/lucene/facet/doc-files/userguide.html
>
> The problem is that it refers to Lucene version 4.4, while I am using the
> latest available release (4.8.1) and I cannot find some classes (e.g.
> FacetSearchParams
> or CountFacetRequest).
>
> Is there an updated version of that guide?
> I tried this
> http://lucene.apache.org/core/*4_8_1*/facet/org/apache/lucene/facet/doc-files/userguide.html
> but it does not work :|
>
> Thank you for any help you can provide.
>
> Regards,
> *Raf*
>