Posted to java-user@lucene.apache.org by Nicola Buso <nb...@ebi.ac.uk> on 2014/06/17 16:30:38 UTC

Facet migration 4.6.1 to > 4.7.0

Hi,

I'm migrating from Lucene 4.6.1 to 4.8.1 and I noticed some Facet API
changes that happened in 4.7.0, probably mostly related to this ticket:
http://issues.apache.org/jira/browse/LUCENE-5339

Here are a few questions about some customizations/extensions we made
that do not seem to have a direct counterpart/extension point in the new
API; can someone help with these?

- we are extending FacetResultsHandler to change the order of the facet
results (e.g. date facets ordered by date instead of count). How can I
achieve this now?

- we have regular IndexReaders opened in groups with MultiReader, then we
merge the TaxonomyReaders in RAM to obtain a taxonomy counterpart of the
MultiReader. Do you think I can still do this?

- at some point you removed the residue information from facets and we
calculated it differently; am I right that I can now calculate it as
FacetResult.childCount - FacetResult.labelValues.length?

- we are extending TaxonomyFacetsAccumulator to:
  - provide specific FacetResultsHandler(s) depending on the facet
  - add facet values other than the top-k if the user selected some facet
    values from the "residue".
Where does the API allow extending the behavior to achieve this?


Any help will be really appreciated,



Nicola.



-- 
Nicola Buso
Software Engineer - Web Production Team

European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory

Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

URL: http://www.ebi.ac.uk


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Facet migration 4.6.1 to > 4.7.0

Posted by Nicola Buso <nb...@ebi.ac.uk>.
Hi,

On Tue, 2014-06-17 at 17:51 +0300, Shai Erera wrote:
>         - we are extending FacetResultsHandler to change the order of
>         the facet results (e.g. date facets ordered by date instead of
>         count). How can I achieve this now?
> 
> 
> Now everything is a Facets. In your case, since you use the taxonomy,
> it's TaxonomyFacets. You can check the class-hierarchy, where you have
> IntTaxoFacets (to deal w/ integers) and then TaxoFacetCounts and
> FastTaxoFacetCounts. I think you want to extend either IntTaxoFacets,
> or just TaxonomyFacets. Then if you ask for the 'date' dimension,
> delegate to the one that sorts by the date value, otherwise to the
> default one?
> 
> 
> When you say you sort by date, do you count the topN and then sort
> them by date, or do you sort the entire dimension by date and then
> return the topN? If the latter, does it mean you resolve each ordinal
> to its Date value to sort by? It might be a bit expensive to resolve
> that ... I wonder if you could do that w/ a NumericDocValues too ...
> e.g. add Year, Month, Day numeric DV fields, then aggregate by their
> value instead of resolving them to ordinals ... it's probably more
> involved than that, i.e. counting 2013/March is more complicated, but
> there's got to be a solution, like maybe ask to count March, but
> filter the query by year:2013 ... need to think about that.

I had an abstract implementation of FacetResultsHandler that allowed
extenders to provide their own PriorityQueue, which in my case ordered
by label instead of by value. The previous API worked with an instance
of PriorityQueue<FacetResultNode>, and FacetResultNode was a better
container of information than OrdAndValue (at least for my case). I
will probably need to reimplement this part.
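
The old extension point described here (a handler whose subclasses supply the PriorityQueue ordering) can be kept in application code on top of the new API. A minimal plain-Java sketch, assuming the facet entries have already been turned into (label, count) pairs; LabeledCount, TopNSelector, ByCount and ByLabel are hypothetical names, not Lucene classes:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical (label, count) pair; plays the role FacetResultNode played, not a Lucene class.
final class LabeledCount {
    final String label;
    final int count;

    LabeledCount(String label, int count) {
        this.label = label;
        this.count = count;
    }

    @Override
    public String toString() {
        return label + "=" + count;
    }
}

// Hypothetical base class: subclasses only decide what "best" means.
abstract class TopNSelector {
    /** Comparator that puts the best entries first. */
    protected abstract Comparator<LabeledCount> order();

    /** Returns the n best entries, best first, using a bounded priority queue. */
    final List<LabeledCount> topN(Iterable<LabeledCount> entries, int n) {
        List<LabeledCount> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        Comparator<LabeledCount> best = order();
        // Reversed order makes the queue head the worst kept entry, so it is evicted first.
        PriorityQueue<LabeledCount> queue = new PriorityQueue<>(n, best.reversed());
        for (LabeledCount e : entries) {
            queue.offer(e);
            if (queue.size() > n) {
                queue.poll();
            }
        }
        result.addAll(queue);
        result.sort(best);
        return result;
    }
}

// Default behaviour: highest count first, ties broken by label.
final class ByCount extends TopNSelector {
    @Override
    protected Comparator<LabeledCount> order() {
        return (a, b) -> a.count != b.count
                ? Integer.compare(b.count, a.count)
                : a.label.compareTo(b.label);
    }
}

// Date-style behaviour: labels in ascending order, assuming ISO-like labels ("2014-03").
final class ByLabel extends TopNSelector {
    @Override
    protected Comparator<LabeledCount> order() {
        return (a, b) -> a.label.compareTo(b.label);
    }
}
```

With the entries 2014-03=5, 2014-01=9 and 2014-02=1 and n = 2, ByCount keeps 2014-01 and 2014-03, while ByLabel keeps 2014-01 and 2014-02. Ordering by label this way still visits every child of the dimension, which is the cost Shai raises in his reply.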

> 
>         - we have regular IndexReaders opened in groups with
>         MultiReader, then we merge the TaxonomyReaders in RAM to
>         obtain a taxonomy counterpart of the MultiReader. Do you
>         think I can still do this?
> 
> The taxonomy in general hasn't changed. Besides CategoryPath which was
> replaced by String[], it's more or less the same.

OK, I will try to adapt this part.
> 
>         - at some point you removed the residue information from
>         facets and we calculated it differently; am I right that I
>         can now calculate it as
>         FacetResult.childCount - FacetResult.labelValues.length?
> 
> 
> If the residue is the number of children that had counts > 0 but are
> not in the topN, then yes, the above computation seems right.
> FR.childCount denotes how many child labels were encountered, while
> FR.labelValues.length is <= N, where N is the topN you ask to count.

Yes, your assumption is right; I have already sorted out this part.

> 
> 
>         - we are extending TaxonomyFacetsAccumulator to:
>           - provide specific FacetResultsHandler(s) depending on the
>             facet
>           - add facet values other than the top-k if the user
>             selected some facet values from the "residue".
>         Where does the API allow extending the behavior to achieve
>         this?
> 
> 
> FacetsCollector hasn't changed much and returns a List<MatchingDocs>.
> The entire additional chain (Accumulator, ResultHandler etc.) is now a
> Facets. So you basically either need to extend Facets (or
> TaxonomyFacets), or write your own class which just processes the
> List<MatchingDocs>.
> 
> There's no "right way" to do it, it depends on what you want to
> achieve. If it's e.g. the different sort-order (date vs other), I would
> try to extend one of the existing classes (IntTaxoFacets). If it's
> something completely different, e.g. RangeFacetCounts, you should be
> able to just extend Facets. And if it's not a "Facets" thing at all,
> i.e. you don't need its API, just write your own interface to process
> the list of MatchingDocs.
> 
> Hope that helps
> 
> 
> Shai

Nicola.


Re: Facet migration 4.6.1 to > 4.7.0

Posted by Shai Erera <se...@gmail.com>.
>
> - we are extending FacetResultsHandler to change the order of the facet
> results (e.g. date facets ordered by date instead of count). How can I
> achieve this now?
>

Now everything is a Facets. In your case, since you use the taxonomy, it's
TaxonomyFacets. You can check the class-hierarchy, where you have
IntTaxoFacets (to deal w/ integers) and then TaxoFacetCounts and
FastTaxoFacetCounts. I think you want to extend either IntTaxoFacets, or
just TaxonomyFacets. Then if you ask for the 'date' dimension, delegate to
the one that sorts by the date value, otherwise to the default one?
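
The "delegate per dimension" idea can be sketched without any Lucene types. DimensionFacets and DelegatingFacets below are hypothetical, simplified stand-ins for the role Facets/TaxonomyFacets play (the real getTopChildren also takes a path and returns a FacetResult). Recent versions of the Lucene facet module also include a MultiFacets class that maps dimensions to Facets instances, so it is worth checking whether the version in use has it before writing your own:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the role of Lucene's Facets (not its real API).
interface DimensionFacets {
    List<String> getTopChildren(int topN, String dim);
}

// Routes selected dimensions (e.g. "date") to a dedicated implementation
// and every other dimension to a default one.
final class DelegatingFacets implements DimensionFacets {
    private final Map<String, DimensionFacets> byDim = new HashMap<>();
    private final DimensionFacets fallback;

    DelegatingFacets(DimensionFacets fallback) {
        this.fallback = fallback;
    }

    /** Registers a dedicated implementation for one dimension; returns this for chaining. */
    DelegatingFacets route(String dim, DimensionFacets impl) {
        byDim.put(dim, impl);
        return this;
    }

    @Override
    public List<String> getTopChildren(int topN, String dim) {
        DimensionFacets impl = byDim.get(dim);
        return (impl != null ? impl : fallback).getTopChildren(topN, dim);
    }
}
```

The date-sorting implementation and the default count-based one stay independent; only the routing object knows which dimension goes where.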

When you say you sort by date, do you count the topN and then sort them by
date, or do you sort the entire dimension by date and then return the topN?
If the latter, does it mean you resolve each ordinal to its Date value to sort by?
It might be a bit expensive to resolve that ... I wonder if you could do
that w/ a NumericDocValues too ... e.g. add Year, Month, Day numeric DV
fields, then aggregate by their value instead of resolving them to ordinals
... it's probably more involved than that, i.e. counting 2013/March is more
complicated, but there's got to be a solution, like maybe ask to count
March, but filter the query by year:2013 ... need to think about that.
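
The two readings of "sort by date" can give different answers. A small plain-Java comparison, assuming the per-label counts of the dimension are already available in a Map and that labels are ISO-style ("2013-03") so string order equals date order; DateOrdering is a hypothetical helper:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper contrasting the two orderings discussed above.
final class DateOrdering {
    /** Strategy 1: pick the topN labels by count, then present them in date order. */
    static List<String> topNByCountThenByDate(Map<String, Integer> counts, int topN) {
        List<String> labels = new ArrayList<>(counts.keySet());
        labels.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a)); // higher count first
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        List<String> top = new ArrayList<>(labels.subList(0, Math.min(Math.max(topN, 0), labels.size())));
        top.sort(null); // natural String order, i.e. date order for ISO-style labels
        return top;
    }

    /** Strategy 2: order the whole dimension by date and return the first topN labels. */
    static List<String> firstNByDate(Map<String, Integer> counts, int topN) {
        List<String> labels = new ArrayList<>(counts.keySet());
        labels.sort(null);
        return new ArrayList<>(labels.subList(0, Math.min(Math.max(topN, 0), labels.size())));
    }
}
```

With counts 2013-01=1, 2013-02=7, 2013-03=4 and topN = 2, the first strategy returns [2013-02, 2013-03] and the second [2013-01, 2013-02]. The second one must see every label of the dimension before returning anything, which is where resolving each ordinal to its date value gets expensive.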

> - we have regular IndexReaders opened in groups with MultiReader, then we
> merge the TaxonomyReaders in RAM to obtain a taxonomy counterpart of the
> MultiReader. Do you think I can still do this?
>

The taxonomy in general hasn't changed. Besides CategoryPath which was
replaced by String[], it's more or less the same.
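
Since the taxonomy itself is largely unchanged, merging several taxonomies still comes down to assigning one ordinal per distinct path and remembering, for every source taxonomy, how its ordinals map to the merged ones. Lucene's DirectoryTaxonomyWriter has an addTaxonomy(Directory, OrdinalMap) method for this kind of merge; the sketch below only models the idea in plain Java, with paths as String[] as in the new API. TaxonomyMerger is a hypothetical class, and a source taxonomy is assumed to be a list of paths indexed by ordinal with the root at ordinal 0:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of merging taxonomies; not Lucene's implementation.
final class TaxonomyMerger {
    private final List<String[]> paths = new ArrayList<>();               // merged ordinal -> path
    private final Map<List<String>, Integer> ordinals = new HashMap<>();  // path -> merged ordinal

    TaxonomyMerger() {
        ordinalFor(new String[0]); // the root always gets ordinal 0
    }

    private int ordinalFor(String[] path) {
        // String[] uses identity equals/hashCode, so key the map on a List view of a copy.
        List<String> key = Arrays.asList(path.clone());
        Integer ord = ordinals.get(key);
        if (ord == null) {
            ord = paths.size();
            paths.add(path.clone());
            ordinals.put(key, ord);
        }
        return ord;
    }

    /**
     * Adds one source taxonomy (paths indexed by source ordinal) and returns
     * map[sourceOrdinal] = mergedOrdinal. Parents come before children in a
     * source taxonomy, so they are also added first here.
     */
    int[] addTaxonomy(List<String[]> source) {
        int[] map = new int[source.size()];
        for (int ord = 0; ord < source.size(); ord++) {
            map[ord] = ordinalFor(source.get(ord));
        }
        return map;
    }

    String[] path(int mergedOrdinal) {
        return paths.get(mergedOrdinal).clone();
    }

    int size() {
        return paths.size();
    }
}
```

In this model, an ordinal read from one of the grouped indexes would be passed through that index's map before being counted against the merged taxonomy.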

> - at some point you removed the residue information from facets and we
> calculated it differently; am I right that I can now calculate it as
> FacetResult.childCount - FacetResult.labelValues.length?
>

If the residue is the number of children that had counts > 0 but are not in
the topN, then yes, the above computation seems right. FR.childCount
denotes how many child labels were encountered, while FR.labelValues.length
is <= N, where N is the topN you ask to count.
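
The residue computation confirmed above, as a plain-Java sketch. FacetResultView and Residue are hypothetical stand-ins mirroring only the childCount and labelValues fields (in Lucene, labelValues holds LabelAndValue objects rather than Strings); the null guard is there on the assumption that a dimension with no hits may produce no result at all:

```java
// Hypothetical stand-in exposing the two FacetResult fields used in the formula.
final class FacetResultView {
    final int childCount;        // distinct child labels that had a count > 0
    final String[] labelValues;  // the labels actually returned, length <= topN

    FacetResultView(int childCount, String[] labelValues) {
        this.childCount = childCount;
        this.labelValues = labelValues;
    }
}

final class Residue {
    /** Number of children with a count > 0 that did not make it into the topN. */
    static int residue(FacetResultView result) {
        if (result == null) {
            return 0; // no result for this dimension, so nothing was left out
        }
        return result.childCount - result.labelValues.length;
    }
}
```

For example, a dimension with 7 non-zero children queried with topN = 3 returns 3 labels, leaving a residue of 4.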

> - we are extending TaxonomyFacetsAccumulator to:
>   - provide specific FacetResultsHandler(s) depending on the facet
>   - add facet values other than the top-k if the user selected some facet
>     values from the "residue".
> Where does the API allow extending the behavior to achieve this?
>

FacetsCollector hasn't changed much and returns a List<MatchingDocs>. The
entire additional chain (Accumulator, ResultHandler etc.) is now a Facets.
So you basically either need to extend Facets (or TaxonomyFacets), or write
your own class which just processes the List<MatchingDocs>.
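
Writing one's own class over List<MatchingDocs> boils down to walking each segment's hits and aggregating whatever per-document values are needed. A plain-Java sketch of that loop, counting taxonomy ordinals; SegmentHits and OrdinalCounter are hypothetical stand-ins, and in Lucene the hits and the ordinals come from the MatchingDocs and the index rather than from in-memory arrays:

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in for one segment's MatchingDocs: which docs matched,
// plus each doc's facet ordinals (in Lucene these are read from the index).
final class SegmentHits {
    final BitSet hits;
    final int[][] ordsPerDoc;

    SegmentHits(BitSet hits, int[][] ordsPerDoc) {
        this.hits = hits;
        this.ordsPerDoc = ordsPerDoc;
    }
}

final class OrdinalCounter {
    /** Counts, per taxonomy ordinal, how many matching docs carry that ordinal. */
    static int[] count(List<SegmentHits> segments, int taxonomySize) {
        int[] counts = new int[taxonomySize];
        for (SegmentHits seg : segments) {
            // Iterate only over the set bits, i.e. the docs that matched in this segment.
            for (int doc = seg.hits.nextSetBit(0); doc >= 0; doc = seg.hits.nextSetBit(doc + 1)) {
                for (int ord : seg.ordsPerDoc[doc]) {
                    counts[ord]++;
                }
            }
        }
        return counts;
    }
}
```

The counts array is indexed by taxonomy ordinal across all segments, because the taxonomy is shared by the whole index.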

There's no "right way" to do it, it depends on what you want to achieve. If
it's e.g. the different sort-order (date vs other), I would try to extend
one of the existing classes (IntTaxoFacets). If it's something completely
different, e.g. RangeFacetCounts, you should be able to just extend Facets.
And if it's not a "Facets" thing at all, i.e. you don't need its API, just
write your own interface to process the list of MatchingDocs.
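
For the second customization in the question (showing values the user selected from the "residue" even when they fall outside the top-k), the logic can live in such a custom class or in a Facets subclass. A plain-Java sketch, assuming the per-label counts are at hand; SelectedAwareTopN is a hypothetical helper. With the taxonomy API, the count of a single selected value could instead be looked up with Facets.getSpecificValue(dim, path...):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SelectedAwareTopN {
    /**
     * Returns the topN labels by count (highest first, ties by label), followed by
     * any selected labels that did not make the cut, each with its count.
     */
    static LinkedHashMap<String, Integer> topNPlusSelected(
            Map<String, Integer> counts, int topN, Collection<String> selected) {
        List<String> labels = new ArrayList<>(counts.keySet());
        labels.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        LinkedHashMap<String, Integer> result = new LinkedHashMap<>();
        for (String label : labels.subList(0, Math.min(Math.max(topN, 0), labels.size()))) {
            result.put(label, counts.get(label));
        }
        // Keep the user's selections visible; a selection the current query no
        // longer matches is shown with a count of 0.
        for (String label : selected) {
            if (!result.containsKey(label)) {
                result.put(label, counts.getOrDefault(label, 0));
            }
        }
        return result;
    }
}
```

With counts a=5, b=3, c=1, d=2, topN = 2 and selections d, a and z, the result is a=5, b=3, d=2, z=0: the top two by count, then the selections that were not already shown.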

Hope that helps

Shai

