Posted to java-user@lucene.apache.org by Nicola Buso <nb...@ebi.ac.uk> on 2013/01/21 15:59:25 UTC

FacetedSearch and MultiReader

Hi all,

I'm trying to implement faceted search with the Lucene 4.0 faceting
framework.
In our project we search multiple indexes through a Lucene
MultiReader. How should we use the faceting framework to obtain
FacetResults starting from a MultiReader? All the examples I have seen
use a single IndexReader.



Nicola.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: FacetedSearch and MultiReader

Posted by Shai Erera <se...@gmail.com>.

Hello Nicola,

I think it would be good if you start a new thread to discuss this problem,
as I don't think it's related to the issue in this thread.
Also, I don't understand what problem you're running into. What
used to work before 4.2 and doesn't work now?

Shai



Re: FacetedSearch and MultiReader

Posted by Nicola Buso <nb...@ebi.ac.uk>.

Hi,

I'm trying to use Lucene 4.2, but merging multiple taxonomy indexes
no longer seems to work.

Do you have any idea why it would stop working in Lucene 4.2?
Normal faceted search on a single index works correctly.


Nicola.

On Thu, 2013-01-24 at 16:53 +0000, Nicola Buso wrote:
> Hi Shai,
> 
> I'd just like to confirm that your solution works in the tests I
> ran.
> 
> Thanks again for the useful hints.
> 
> 
> Nicola.
> 
> On Tue, 2013-01-22 at 06:20 +0200, Shai Erera wrote:
> > Hi Nicola,
> > 
> > What I had in mind is something similar to this, which is possible starting
> > with Lucene 4.1, due to changes done to facets (per-segment faceting):
> > 
> > DirectoryTaxonomyWriter master = new DirectoryTaxonomyWriter(masterDir);
> > // open the taxonomy Directories and store them in this array
> > Directory[] origTaxoDirs = new Directory[numTaxoDirs];
> > // initialize one OrdinalMap per taxonomy and store it in this array
> > OrdinalMap[] ordinalMaps = new OrdinalMap[numTaxoDirs];
> > 
> > // now do the merge
> > for (int i = 0; i < origTaxoDirs.length; i++) {
> >   master.addTaxonomy(origTaxoDirs[i], ordinalMaps[i]);
> > }
> > 
> > // now open your readers, and create the important map
> > final Map<AtomicReader,OrdinalMap> readerOrdinals =
> >     new HashMap<AtomicReader,OrdinalMap>();
> > DirectoryReader[] readers = new DirectoryReader[origTaxoDirs.length];
> > for (int i = 0; i < origTaxoDirs.length; i++) {
> >   DirectoryReader r = DirectoryReader.open(contentDirectories[i]);
> >   readers[i] = r; // keep the reader so the MultiReader below can wrap it
> >   OrdinalMap ordMap = ordinalMaps[i];
> >   for (AtomicReaderContext ctx : r.leaves()) {
> >     readerOrdinals.put(ctx.reader(), ordMap);
> >   }
> > }
> > 
> > MultiReader mr = new MultiReader(readers);
> > 
> > // create your FacetRequest (CountFacetRequest) with a custom Aggregator
> > FacetRequest fr = new CountFacetRequest(cp, topK) {
> >   @Override
> >   public Aggregator createAggregator(...) {
> >     return new OrdinalMappingAggregator() {
> >       int[] ordMap;
> > 
> >       @Override
> >       public void setNextReader(AtomicReaderContext context) {
> >         ordMap = readerOrdinals.get(context.reader()).getMap();
> >       }
> > 
> >       @Override
> >       public void aggregate(int docID, float score, IntsRef ordinals) {
> >         int upto = ordinals.offset + ordinals.length;
> >         for (int i = ordinals.offset; i < upto; i++) {
> >           // original ordinal read for the AtomicReader given to setNextReader
> >           int ordinal = ordinals.ints[i];
> >           // mapped ordinal, following the taxonomy merge
> >           int mappedOrdinal = ordMap[ordinal];
> >           // count the mapped ordinal instead, so all AtomicReaders count that ordinal
> >           counts[mappedOrdinal]++;
> >         }
> >       }
> >     };
> >   }
> > };
> > 
> > While it may look like I wrote actual code to do it, I didn't :). So I
> > guess it should work, but I haven't tried it.
> > That way, you don't touch the content indexes at all, just the taxonomy
> > ones.
> > 
> > Note however that you'll need to do this step every time the taxonomy index
> > is updated, and you refresh the TaxoReader instance.
> > Also, this will only work if all your indexes are opened in the same JVM
> > (which I assume is the case, since you use MultiReader).
> > 
> > If you still don't want to do that, then what Denis wrote above is another
> > way to do distributed faceted search, either inside the same JVM or across
> > multiple JVMs.
> > You obtain the FacetResult from each search and merge the results
> > (unfortunately, there's still no tool in Lucene to do that for you).
> > Just make sure to ask for a larger K, to ensure that the correct top-K is
> > returned (see my previous notes).
> > 
> > Shai
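
The remapping step in the quoted sketch above can be tried without Lucene. What follows is a minimal, hypothetical stand-in and not the Lucene API: `MappedOrdinalCounter`, `setNextIndex`, and the sample maps are invented for illustration, and plain `int[]` arrays play the role of the arrays returned by `OrdinalMap.getMap()`.

```java
import java.util.Arrays;

// Plain-Java stand-in for the custom aggregator; no Lucene classes are used.
final class MappedOrdinalCounter {
    private final int[] counts; // indexed by merged (master) ordinal
    private int[] ordMap;       // ordinal map of the index currently being read

    MappedOrdinalCounter(int mergedTaxonomySize) {
        counts = new int[mergedTaxonomySize];
    }

    // Plays the role of setNextReader: switch to the map of the next index.
    void setNextIndex(int[] ordinalMap) {
        ordMap = ordinalMap;
    }

    // Plays the role of aggregate: count the ordinals of one matching document.
    void aggregate(int[] docOrdinals) {
        for (int ordinal : docOrdinals) {
            counts[ordMap[ordinal]]++; // count the merged ordinal, not the local one
        }
    }

    int[] counts() {
        return counts.clone();
    }

    public static void main(String[] args) {
        MappedOrdinalCounter counter = new MappedOrdinalCounter(3);
        counter.setNextIndex(new int[] {0, 1, 2}); // index 1: ordinals unchanged
        counter.aggregate(new int[] {1});
        counter.setNextIndex(new int[] {0, 2, 1}); // index 2: ordinals 1 and 2 swapped
        counter.aggregate(new int[] {2});
        counter.aggregate(new int[] {1});
        System.out.println(Arrays.toString(counter.counts()));
    }
}
```

In this example a document with local ordinal 1 in the first index and a document with local ordinal 2 in the second index are counted under the same merged ordinal, which is the effect the custom Aggregator is meant to achieve.
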
> > 
> > 
> > 
> > 
> > On Tue, Jan 22, 2013 at 4:32 AM, Denis Bazhenov <do...@gmail.com> wrote:
> > 
> > > We have a similar distributed search system and ended up with the
> > > following scheme. Search replicas (the machines where the index resides)
> > > build FacetResults from their index chunk (top N categories with document
> > > counts). The results are then merged by hand, summing the counts of
> > > matching categories from the different replicas.
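
The merge-by-hand scheme described above might look roughly like the following. It is a sketch under assumptions: each replica's result is modeled as a plain map from category label to count rather than a Lucene FacetResult, and `ShardFacetMerger` and `mergeTopK` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ShardFacetMerger {
    // Sums counts per category label across replicas and returns the k labels
    // with the highest totals (ties broken by label for a stable order).
    static List<Map.Entry<String, Integer>> mergeTopK(
            List<Map<String, Integer>> shardResults, int k) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> shard : shardResults) {
            for (Map.Entry<String, Integer> e : shard.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(totals.entrySet());
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    public static void main(String[] args) {
        Map<String, Integer> shard1 = new HashMap<>();
        shard1.put("Drama", 10);
        shard1.put("Comedy", 6);
        shard1.put("Action", 5);
        Map<String, Integer> shard2 = new HashMap<>();
        shard2.put("Action", 7);
        shard2.put("Comedy", 6);
        shard2.put("Drama", 1);
        List<Map<String, Integer>> shards = new ArrayList<>();
        shards.add(shard1);
        shards.add(shard2);
        // Totals are Action=12, Comedy=12, Drama=11; the top 2 are Action and Comedy.
        System.out.println(mergeTopK(shards, 2));
    }
}
```

If each replica had returned only its single top category (Drama with 10 and Action with 7), the merged winner would be Drama with 10, although Action and Comedy both total 12. This is why each replica should be asked for more than K categories.
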
> > >
> > > On Jan 22, 2013, at 3:08 AM, Nicola Buso <nb...@ebi.ac.uk> wrote:
> > >
> > > > Hi Shai,
> > > >
> > > > I was thinking of that too, but I build all the indexes in a custom
> > > > distributed environment, so at the moment I can't have a single
> > > > categories index for all the content indexes at indexing time.
> > > > One solution would be to merge all the categories indexes into a single
> > > > index and use your solution, but the merge code I see in the examples
> > > > also merges the content indexes, and I can't do that.
> > > >
> > > > I could share the taxonomy if the merge is possible (the resulting
> > > > categories indexes are not that big currently), but I would prefer a
> > > > solution where I can collect the facets over multiple categories
> > > > indexes, because I expect that to scale better.
> > > >
> > > >
> > > > Nicola.
> > > >
> > > >
> > > > On Mon, 2013-01-21 at 17:54 +0200, Shai Erera wrote:
> > > >> Hi Nicola,
> > > >>
> > > >>
> > > >> I think that what you're describing corresponds to distributed faceted
> > > >> search. I.e., you have N content indexes, alongside N taxonomy
> > > >> indexes.
> > > >>
> > > >> The information that's indexed in each of those sub-indexes does not
> > > >> correlate with the other ones.
> > > >> For example, say that you index the category "Movie/Drama", it may
> > > >> receive ordinal 12 in index1 and 23 in index2.
> > > >>
> > > >> If you try to count ordinals using MultiReader, you'll just mess
> > > >> everything up.
> > > >>
> > > >>
> > > >> If you can share a single taxonomy index for all N content indexes,
> > > >> then you'll be in a super-simple position:
> > > >>
> > > >> 1) Open one TaxonomyReader
> > > >>
> > > >> 2) Execute search with MultiReader and FacetsCollector
> > > >>
> > > >>
> > > >>
> > > >> It doesn't get simpler than that ! :)
> > > >>
> > > >>
> > > >> Before I go into great length describing what you should do if you
> > > >> cannot share the taxonomy, let me know if that's not an option for
> > > >> you.
> > > >>
> > > >> Shai
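
To make the ordinal mismatch concrete, here is a small plain-Java sketch of what a taxonomy merge produces conceptually: one master ordinal per category label, plus one array per source taxonomy that translates its local ordinals. `TaxonomyMergeSketch` and its methods are hypothetical names, not Lucene classes, and a list position stands in for a category's ordinal.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for a taxonomy merge; none of these names are Lucene APIs.
final class TaxonomyMergeSketch {
    // Master taxonomy: category label -> merged ordinal, in insertion order.
    private final Map<String, Integer> master = new LinkedHashMap<>();

    // Adds one source taxonomy (list position == source ordinal) and returns an
    // array where map[sourceOrdinal] is the ordinal in the master taxonomy.
    int[] addTaxonomy(List<String> sourceLabels) {
        int[] map = new int[sourceLabels.size()];
        for (int ord = 0; ord < sourceLabels.size(); ord++) {
            String label = sourceLabels.get(ord);
            Integer merged = master.get(label);
            if (merged == null) {
                merged = master.size(); // next free master ordinal
                master.put(label, merged);
            }
            map[ord] = merged;
        }
        return map;
    }

    int size() {
        return master.size();
    }

    public static void main(String[] args) {
        TaxonomyMergeSketch merge = new TaxonomyMergeSketch();
        int[] map1 = merge.addTaxonomy(Arrays.asList("Movie", "Movie/Drama", "Movie/Comedy"));
        int[] map2 = merge.addTaxonomy(Arrays.asList("Movie", "Movie/Comedy", "Movie/Drama"));
        // "Movie/Drama" is ordinal 1 in the first taxonomy and 2 in the second,
        // but both translate to the same master ordinal.
        System.out.println(map1[1] + " " + map2[2]);
    }
}
```

Counting raw ordinals across the two sample taxonomies would add "Movie/Drama" hits from one index to "Movie/Comedy" hits from the other; translating through the per-taxonomy arrays avoids that.
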
> > > >>
> > > >>
> > > >>
> > > >> On Mon, Jan 21, 2013 at 5:39 PM, Nicola Buso <nb...@ebi.ac.uk> wrote:
> > > >>        Thanks for the reply Uwe,
> > > >>
> > > >>        we can currently search with a MultiReader over all the
> > > >>        indexes we have.
> > > >>        Now I want to add faceted search, so I created a categories
> > > >>        index for every index I currently have.
> > > >>        To accumulate the facet results I now have a MultiReader
> > > >>        pointing to all the indexes, and I can create a TaxonomyReader
> > > >>        for every categories index I have. The ways I see to obtain
> > > >>        FacetResults are:
> > > >>        1 - FacetsCollector
> > > >>        2 - a FacetsAccumulator implementation
> > > >>
> > > >>        Suppose I use the second option. I should:
> > > >>        - search as usual using the MultiReader
> > > >>        - then try to collect all the FacetResults by iterating over
> > > >>          my TaxonomyReaders; at every iteration:
> > > >>          - I create a FacetsAccumulator using the MultiReader and a
> > > >>            TaxonomyReader
> > > >>          - I get a list of FacetResult from the accumulator.
> > > >>        - when I finish, I should somehow merge all the
> > > >>          List<FacetResult> I have.
> > > >>
> > > >>        I think this solution is not correct, because the doc IDs from
> > > >>        the search refer to the MultiReader, whereas each
> > > >>        TaxonomyReader refers to the categories index of a single
> > > >>        reader.
> > > >>        I also don't like having to merge all the lists of FacetResult
> > > >>        I retrieve from the accumulators.
> > > >>
> > > >>        I'm probably missing something; can somebody clarify how I
> > > >>        should collect the facets in this case?
> > > >>
> > > >>
> > > >>        Nicola.
> > > >>
> > > >>
> > > >>
> > > >>        On Mon, 2013-01-21 at 16:22 +0100, Uwe Schindler wrote:
> > > >>> Just use MultiReader, it extends IndexReader, so you can
> > > >>> pass it anywhere an IndexReader can be passed.
> > > >>>
> > > >>> -----
> > > >>> Uwe Schindler
> > > >>> H.-H.-Meier-Allee 63, D-28213 Bremen
> > > >>> http://www.thetaphi.de
> > > >>> eMail: uwe@thetaphi.de
> > > ---
> > > Denis Bazhenov <do...@gmail.com>

Re: FacetedSearch and MultiReader

Posted by Nicola Buso <nb...@ebi.ac.uk>.

Hi Shai,

I'd just like to confirm that your solution works in the tests I
ran.

Thanks again for the useful hints.


Nicola.





Re: FacetedSearch and MultiReader

Posted by Nicola Buso <nb...@ebi.ac.uk>.

Thanks Shai,

I'm trying your solution and it's working; I still need to check some
numbers to verify it.
As I said, we are aware that we have big indexes, so I use facets only on
subsets, but if that results in performance issues too then I'll
certainly take a look at facet sampling.


Nicola.

On Wed, 2013-01-23 at 13:13 +0200, Shai Erera wrote:
> >
> > I think we should open an issue to provide support for distributed
> > faceting?
> >
> 
> Opened https://issues.apache.org/jira/browse/LUCENE-4710.
> 
> BTW Nicola, I remember you said something about TBs of indexes. I just
> wanted to point out that if you have really large indexes, with many
> documents, then you may want to look at facets sampling. I.e., instead of
> working hard to get exact counts, you can sample the result set and get an
> approximation to the top-K categories. You can then choose to either 'fully
> count the approximated top-K', or stick w/ their partial counts and display
> pctg (%) to the user.
> 
> In fact, when the number of results is so big, think about the following
> result:
> 
> A (456,873,234)
>   A/1 (143,548,034)
>   A/2 (137,323,452)
> 
> These numbers are too big for a human to process the value behind them.
> Following the big numbers rule, these just denote "lots of results" to
> anyone.
> Rather, it may be better if it displayed A/1 (87%) and A/2 (85%).
> This is something you may want to consider too.
> 
> Sampling improves the performance of faceted search, especially on large
> result sets.
> Displaying % counts clarifies the returned top-K categories better, IMO, to
> the common user.
> 
> Shai




Re: FacetedSearch and MultiReader

Posted by Shai Erera <se...@gmail.com>.

>
> I think we should open an issue to provide support for distributed
> faceting?
>

Opened https://issues.apache.org/jira/browse/LUCENE-4710.

BTW Nicola, I remember you said something about TBs of indexes. I just
wanted to point out that if you have really large indexes, with many
documents, then you may want to look at facets sampling. I.e., instead of
working hard to get exact counts, you can sample the result set and get an
approximation to the top-K categories. You can then choose to either 'fully
count the approximated top-K', or stick w/ their partial counts and display
pctg (%) to the user.

In fact, when the number of results is so big, think about the following
result:

A (456,873,234)
  A/1 (143,548,034)
  A/2 (137,323,452)

These numbers are too big for a human to process the value behind them.
Following the big numbers rule, these just denote "lots of results" to
anyone.
Rather, it may be better if it displayed A/1 (87%) and A/2 (85%).
This is something you may want to consider too.

Sampling improves the performance of faceted search, especially on large
result sets.
Displaying % counts clarifies the returned top-K categories better, IMO, to
the common user.
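
To make the idea concrete, here is a minimal sketch in plain Java. It is
not Lucene's sampling API: the class name, the method name and the
"every step-th hit" strategy are assumptions chosen only to illustrate
counting on a sample and reporting percentages instead of exact counts.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Lucene's sampling API: only every step-th hit is inspected.
final class SampledFacetShares {

  /** Returns, per category, the rounded percentage of sampled hits that carry it. */
  static Map<String, Integer> percentShares(List<List<String>> hitCategories, int step) {
    if (step < 1) {
      throw new IllegalArgumentException("step must be >= 1");
    }
    Map<String, Integer> counts = new TreeMap<>();
    int sampled = 0;
    for (int i = 0; i < hitCategories.size(); i += step) {
      sampled++;
      for (String category : hitCategories.get(i)) {
        counts.merge(category, 1, Integer::sum);
      }
    }
    Map<String, Integer> shares = new TreeMap<>();
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      // counts is non-empty only if at least one hit was sampled, so sampled > 0 here
      shares.put(e.getKey(), Math.round(100f * e.getValue() / sampled));
    }
    return shares;
  }

  public static void main(String[] args) {
    // Each inner list holds the categories of one matching document (made-up data).
    List<List<String>> hits = Arrays.asList(
        Arrays.asList("A/1"),
        Arrays.asList("A/2"),
        Arrays.asList("A/1", "A/2"),
        Arrays.asList("A/1"),
        Arrays.asList("A/1"),
        Arrays.asList("A/1"));
    System.out.println(percentShares(hits, 2));
  }
}
```

A real sampler would pick hits randomly and could recount the approximated
top-K exactly afterwards; the fixed step here only keeps the sketch
deterministic.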

Shai

On Tue, Jan 22, 2013 at 4:57 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> On Mon, Jan 21, 2013 at 11:20 PM, Shai Erera <se...@gmail.com> wrote:
>
> > (unfortunately, there's still no tool in Lucene to do that for you).
>
> I think we should open an issue to provide support for distributed
> faceting?
>
> For example, we already provide support for distributed searching
> (TopDocs.merge), and distributed grouping (TopGroups.merge) ... seems
> like we should do the same for distributed faceting (even though its
> somewhat tricky)?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>

Re: FacetedSearch and MultiReader

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Mon, Jan 21, 2013 at 11:20 PM, Shai Erera <se...@gmail.com> wrote:

> (unfortunately, there's still no tool in Lucene to do that for you).

I think we should open an issue to provide support for distributed faceting?

For example, we already provide support for distributed searching
(TopDocs.merge), and distributed grouping (TopGroups.merge) ... seems
like we should do the same for distributed faceting (even though it's
somewhat tricky)?

Mike McCandless

http://blog.mikemccandless.com


Re: FacetedSearch and MultiReader

Posted by Shai Erera <se...@gmail.com>.

Yes, the release is wrapping up. I believe that an announcement message
will be sent in the coming days.

Shai


On Tue, Jan 22, 2013 at 2:51 PM, Nicola Buso <nb...@ebi.ac.uk> wrote:

> I will try it.
>
> I see there is already a lucene-4.1.0 package (dated 2013/01/21)
> available for download, do you know if this version will be released
> soon?
>
>
> Nicola.
>

Re: FacetedSearch and MultiReader

Posted by Nicola Buso <nb...@ebi.ac.uk>.

I will try it.

I see there is already a lucene-4.1.0 package (dated 2013/01/21)
available for download. Do you know if this version will be released
soon?


Nicola.





Re: FacetedSearch and MultiReader

Posted by Shai Erera <se...@gmail.com>.

Hi Nicola,

What I had in mind is something similar to this, which is possible starting
with Lucene 4.1, due to changes done to facets (per-segment faceting):

DirectoryTaxonomyWriter master = new DirectoryTaxonomyWriter(masterDir);
// open the original taxonomy Directories and store them in this array
Directory[] origTaxoDirs = new Directory[numTaxoDirs];
// initialize one OrdinalMap per taxonomy and store it in this array
OrdinalMap[] ordinalMaps = new OrdinalMap[numTaxoDirs];

// now do the merge
for (int i = 0; i < origTaxoDirs.length; i++) {
  master.addTaxonomy(origTaxoDirs[i], ordinalMaps[i]);
}

// now open your readers, and create the important map
final Map<AtomicReader,OrdinalMap> readerOrdinals =
    new HashMap<AtomicReader,OrdinalMap>();
DirectoryReader[] readers = new DirectoryReader[origTaxoDirs.length];
for (int i = 0; i < origTaxoDirs.length; i++) {
  DirectoryReader r = DirectoryReader.open(contentDirectories[i]);
  readers[i] = r; // keep the reader so the MultiReader below can use it
  OrdinalMap ordMap = ordinalMaps[i];
  for (AtomicReaderContext ctx : r.leaves()) {
    readerOrdinals.put(ctx.reader(), ordMap);
  }
}

MultiReader mr = new MultiReader(readers);

// create your FacetRequest (CountFacetRequest) with a custom Aggregator
FacetRequest fr = new CountFacetRequest(cp, topK) {
  @Override
  public Aggregator createAggregator(...) {
    return new OrdinalMappingAggregator() {
      int[] ordMap;

      @Override
      public void setNextReader(AtomicReaderContext context) {
        ordMap = readerOrdinals.get(context.reader()).getMap();
      }

      @Override
      public void aggregate(int docID, float score, IntsRef ordinals) {
        int upto = ordinals.offset + ordinals.length;
        for (int i = ordinals.offset; i < upto; i++) {
          // original ordinal read for the AtomicReader given to setNextReader
          int ordinal = ordinals.ints[i];
          // mapped ordinal, following the taxonomy merge
          int mappedOrdinal = ordMap[ordinal];
          // count the mapped ordinal instead, so all AtomicReaders count
          // that ordinal (counts is assumed to be the counting array)
          counts[mappedOrdinal]++;
        }
      }
    };
  }
};

While it may look like I wrote actual code to do it, I didn't :). So I
guess it should work, but I haven't tried it.
That way, you don't touch the content indexes at all, just the taxonomy
ones.
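
The effect of the ordinal mapping can be seen in a small toy example that
does not use Lucene at all. Plain int arrays stand in for the per-index
ordinals and for what OrdinalMap.getMap() would return; the class name and
the ordinal assignments below are made up for illustration.

```java
import java.util.Arrays;

// Toy illustration only: plain arrays stand in for taxonomy ordinals and OrdinalMap.
final class OrdinalRemapDemo {

  /** Adds one count per document ordinal, translated into the merged taxonomy. */
  static void count(int[] docOrdinals, int[] ordMap, int[] mergedCounts) {
    for (int ordinal : docOrdinals) {
      mergedCounts[ordMap[ordinal]]++;
    }
  }

  public static void main(String[] args) {
    // Merged taxonomy (hypothetical): 0 = root, 1 = Movie, 2 = Movie/Drama, 3 = Movie/Comedy
    int[] mergedCounts = new int[4];

    // index1 happened to assign the same ordinals as the merged taxonomy.
    int[] map1 = {0, 1, 2, 3};
    // index2 assigned 2 = Movie/Comedy and 3 = Movie/Drama, so those two are swapped.
    int[] map2 = {0, 1, 3, 2};

    count(new int[] {2, 2, 3}, map1, mergedCounts); // index1: two Drama docs, one Comedy doc
    count(new int[] {3, 2}, map2, mergedCounts);    // index2: one Drama doc, one Comedy doc

    // Position 2 now holds the Drama total and position 3 the Comedy total.
    System.out.println(Arrays.toString(mergedCounts));
  }
}
```

Counting the raw ordinals without the maps would have added index2's
Comedy documents to Drama and vice versa.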

Note however that you'll need to do this step every time the taxonomy index
is updated, and you refresh the TaxoReader instance.
Also, this will only work if all your indexes are opened in the same JVM
(which I assume is the case, since you use MultiReader).

If you still don't want to do that, then what Denis wrote above is another
way to do distributed faceted search, either inside the same JVM or across
multiple JVMs.
You obtain the FacetResult from each search and merge the results
(unfortunately, there's still no tool in Lucene to do that for you).
Just make sure to ask for a larger K, to ensure that the correct top-K is
returned (see my previous notes).
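
A minimal sketch of such a merge, in plain Java. It assumes each search
has already been reduced to a map from category path to count; the class
and method names are hypothetical and nothing here is a Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: merges per-index facet counts by category path.
final class FacetCountMerger {

  /** Sums the counts of identical category paths across shards and returns the top K. */
  static List<Map.Entry<String, Integer>> mergeTopK(List<Map<String, Integer>> perShard, int k) {
    Map<String, Integer> total = new HashMap<>();
    for (Map<String, Integer> shard : perShard) {
      for (Map.Entry<String, Integer> e : shard.entrySet()) {
        total.merge(e.getKey(), e.getValue(), Integer::sum);
      }
    }
    List<Map.Entry<String, Integer>> sorted = new ArrayList<>(total.entrySet());
    // Highest count first; ties are broken by path so the order is deterministic.
    sorted.sort((a, b) -> {
      int byCount = Integer.compare(b.getValue(), a.getValue());
      return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
    });
    return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
  }

  public static void main(String[] args) {
    Map<String, Integer> shard1 = new HashMap<>();
    shard1.put("Movie/Drama", 5);
    shard1.put("Movie/Comedy", 3);
    Map<String, Integer> shard2 = new HashMap<>();
    shard2.put("Movie/Comedy", 4);
    shard2.put("Movie/Action", 1);

    List<Map<String, Integer>> shards = new ArrayList<>();
    shards.add(shard1);
    shards.add(shard2);
    System.out.println(mergeTopK(shards, 2));
  }
}
```

The merged result is only as good as its inputs: a category that just
misses the per-index top-K everywhere never reaches the merge, which is
why each index should be asked for more than K categories.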

Shai




On Tue, Jan 22, 2013 at 4:32 AM, Denis Bazhenov <do...@gmail.com> wrote:

> We have similar distribute search system and we have finished with the
> following scheme. Search replicas (machines where index resides) are build
> FacetResult's based on their index chunk (top N categories with document
> counts). Later on the results are merged "by hands" with summing relevant
> categories from different replicas.
>

Re: FacetedSearch and MultiReader

Posted by Denis Bazhenov <do...@gmail.com>.

We have a similar distributed search system and ended up with the following scheme. Search replicas (the machines where the index resides) build FacetResults from their index chunk (top N categories with document counts). The results are then merged by hand, summing the counts of matching categories from the different replicas.

On Jan 22, 2013, at 3:08 AM, Nicola Buso <nb...@ebi.ac.uk> wrote:

> Hi Shai,
> 
> I was thinking of that too, but I'm building all the indexes in a custom
> distributed environment, so at the moment I can't have a single
> categories index for all the content indexes at indexing time.
> One solution would be to merge all the categories indexes into a single
> index and use your solution, but the merge code I see in the examples
> also merges the content indexes, and I can't do that.
> 
> I could share the taxonomy if it is possible to merge (the resulting
> categories indexes are not that big currently), but I would prefer a
> solution where I can collect the facets over multiple categories
> indexes; that way I can be sure the solution will scale better.
> 
> 
> Nicola.
> 
> 

---
Denis Bazhenov <do...@gmail.com>







Re: FacetedSearch and MultiReader

Posted by Nicola Buso <nb...@ebi.ac.uk>.

Hi Shai,

I was thinking of that too, but I'm building all the indexes in a custom
distributed environment, so at the moment I can't have a single
categories index for all the content indexes at indexing time.
One solution would be to merge all the categories indexes into a single
index and use your solution, but the merge code I see in the examples
also merges the content indexes, and I can't do that.

I could share the taxonomy if it is possible to merge (the resulting
categories indexes are not that big currently), but I would prefer a
solution where I can collect the facets over multiple categories
indexes; that way I can be sure the solution will scale better.


Nicola.


On Mon, 2013-01-21 at 17:54 +0200, Shai Erera wrote:
> Hi Nicola,
> 
> 
> I think that what you're describing corresponds to distributed faceted
> search. I.e., you have N content indexes, alongside N taxonomy
> indexes.
> 
> The information that's indexed in each of those sub-indexes does not
> correlate with the other ones.
> For example, say that you index the category "Movie/Drama", it may
> receive ordinal 12 in index1 and 23 in index2.
> 
> If you try to count ordinals using MultiReader, the counts of unrelated
> categories will get mixed up.
> 
> 
> If you can share a single taxonomy index for all N content indexes,
> then you'll be in a super-simple position:
> 
> 1) Open one TaxonomyReader
> 
> 2) Execute search with MultiReader and FacetsCollector
> 
> 
> 
> It doesn't get simpler than that ! :)
> 
> 
> Before I go into great length describing what you should do if you
> cannot share the taxonomy, let me know if that's not an option for
> you.
> 
> Shai
> 
> 
> 




Re: FacetedSearch and MultiReader

Posted by Shai Erera <se...@gmail.com>.

Hi Nicola,

I think that what you're describing corresponds to distributed faceted
search. I.e., you have N content indexes, alongside N taxonomy indexes.
The information that's indexed in each of those sub-indexes does not
correlate with the other ones.
For example, say that you index the category "Movie/Drama", it may receive
ordinal 12 in index1 and 23 in index2.
If you try to count ordinals using MultiReader, the counts of unrelated
categories will get mixed up.
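
As an illustration of why raw ordinals cannot be mixed, and of what a merged
taxonomy has to provide, here is a small sketch that does not depend on
Lucene. The class TaxonomyMergeSketch and its method addLocalTaxonomy are
hypothetical; the sketch only assumes that a taxonomy assigns ordinals in
the order in which categories were first added.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TaxonomyMergeSketch {
  private final Map<String, Integer> labelToOrdinal = new HashMap<>();
  private final List<String> labels = new ArrayList<>();

  /**
   * Adds one taxonomy, given as labels in local-ordinal order, and returns
   * an array that maps each local ordinal to its merged ordinal.
   */
  int[] addLocalTaxonomy(List<String> localLabels) {
    int[] ordMap = new int[localLabels.size()];
    for (int local = 0; local < localLabels.size(); local++) {
      String label = localLabels.get(local);
      Integer merged = labelToOrdinal.get(label);
      if (merged == null) {
        // First time this category is seen: give it the next merged ordinal.
        merged = labels.size();
        labels.add(label);
        labelToOrdinal.put(label, merged);
      }
      ordMap[local] = merged;
    }
    return ordMap;
  }

  /** Returns the category label for a merged ordinal. */
  String label(int mergedOrdinal) {
    return labels.get(mergedOrdinal);
  }

  public static void main(String[] args) {
    TaxonomyMergeSketch merged = new TaxonomyMergeSketch();
    // "Movie/Drama" is ordinal 1 in the first taxonomy and 2 in the second.
    int[] map1 = merged.addLocalTaxonomy(Arrays.asList("Movie", "Movie/Drama"));
    int[] map2 = merged.addLocalTaxonomy(
        Arrays.asList("Movie", "Movie/Comedy", "Movie/Drama"));
    System.out.println(Arrays.toString(map1)); // prints [0, 1]
    System.out.println(Arrays.toString(map2)); // prints [0, 2, 1]
  }
}
```

Counting the raw ordinal 1 from both indexes would add "Movie/Drama" and
"Movie/Comedy" together; after mapping, both occurrences of "Movie/Drama"
land on merged ordinal 1.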

If you can share a single taxonomy index for all N content indexes, then
you'll be in a super-simple position:
1) Open one TaxonomyReader
2) Execute search with MultiReader and FacetsCollector

It doesn't get simpler than that ! :)

Before I go into great length describing what you should do if you cannot
share the taxonomy, let me know if that's not an option for you.

Shai


On Mon, Jan 21, 2013 at 5:39 PM, Nicola Buso <nb...@ebi.ac.uk> wrote:

> Thanks for the reply Uwe,
>
> We can currently search with MultiReader over all the indexes we have.
> Now I want to add faceted search, so I created a categories index
> for every index I currently have.
> To accumulate the facet results I now have a MultiReader pointing to all
> the indexes, and I can create a TaxonomyReader for every categories index
> I have. The ways I see to obtain FacetResults are:
> 1 - FacetsCollector
> 2 - a FacetsAccumulator implementation
>
> Suppose I use the second option. I should:
> - search as usual using the MultiReader
> - then try to collect all the FacetResults by iterating over my
> TaxonomyReaders; at every iteration:
>   - I create a FacetsAccumulator using the MultiReader and a
> TaxonomyReader
>   - I get a list of FacetResult from the accumulator.
> - when I finish, I should somehow merge all the List<FacetResult> I
> have.
>
> I think this solution is not correct, because the doc IDs from the search
> refer to the MultiReader, whereas each TaxonomyReader points to
> the categories index of a single reader.
> I also don't like having to merge all the lists of FacetResult I retrieve
> from the Accumulators.
>
> Probably I'm missing something. Can somebody clarify how I should
> collect the facets in this case?
>
>
> Nicola.
>
>
>

Re: FacetedSearch and MultiReader

Posted by Nicola Buso <nb...@ebi.ac.uk>.

Thanks for the reply Uwe,

We can currently search with MultiReader over all the indexes we have.
Now I want to add faceted search, so I created a categories index
for every index I currently have.
To accumulate the facet results I now have a MultiReader pointing to all
the indexes, and I can create a TaxonomyReader for every categories index
I have. The ways I see to obtain FacetResults are:
1 - FacetsCollector
2 - a FacetsAccumulator implementation

Suppose I use the second option. I should:
- search as usual using the MultiReader
- then try to collect all the FacetResults by iterating over my
TaxonomyReaders; at every iteration:
  - I create a FacetsAccumulator using the MultiReader and a
TaxonomyReader
  - I get a list of FacetResult from the accumulator.
- when I finish, I should somehow merge all the List<FacetResult> I
have.

I think this solution is not correct, because the doc IDs from the search
refer to the MultiReader, whereas each TaxonomyReader points to
the categories index of a single reader.
I also don't like having to merge all the lists of FacetResult I retrieve
from the Accumulators.

Probably I'm missing something. Can somebody clarify how I should
collect the facets in this case?

Nicola.

On Mon, 2013-01-21 at 16:22 +0100, Uwe Schindler wrote:
> Just use MultiReader: it extends IndexReader, so you can pass it anywhere an IndexReader is accepted.
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
> 


RE: FacetedSearch and MultiReader

Posted by Uwe Schindler <uw...@thetaphi.de>.

Just use MultiReader: it extends IndexReader, so you can pass it anywhere an IndexReader is accepted.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Nicola Buso [mailto:nbuso@ebi.ac.uk]
> Sent: Monday, January 21, 2013 3:59 PM
> To: java-user@lucene.apache.org
> Subject: FacetedSearch and MultiReader
> 
> Hi all,
> 
> I'm trying to develop faceted search using lucene 4.0 faceting framework.
> In our project we are searching on multiple indexes using lucene
> MultiReader. How should we use the faceted framework to obtain
> FacetResults starting from a MultiReader? all the example I see are using a
> "single" IndexReader.
> 
> 
> 
> Nicola.
> 
> 