Posted to java-user@lucene.apache.org by Sheng <sh...@gmail.com> on 2014/08/12 23:35:27 UTC

Questions for facets search

I actually have 2 questions:

1. Is it possible to get the facet label for a particular document? The
reason we want this is we'd like to allow users to see tags for each hit in
addition to the taxonomy for his/her search.

2. Is it possible to re-index the facet cache without reindexing the whole
Lucene cache, since they are separate? We have a dynamic list of faceted
fields, so being able to quickly rebuild the whole facet cache would be
quite desirable.

Again, I am using Lucene 4.7. Thanks in advance for your answers!

Sheng

Re: Questions for facets search

Posted by Shai Erera <se...@gmail.com>.
Glad it helped, Sheng.

Just to clarify: the taxonomy index is not exactly like what you
implemented. You implemented something like a JOIN between two indexes,
where a document in Index1 can be joined with a document (or set of
docs) in Index2 by some primary key.

The taxonomy index is different. It's an auxiliary index, but the word
'index' is just an implementation detail. Again, think of it as a large Map
from a String to Integer. Every facet in the taxonomy gets a unique ID
(integer), and that integer is encoded in the search index for all
documents that are associated with that facet.

Lucene implements a similar feature, per-segment, through
SortedSetDocValues (and the facet module supports that one too, without the
need for an auxiliary index). The difference is that SortedSetDocValues
implements that mapping per-segment, so e.g. the facet Tags/Lucene may
receive the integer 5 in seg1 and 12 in seg2, whereas the taxonomy index
maps it *once* to an integer (say 4), and that integer is encoded in a
BinaryDocValuesField in all segments of the search index.
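
To make the per-segment versus global distinction concrete, here is a toy sketch in plain Java. It is not Lucene code: OrdinalScopeSketch and OrdinalMap are hypothetical names, and ordinals are simply handed out in first-seen order, which is an assumption made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these classes are Lucene APIs.
final class OrdinalScopeSketch {

    /** Hands out ordinals in first-seen order; one instance models one scope. */
    static final class OrdinalMap {
        private final Map<String, Integer> ords = new HashMap<>();

        int ordinalFor(String label) {
            Integer existing = ords.get(label);
            if (existing != null) {
                return existing;
            }
            int next = ords.size();
            ords.put(label, next);
            return next;
        }
    }

    public static void main(String[] args) {
        // Per-segment scope: every segment numbers its labels independently,
        // so the same label can end up with different integers.
        OrdinalMap seg1 = new OrdinalMap();
        OrdinalMap seg2 = new OrdinalMap();
        int inSeg1 = seg1.ordinalFor("Tags/Lucene");
        seg2.ordinalFor("Author/Shai"); // seg2 happens to see another label first
        int inSeg2 = seg2.ordinalFor("Tags/Lucene");
        System.out.println("per-segment: " + inSeg1 + " vs " + inSeg2);

        // Global scope: one shared map, so the label is numbered once and the
        // same integer can be written into every segment.
        OrdinalMap taxonomy = new OrdinalMap();
        int first = taxonomy.ordinalFor("Tags/Lucene");
        taxonomy.ordinalFor("Author/Shai");
        int again = taxonomy.ordinalFor("Tags/Lucene");
        System.out.println("global: " + first + " vs " + again);
    }
}
```

With per-segment numbering, counts from different segments have to be reconciled by label before they can be merged; with one global numbering they can be added up directly by integer.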

The only lookup that is done at search time is when you want to label top
facets. Since the search index holds only the integer values of the facets,
the taxonomy index is used to label them (so now it's more of a
bidirectional Map).
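
As a rough illustration of that bidirectional lookup, the sketch below counts facets using integers only and touches the label side of the map just once, when printing. ToyTaxonomy and its methods are hypothetical stand-ins written in plain Java, not the real taxonomy reader or writer classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for the taxonomy index: label -> ordinal and ordinal -> label.
final class ToyTaxonomy {
    private final Map<String, Integer> labelToOrd = new HashMap<>();
    private final List<String> ordToLabel = new ArrayList<>();

    /** Index time: return the label's ordinal, assigning one on first sight. */
    int addCategory(String label) {
        Integer ord = labelToOrd.get(label);
        if (ord == null) {
            ord = ordToLabel.size();
            labelToOrd.put(label, ord);
            ordToLabel.add(label);
        }
        return ord;
    }

    /** Search time: resolve an ordinal back to its label. */
    String getPath(int ordinal) {
        return ordToLabel.get(ordinal);
    }

    int size() {
        return ordToLabel.size();
    }

    public static void main(String[] args) {
        ToyTaxonomy taxo = new ToyTaxonomy();
        int lucene = taxo.addCategory("Tags/Lucene");
        int shai = taxo.addCategory("Author/Shai");

        // What the "search index" holds per document: ordinals, never labels.
        int[][] docOrdinals = { { lucene, shai }, { lucene } };

        // Counting needs no label lookup at all.
        int[] counts = new int[taxo.size()];
        for (int[] doc : docOrdinals) {
            for (int ord : doc) {
                counts[ord]++;
            }
        }

        // Labels are resolved only when the results are presented.
        for (int ord = 0; ord < counts.length; ord++) {
            System.out.println(taxo.getPath(ord) + " = " + counts[ord]);
        }
    }
}
```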

Just wanted to clarify the differences.

Shai



Re: Questions for facets search

Posted by Sheng <sh...@gmail.com>.
Shai,

Thanks a lot for your answers! Sorry, I was distracted by some other
matters during the day and couldn't try your suggestions until now. What
you suggested for 1 works like a charm :) As for 2, it is a pity, but I
understand. By the way, the way you described the facet index being stored
like a map is quite similar to how we store the payload :) We use an
integer as the payload for each token, and store more complicated
information in another Lucene index, with the integer payload as the key
for each document.

Sheng


Re: Questions for facets search

Posted by Shai Erera <se...@gmail.com>.
Sheng,

I assume that you're using the Lucene faceting module, so my answers are
based on that:

(1) A document can be associated with many facet labels, e.g. Tags/lucene
and Author/Shai. The way to extract all facet labels for a particular
document is this:

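  import org.apache.lucene.facet.taxonomy.DocValuesOrdinalsReader;
  import org.apache.lucene.facet.taxonomy.OrdinalsReader;
  import org.apache.lucene.facet.taxonomy.OrdinalsReader.OrdinalsSegmentReader;
  import org.apache.lucene.util.IntsRef;

  OrdinalsReader ordinals = new DocValuesOrdinalsReader();
  // We have only one segment, so the first leaf covers the whole index.
  OrdinalsSegmentReader ordsSegment =
      ordinals.getReader(indexReader.leaves().get(0));
  IntsRef scratch = new IntsRef();
  ordsSegment.get(0, scratch); // doc-id 0, relative to this segment
  // Resolve each ordinal to its label through the taxonomy reader.
  for (int i = 0; i < scratch.length; i++) {
    System.out.println(taxoReader.getPath(scratch.ints[scratch.offset + i]));
  }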

Note that OrdinalsSegmentReader works on an AtomicReader. That means that
the doc-id that you pass to it must be relative to the segment. If you have
a global doc-id, you can wrap the DirectoryReader with a
SlowCompositeReaderWrapper, which presents the DirectoryReader as an
AtomicReader.
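
An alternative to wrapping is to translate the global doc-id into a segment plus a segment-relative doc-id yourself. The sketch below shows only the arithmetic in plain Java: DocIdResolver is a hypothetical helper, the segment sizes are made up, and it assumes every segment holds at least one document and that the doc-id is in range. In Lucene 4.x I believe the corresponding pieces are the docBase field of each AtomicReaderContext and ReaderUtil.subIndex, but please check that against the javadocs of your version.

```java
import java.util.Arrays;

// Hypothetical helper: maps a global doc-id to (segment, segment-relative id).
final class DocIdResolver {
    /** Global doc-id of the first document in each segment, ascending. */
    private final int[] docBases;

    /** Assumes every segment contains at least one document. */
    DocIdResolver(int[] segmentSizes) {
        docBases = new int[segmentSizes.length];
        int base = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            docBases[i] = base;
            base += segmentSizes[i];
        }
    }

    /** Index of the segment that contains the given global doc-id. */
    int segmentOf(int globalDocId) {
        int pos = Arrays.binarySearch(docBases, globalDocId);
        // Exact hit: the doc is the first one of that segment. Otherwise
        // binarySearch returns -(insertionPoint) - 1, and the owning segment
        // is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** The doc-id relative to its own segment. */
    int localDocId(int globalDocId) {
        return globalDocId - docBases[segmentOf(globalDocId)];
    }

    public static void main(String[] args) {
        // Three segments holding 3, 5 and 2 documents (made-up sizes).
        DocIdResolver resolver = new DocIdResolver(new int[] { 3, 5, 2 });
        int globalDocId = 7;
        System.out.println("segment " + resolver.segmentOf(globalDocId)
            + ", local doc-id " + resolver.localDocId(globalDocId));
    }
}
```

The segment index would select the leaf to pass to getReader, and the local doc-id is what goes into ordsSegment.get.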

(2) I'm not quite sure I understand what you mean by "facet cache". Do you
mean the taxonomy index? If so, the answer is no. Think of the taxonomy
index as a large global Map<FacetLabel, Integer>, where each facet label is
mapped to an integer, irrespective of the segment it is indexed in. That
map is used to encode the facet information in the *Search Index* more
efficiently.

Therefore the taxonomy index by itself doesn't hold all the information
that is needed for faceted search, and you cannot rebuild it on its own:
the integers it assigns are already encoded in the search index.

Shai



Re: Questions for facets search

Posted by Ralf Heyde <ra...@gmx.de>.
For the 1st question: at the Solr level, I guess you could select only that document by its unique id. The facets returned are then the facets of that particular document. But this costs one additional query per document.
