Posted to dev@lucene.apache.org by xu cheng <xc...@gmail.com> on 2010/12/27 04:48:12 UTC

difficulties for me to understand the index chain

Hi all,

I'm new to Lucene development. I have been reading the Lucene source code
recently, and I'm having difficulty understanding the indexing chain, in
particular the complex relationships between its classes. For example, I
cannot work out how these classes relate to each other:
DocFieldConsumerPerThread, DocFieldConsumerPerField, DocInvertedPerThread,
DocInverterPerThread.

By the way, what is the advantage of such a design, the so-called indexing
chain?

Is there any documentation about this?

Any suggestions or references are appreciated. Thanks!

Regards

Re: difficulties for me to understand the index chain

Posted by xu cheng <xc...@gmail.com>.

Hi Li Li,

Thank you very much for your answer!

> To support multithreads indexing, PerThread class is used.

Multiple threads doing what? Does each thread process one file, or one
field, or something else?

Regards


Re: difficulties for me to understand the index chain

Posted by Li Li <fa...@gmail.com>.

I am also interested in this question, though my understanding may be wrong.


2010/12/27 xu cheng <xc...@gmail.com>:
> Hi all,
>
> I'm new to Lucene development. I have been reading the Lucene source code
> recently, and I'm having difficulty understanding the indexing chain, in
> particular the complex relationships between its classes. For example, I
> cannot work out how these classes relate to each other:
> DocFieldConsumerPerThread, DocFieldConsumerPerField, DocInvertedPerThread,
> DocInverterPerThread.

Because segments often have the same fields, the PerField classes are used
to share common things.
To support multithreaded indexing, the PerThread classes are used.
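
To make the PerThread / PerField split a little more concrete, here is a
minimal Java sketch. It is only an illustration under assumptions of my own:
Consumer, PerThread, PerField, and valuesSeen are made-up names, not Lucene
classes, and the real code does far more. The idea it tries to show is that
one shared object hands each indexing thread its own state, and that state
keeps one reusable object per field name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; this is not Lucene's actual code.
class PerThreadPerFieldSketch {

    /** Chain-level object: one instance shared by the whole writer. */
    static class Consumer {
        /** Each indexing thread asks for its own private state object. */
        PerThread addThread() {
            return new PerThread();
        }
    }

    /** Per-thread state: used by one thread only, so it needs no locking. */
    static class PerThread {
        private final Map<String, PerField> fields = new HashMap<>();

        /** Returns the state for a field, creating it the first time the field is seen. */
        PerField field(String name) {
            return fields.computeIfAbsent(name, PerField::new);
        }

        void processDocument(Map<String, String> doc) {
            for (Map.Entry<String, String> entry : doc.entrySet()) {
                field(entry.getKey()).process(entry.getValue());
            }
        }
    }

    /** Per-field state: reused for every document that contains this field. */
    static class PerField {
        final String name;
        int valuesSeen;

        PerField(String name) {
            this.name = name;
        }

        void process(String value) {
            valuesSeen++;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Consumer consumer = new Consumer();
        Runnable indexer = () -> {
            PerThread state = consumer.addThread();
            Map<String, String> doc = new HashMap<>();
            doc.put("title", "hello");
            doc.put("body", "world");
            state.processDocument(doc);
            state.processDocument(doc);
            System.out.println(Thread.currentThread().getName()
                    + " saw title " + state.field("title").valuesSeen + " times");
        };
        Thread a = new Thread(indexer, "indexer-a");
        Thread b = new Thread(indexer, "indexer-b");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

In this sketch the two threads never touch each other's PerThread or
PerField objects, which is roughly why a per-thread layer makes concurrent
indexing simpler to reason about.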

See the code in DocumentsWriter:

  static final IndexingChain DefaultIndexingChain = new IndexingChain() {

    DocConsumer getChain(DocumentsWriter documentsWriter) {
      /*
      This is the current indexing chain:

      DocConsumer / DocConsumerPerThread
        --> code: DocFieldProcessor / DocFieldProcessorPerThread
          --> DocFieldConsumer / DocFieldConsumerPerThread / DocFieldConsumerPerField
            --> code: DocFieldConsumers / DocFieldConsumersPerThread / DocFieldConsumersPerField
              --> code: DocInverter / DocInverterPerThread / DocInverterPerField
                --> InvertedDocConsumer / InvertedDocConsumerPerThread / InvertedDocConsumerPerField
                  --> code: TermsHash / TermsHashPerThread / TermsHashPerField
                    --> TermsHashConsumer / TermsHashConsumerPerThread / TermsHashConsumerPerField
                      --> code: FreqProxTermsWriter / FreqProxTermsWriterPerThread / FreqProxTermsWriterPerField
                      --> code: TermVectorsTermsWriter / TermVectorsTermsWriterPerThread / TermVectorsTermsWriterPerField
                --> InvertedDocEndConsumer / InvertedDocConsumerPerThread / InvertedDocConsumerPerField
                  --> code: NormsWriter / NormsWriterPerThread / NormsWriterPerField
              --> code: StoredFieldsWriter / StoredFieldsWriterPerThread / StoredFieldsWriterPerField
      */

      // Build up indexing chain:

      final TermsHashConsumer termVectorsWriter = new TermVectorsTermsWriter(documentsWriter);
      final TermsHashConsumer freqProxWriter = new FreqProxTermsWriter();

      final InvertedDocConsumer termsHash = new TermsHash(documentsWriter, true, freqProxWriter,
                                                          new TermsHash(documentsWriter, false, termVectorsWriter, null));
      final NormsWriter normsWriter = new NormsWriter();
      final DocInverter docInverter = new DocInverter(termsHash, normsWriter);
      return new DocFieldProcessor(documentsWriter, docInverter);
    }
  };
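
The way getChain() nests constructors (each stage is handed the stage below
it) can be imitated in a few lines. The following is a rough sketch, not
Lucene code: Stage, RecordingStage, Inverter, FieldProcessor, and buildChain
are hypothetical names chosen for illustration, and the "inversion" here is
just whitespace splitting. It only mirrors the shape of the chain above,
where a processor feeds an inverter, and the inverter feeds two consumers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this only imitates the shape of getChain().
class IndexingChainSketch {

    /** A link in the chain: receives a field value and may pass work downstream. */
    interface Stage {
        void process(String fieldName, String value);
    }

    /** Terminal stage standing in for a writer; it just records what it receives. */
    static class RecordingStage implements Stage {
        private final String label;
        private final List<String> log;

        RecordingStage(String label, List<String> log) {
            this.label = label;
            this.log = log;
        }

        @Override
        public void process(String fieldName, String value) {
            log.add(label + ":" + fieldName + "=" + value);
        }
    }

    /** Splits a value into tokens, feeds each token to one stage, then notifies a second stage. */
    static class Inverter implements Stage {
        private final Stage perToken;
        private final Stage endOfField;

        Inverter(Stage perToken, Stage endOfField) {
            this.perToken = perToken;
            this.endOfField = endOfField;
        }

        @Override
        public void process(String fieldName, String value) {
            String[] tokens = value.split("\\s+");
            for (String token : tokens) {
                perToken.process(fieldName, token);
            }
            // The second stage only learns how many tokens the field produced.
            endOfField.process(fieldName, Integer.toString(tokens.length));
        }
    }

    /** Head of the chain: forwards every field to the single stage below it. */
    static class FieldProcessor implements Stage {
        private final Stage next;

        FieldProcessor(Stage next) {
            this.next = next;
        }

        @Override
        public void process(String fieldName, String value) {
            next.process(fieldName, value);
        }
    }

    /** Builds the chain from the innermost stages outwards. */
    static Stage buildChain(List<String> log) {
        Stage terms = new RecordingStage("terms", log);
        Stage norms = new RecordingStage("norms", log);
        Stage inverter = new Inverter(terms, norms);
        return new FieldProcessor(inverter);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        buildChain(log).process("title", "hello world");
        log.forEach(System.out::println);
    }
}
```

Because every stage only knows the interface of the stage below it, a
different chain can be assembled by changing buildChain() alone, which may be
part of the appeal of the design.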
> By the way, what is the advantage of such a design, the so-called indexing
> chain?

I think older versions of Lucene supported only single-threaded indexing,
and this architecture was designed so that the existing code could be reused.

> Is there any documentation about this?

If you can read Chinese, you may find some useful articles here:
http://forfuture1978.javaeye.com/
That said, I think reading the code is very helpful.

> Any suggestions or references are appreciated. Thanks!
>
> Regards

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org