
Posted to dev@lucene.apache.org by wu fox <fo...@gmail.com> on 2006/06/12 14:43:19 UTC

Re: Fwd: How to combine results from several indices

Hi Chuck:
  I am still looking for a solution that is guaranteed to meet the
constraints of ParallelReader so that I can use it in my search
program. I have tried a lot of methods, but none of them is good
enough for me because of obvious bugs. Can you help me? Thanks in
advance.

Re: Fwd: How to combine results from several indices

Posted by Chuck Williams <ch...@manawiz.com>.

Wu,

Glad to hear that!  Congratulations on getting it working.  Looking
forward to your contribution,

Chuck

wu fox wrote on 06/16/2006 03:30 PM:
> Hi Chuck, I have implemented my own ParallelReader by overriding pieces
> such as document() and ParallelTermDocs, and it really works. Your idea
> inspired me, and I highly appreciate your help. Maybe after some bug
> fixes I can contribute my code so that everyone who encounters a
> similar problem can share the idea and implementation.
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: Fwd: How to combine results from several indices

Posted by wu fox <fo...@gmail.com>.

Hi Chuck, I have implemented my own ParallelReader by overriding pieces
such as document() and ParallelTermDocs, and it really works. Your idea
inspired me, and I highly appreciate your help. Maybe after some bug
fixes I can contribute my code so that everyone who encounters a
similar problem can share the idea and implementation.

Re: Fwd: How to combine results from several indices

Posted by Chuck Williams <ch...@manawiz.com>.

You can try that approach, but I think you will find it more difficult. 
E.g., all of the primitive query classes are written specifically to use
doc-ids.  So, you either need to do your searches separately on each
subindex and then write your own routine to join the results, or you
would need to rewrite all the queries.
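A minimal sketch of the first option, searching each subindex separately and joining the results yourself. It assumes every document stores a unique uid value in each subindex, and it models hits with plain Java collections rather than Lucene classes; `UidJoin`, `Hit`, and summing the per-subindex scores are hypothetical choices for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical join of per-subindex hit lists on a shared uid field (not a Lucene API). */
final class UidJoin {
    /** One hit: the uid stored in the matching document and its score in that subindex. */
    static final class Hit {
        final String uid;
        final float score;
        Hit(String uid, float score) { this.uid = uid; this.score = score; }
    }

    /**
     * Inner join: keeps only uids that matched in every subindex and sums their scores.
     * Summing is an arbitrary choice made for this sketch.
     */
    static Map<String, Float> joinByUid(List<List<Hit>> perSubindexHits) {
        Map<String, Float> joined = null;
        for (List<Hit> hits : perSubindexHits) {
            // Collapse this subindex's hits to uid -> score.
            Map<String, Float> current = new LinkedHashMap<>();
            for (Hit h : hits) current.merge(h.uid, h.score, Float::sum);
            if (joined == null) {
                joined = current;
            } else {
                // Keep only uids already present, adding this subindex's score.
                Map<String, Float> next = new LinkedHashMap<>();
                for (Map.Entry<String, Float> e : joined.entrySet()) {
                    Float s = current.get(e.getKey());
                    if (s != null) next.put(e.getKey(), e.getValue() + s);
                }
                joined = next;
            }
        }
        return joined == null ? new LinkedHashMap<>() : joined;
    }
}
```

An inner join fits the case where a hit must satisfy the clauses sent to every subindex; a left join against the main index would be the other obvious choice.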

I use two different index-combining techniques:

   1. ParallelReader/ParallelWriter for performance reasons in various
      circumstances; e.g., fast access to frequently used fields (in
      combination with lazy fields -- very useful for fast categorical
      analysis of large samples), fast bulk updates of mutable fields by
      copying a much smaller subindex, etc.
   2. Subindex query rewriting for accessing different types of objects
      in separate indices.  A query on the main index may contain a
      subquery that retrieves objects in a different index and rewrites
      itself into a disjunction of the uids of those objects.  This
      approach works well assuming you can arrange indexing of fields in
      the main index with subindex uid values, and the disjunction
      expansions are not too large.
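The constraint behind approach 1 can be illustrated with a toy model (plain Java, not Lucene's ParallelReader): document n of the combined view is simply the union of document n's fields in every subindex, which only makes sense if all subindexes contain the same documents in the same order. `ParallelView` is a hypothetical name, and the model checks only that the doc counts match.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of the ParallelReader idea (hypothetical, not Lucene's implementation):
 * each subindex holds different fields for the same documents, and document n of the
 * combined view is the union of document n's fields from every subindex.
 */
final class ParallelView {
    // Each subindex is a list of documents; a document is a field -> value map.
    private final List<List<Map<String, String>>> subindexes = new ArrayList<>();

    void add(List<Map<String, String>> subindex) {
        // Doc ids only line up if every subindex has the same number of docs.
        if (!subindexes.isEmpty() && subindexes.get(0).size() != subindex.size()) {
            throw new IllegalArgumentException("subindexes must have the same number of docs");
        }
        subindexes.add(subindex);
    }

    /** Combined stored fields of document n, taken from position n in every subindex. */
    Map<String, String> document(int n) {
        Map<String, String> doc = new HashMap<>();
        for (List<Map<String, String>> sub : subindexes) doc.putAll(sub.get(n));
        return doc;
    }
}
```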

Maybe approach 2 is more what you need?  It's pretty simple to do. 
E.g., take a look at MultiTermQuery for a non-primitive query that
rewrites itself dependent on the index.  You need a similar class that
rewrites itself dependent on a different index.
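A toy model of approach 2, assuming each subindex document stores its identifier in a field named `uid` and the main index stores that identifier in a reference field (called `authorUid` in the test data). `SubindexRewrite` is hypothetical. In Lucene the natural rewrite target would be a BooleanQuery of SHOULD TermQuery clauses, which is limited by BooleanQuery's maximum clause count (1024 by default), so the `maxClauses` check here mirrors the condition that the disjunction expansions stay small.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/**
 * Toy model of subindex query rewriting (hypothetical, not a Lucene class): a subquery
 * runs against a separate index and rewrites itself into a disjunction (OR) of the uids
 * it matched; the main index is then searched on a field that stores those uids.
 */
final class SubindexRewrite {
    /** Runs the subquery on the subindex and returns the matching uids (the disjunction). */
    static Set<String> rewrite(List<Map<String, String>> subindex,
                               Predicate<Map<String, String>> subquery,
                               int maxClauses) {
        Set<String> uids = new LinkedHashSet<>();
        for (Map<String, String> doc : subindex) {
            if (subquery.test(doc)) uids.add(doc.get("uid"));
        }
        // The approach only pays off while the expansion stays small.
        if (uids.size() > maxClauses) {
            throw new IllegalStateException("too many clauses: " + uids.size());
        }
        return uids;
    }

    /** Main-index docs whose reference field holds any uid from the disjunction. */
    static List<Map<String, String>> search(List<Map<String, String>> mainIndex,
                                            String refField, Set<String> uids) {
        return mainIndex.stream()
                .filter(doc -> uids.contains(doc.get(refField)))
                .collect(Collectors.toList());
    }
}
```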

Chuck


wu fox wrote on 06/13/2006 02:18 AM:
> Thank you very much, Chuck. But I still wonder whether there is any way
> I can revise ParallelReader so that it does not require identical doc
> ids. Can an IndexReader combine different docs according to some
> mapping rule? For example, I could override the document() method so
> that it combines docs from the indices according to a shared uuid, or
> override some other methods. I think that would be much easier than
> building a writer :) Thank you again for your help.
>



Re: Fwd: How to combine results from several indices

Posted by wu fox <fo...@gmail.com>.

Thank you very much, Chuck. But I still wonder whether there is any way
I can revise ParallelReader so that it does not require identical doc
ids. Can an IndexReader combine different docs according to some
mapping rule? For example, I could override the document() method so
that it combines docs from the indices according to a shared uuid, or
override some other methods. I think that would be much easier than
building a writer :) Thank you again for your help.
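The uuid-based proposal can be sketched as a reader-like view that joins on a shared uuid instead of on identical doc ids. This is a hypothetical plain-Java model (`UuidJoinedView` and the `uuid` field name are illustrative, not Lucene APIs) and it covers stored-field retrieval only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (hypothetical, not Lucene's ParallelReader): instead of requiring document n
 * to be the same object in every index, document(n) of the main index is merged with the
 * documents in the other indices that carry the same uuid.
 */
final class UuidJoinedView {
    private final List<Map<String, String>> main;
    // For each secondary index: uuid -> stored fields of the document with that uuid.
    private final List<Map<String, Map<String, String>>> byUuid = new ArrayList<>();

    UuidJoinedView(List<Map<String, String>> main) { this.main = main; }

    void addSecondary(List<Map<String, String>> index) {
        // Build the uuid lookup once, so doc order in this index no longer matters.
        Map<String, Map<String, String>> lookup = new HashMap<>();
        for (Map<String, String> doc : index) lookup.put(doc.get("uuid"), doc);
        byUuid.add(lookup);
    }

    /** Main document n merged with every secondary document sharing its uuid. */
    Map<String, String> document(int n) {
        Map<String, String> doc = new HashMap<>(main.get(n));
        String uuid = doc.get("uuid");
        for (Map<String, Map<String, String>> lookup : byUuid) {
            Map<String, String> other = lookup.get(uuid);
            if (other != null) doc.putAll(other);
        }
        return doc;
    }
}
```

Retrieval is the easy half. As Chuck points out in his reply, the primitive queries match on doc ids, so the term-enumeration side (ParallelTermDocs) would also have to translate every doc id through a mapping like this one.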

Re: Fwd: How to combine results from several indices

Posted by Chuck Williams <ch...@manawiz.com>.

Wu,

I've contributed a version of ParallelWriter that takes a middle
ground.  ParallelWriter.addDocument() is synchronized, but the
underlying sub-index writes are done in parallel.  It would be possible
to allow ParallelWriter.addDocument() itself to be multi-threaded, but
the synchronization and recovery get more complex.  The basic idea would
be to have a thread with a work queue for each sub-index.  I may look at
this later, or if you enhance this, please submit your version.
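A sketch of the "one thread with a work queue per sub-index" idea, under stated assumptions: `SubIndexWriter` is a hypothetical stand-in for each sub-index's IndexWriter, and the lock covers only the enqueueing step. Because every queue is FIFO and a document is enqueued into all of them atomically, each sub-index receives documents in the same order and so assigns the same doc ids, while callers wait for their writes outside the lock. Recovery after a failed sub-index write, the part described above as getting more complex, is not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for one sub-index's IndexWriter. */
interface SubIndexWriter {
    void addDocument(Map<String, String> fields) throws Exception;
}

/** Sketch: one single-thread executor (thread + FIFO queue) per sub-index. */
final class QueuedParallelWriter implements AutoCloseable {
    private final List<SubIndexWriter> writers;
    private final List<ExecutorService> queues = new ArrayList<>();
    private final Object enqueueLock = new Object();

    QueuedParallelWriter(List<SubIndexWriter> writers) {
        this.writers = writers;
        for (int i = 0; i < writers.size(); i++) queues.add(Executors.newSingleThreadExecutor());
    }

    /** subdocs.get(i) holds the fields destined for sub-index i. */
    void addDocument(List<Map<String, String>> subdocs) throws Exception {
        if (subdocs.size() != writers.size()) {
            throw new IllegalArgumentException("one subdocument per sub-index");
        }
        List<Future<?>> pending = new ArrayList<>();
        // Enqueue into every queue atomically so all sub-indexes see the same order.
        synchronized (enqueueLock) {
            for (int i = 0; i < writers.size(); i++) {
                SubIndexWriter w = writers.get(i);
                Map<String, String> doc = subdocs.get(i);
                pending.add(queues.get(i).submit(() -> { w.addDocument(doc); return null; }));
            }
        }
        // Wait outside the lock; the sub-index writes proceed in parallel.
        for (Future<?> f : pending) {
            try {
                f.get();
            } catch (ExecutionException e) {
                throw new Exception("sub-index write failed; sub-indexes may need recovery", e.getCause());
            }
        }
    }

    @Override
    public void close() {
        for (ExecutorService q : queues) q.shutdown();
    }
}
```

Executors.newSingleThreadExecutor() supplies both the per-sub-index thread and its queue; close() must be called so those threads can exit.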

Hope this helps,

Chuck


Chuck Williams wrote on 06/12/2006 09:05 AM:
> Hi Wu,
>
> The simplest solution is to synchronize calls to a
> ParallelWriter.addDocument() method that calls IndexWriter.addDocument()
> for each sub-index.  This will work assuming there are no exceptions and
> assuming you never refresh your IndexReader within
> ParallelWriter.addDocument().  If exceptions occur writing one of the
> sub-indexes, then you need to recover them.  The best approach I've
> found is to delete the unequal final subdocuments and optimize all the
> subindexes to restore equal doc ids.
>
> This approach has the consequence of single-threading all index
> writing.  I'm working on a solution to avoid this, but it may require
> deeper integration into the higher level IndexManager mechanism (which
> does reader reopening, journaling, recovery, and a lot of other things).
>
> If you can get by with single threading, I have a ParallelWriter class
> now that I could contribute.  If not, I'm considering the more general
> solution now, but will only be able to contribute it if it can be kept
> separate from the much larger IndexManager mechanism (which is more
> specific to my app and thus not likely a fit for your app anyway).
>
> Chuck




Re: Fwd: How to combine results from several indices

Posted by Chuck Williams <ch...@manawiz.com>.

Hi Wu,

The simplest solution is to synchronize calls to a
ParallelWriter.addDocument() method that calls IndexWriter.addDocument()
for each sub-index.  This will work assuming there are no exceptions and
assuming you never refresh your IndexReader within
ParallelWriter.addDocument().  If exceptions occur writing one of the
sub-indexes, then you need to recover them.  The best approach I've
found is to delete the unequal final subdocuments and optimize all the
subindexes to restore equal doc ids.
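A minimal in-memory sketch of this simple scheme (hypothetical classes, not Lucene): addDocument() is synchronized and writes one subdocument to each sub-index in turn, and on failure every sub-index is trimmed back to the smallest doc count. In a real Lucene index the trailing subdocuments would be deleted and the sub-indexes optimized to close the gaps; removing list elements here does both at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** In-memory sketch of a single-threaded ParallelWriter with trim-based recovery. */
final class SimpleParallelWriter {
    /** Hypothetical stand-in for one sub-index; a failing write throws. */
    static class SubIndex {
        final List<Map<String, String>> docs = new ArrayList<>();
        void add(Map<String, String> doc) { docs.add(doc); }
    }

    private final List<SubIndex> subindexes;

    SimpleParallelWriter(List<SubIndex> subindexes) { this.subindexes = subindexes; }

    /** Synchronized, so every sub-index assigns the same doc id to its subdocument. */
    synchronized void addDocument(List<Map<String, String>> subdocs) {
        try {
            for (int i = 0; i < subindexes.size(); i++) subindexes.get(i).add(subdocs.get(i));
        } catch (RuntimeException e) {
            recover();
            throw e;
        }
    }

    /** Removes the unequal final subdocuments so doc ids line up again. */
    synchronized void recover() {
        int min = Integer.MAX_VALUE;
        for (SubIndex s : subindexes) min = Math.min(min, s.docs.size());
        for (SubIndex s : subindexes) {
            while (s.docs.size() > min) s.docs.remove(s.docs.size() - 1);
        }
    }
}
```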

This approach has the consequence of single-threading all index
writing.  I'm working on a solution to avoid this, but it may require
deeper integration into the higher level IndexManager mechanism (which
does reader reopening, journaling, recovery, and a lot of other things).

If you can get by with single threading, I have a ParallelWriter class
now that I could contribute.  If not, I'm considering the more general
solution now, but will only be able to contribute it if it can be kept
separate from the much larger IndexManager mechanism (which is more
specific to my app and thus not likely a fit for your app anyway).

Chuck


wu fox wrote on 06/12/2006 02:43 AM:
> Hi Chuck:
>  I am still looking for a solution that is guaranteed to meet the
> constraints of ParallelReader so that I can use it in my search
> program. I have tried a lot of methods, but none of them is good
> enough for me because of obvious bugs. Can you help me? Thanks in
> advance.
>

