You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by oramas martín <jl...@gmail.com> on 2007/03/08 20:23:45 UTC

Re: A solution to HitCollector-based searches problems

Hello,

I have just added some search implementation samples based on this collector
solution, to easy the use and understanding or it:

    - KeywordSearch: Extract the terms (and frequency) found in a list of
fields
                     from the results of a query/filter search

    - GoogleSearch: Return an ordered search result grouped a la Google,
based
                    on the terms found in a list of fields

    - GetFieldNamesOp: Operation to mimic the getFieldNames method of
                       IndexReader but using a searcher. With it, it is
possible
                       to explore the fields of remote indexes.

See http://sourceforge.net/projects/lucollector/ for the source code (
lu-collector-src-sampleop-0.8.zip).

Regards,
José L. Oramas

On 2/26/07, oramas martín <jl...@gmail.com> wrote:
>
>
> Hello,
>
> As you probably know, the HitCollector-based search API is not meant to
> work remotely, because it will generate a RPC-callback for every non-zero
> score.
>
> There is another problem with MultiSearcher-HitCollector-based search
> which knows nothing about mix HitCollector based searches (not to say it has
> hardcode the way to mix TopDocs for the score and for the Sort searches).
> Also the ParallelMultiSearcher inherits this problems and is unable to
> parallelize the HitCollector-based searcher.
>
> A final problem with the HitCollector-based search is related to the lost
> of a limit in the results, as the Hits class implements thought the
> getMoreDocs() function, and lazy loading and caching of documents it does.
>
>
> To solve those problems it is necessary a factory (HitCollectorSource)
> able to generate collectors for single (SingleHitCollector) an multi
> (MultiHitCollector) searches, and a new search method in the
> Searchable interface for it. To avoid modifications to the lucene core, the
> later requirement is NOT IMPLEMENTED in the library I have just created.
> Instead, an ugly solution, a wrapper for those searchers
> (SearcherHCSourceWrapper) and a Filter wrapper (FilterHitCollectorSource) to
> carry the factory-based searches, is provided.
>
> Each collector is based in a two steps sequence, one for collecting hits
> or subsearcher results, and another for generating the final result.
>
> Also, just in case you don't want to add a wrapper to each searcher of
> your project, there is an adapted version of IndexSearcher, MultiSearcher
> and ParallelMultiSearcher (only for version 2.1) modified exactly the same
> way the wrapper class SearcherHCSourceWrapper does. Just put them in your
> class-path (before the Lucene core jar) and you will be using the new
> collector interfaces without modifying your code.
>
> There are some unit testing (copied and adapted from the Lucene 2.1distribution).
>
> See http://sourceforge.net/projects/lucollector/ for the jar files and the
> code.
>
> If you find it interesting to complement the Lucene project, tell me how
> to put it in the contribution area.
>
> Regards,
> José L. Oramas
>

Re: A solution to HitCollector-based searches problems

Posted by oramas martín <jl...@gmail.com>.
Hello Mohammad,

You don't have to remove the lucene-core jar file, just play with the
classpath order to load the jar files.

You have two possibilities:

    1.- Use the searcher wrapper (valid for Lucene 2.0 or 2.1 version): Put
"lu-collector-0.8.jar" AFTER "lucene-core-2.1.0.jar" and wrap any
IndexSearcher, MultiSearcher and ParallelSearcher with
SearcherHCSourceWrapper

    2.- Use the modified lucene core searcher classes (valid for
Lucene 2.1version): Put "
lu-collector-0.8.jar" BEFORE "lucene-core-2.1.0.jar" in the classpath, and
call "IndexSearcher.USE_NEW_COLLECTOR = true;" at the beginning of your
program.

If you take a look at the test files, you can see the usage of both method.

Hope this help,
José L. Oramas


On 3/11/07, Mohammad Norouzi <mn...@gmail.com> wrote:
>
> Hi Oramas
> if I use that jar file, it conflicts with lucene-core.jar file. for
> exampl,
> IndexSearcher class that you defined is different from the original one.
> Do
> I have to remove the lucene-core jar file?
> if yes, how about the other original classes
>
> On 3/8/07, oramas martín <jl...@gmail.com> wrote:
> >
> > Hello,
> >
> > I have just added some search implementation samples based on this
> > collector
> > solution, to easy the use and understanding or it:
> >
> >     - KeywordSearch: Extract the terms (and frequency) found in a list
> of
> > fields
> >                      from the results of a query/filter search
> >
> >     - GoogleSearch: Return an ordered search result grouped a la Google,
> > based
> >                     on the terms found in a list of fields
> >
> >     - GetFieldNamesOp: Operation to mimic the getFieldNames method of
> >                        IndexReader but using a searcher. With it, it is
> > possible
> >                        to explore the fields of remote indexes.
> >
> > See http://sourceforge.net/projects/lucollector/ for the source code (
> > lu-collector-src-sampleop-0.8.zip).
> >
> > Regards,
> > José L. Oramas
> >
> > On 2/26/07, oramas martín <jl...@gmail.com> wrote:
> > >
> > >
> > > Hello,
> > >
> > > As you probably know, the HitCollector-based search API is not meant
> to
> > > work remotely, because it will generate a RPC-callback for every
> > non-zero
> > > score.
> > >
> > > There is another problem with MultiSearcher-HitCollector-based search
> > > which knows nothing about mix HitCollector based searches (not to say
> it
> > has
> > > hardcode the way to mix TopDocs for the score and for the Sort
> > searches).
> > > Also the ParallelMultiSearcher inherits this problems and is unable to
> > > parallelize the HitCollector-based searcher.
> > >
> > > A final problem with the HitCollector-based search is related to the
> > lost
> > > of a limit in the results, as the Hits class implements thought the
> > > getMoreDocs() function, and lazy loading and caching of documents it
> > does.
> > >
> > >
> > > To solve those problems it is necessary a factory (HitCollectorSource)
> > > able to generate collectors for single (SingleHitCollector) an multi
> > > (MultiHitCollector) searches, and a new search method in the
> > > Searchable interface for it. To avoid modifications to the lucene
> core,
> > the
> > > later requirement is NOT IMPLEMENTED in the library I have just
> created.
> > > Instead, an ugly solution, a wrapper for those searchers
> > > (SearcherHCSourceWrapper) and a Filter wrapper
> > (FilterHitCollectorSource) to
> > > carry the factory-based searches, is provided.
> > >
> > > Each collector is based in a two steps sequence, one for collecting
> hits
> > > or subsearcher results, and another for generating the final result.
> > >
> > > Also, just in case you don't want to add a wrapper to each searcher of
> > > your project, there is an adapted version of IndexSearcher,
> > MultiSearcher
> > > and ParallelMultiSearcher (only for version 2.1) modified exactly the
> > same
> > > way the wrapper class SearcherHCSourceWrapper does. Just put them in
> > your
> > > class-path (before the Lucene core jar) and you will be using the new
> > > collector interfaces without modifying your code.
> > >
> > > There are some unit testing (copied and adapted from the Lucene
> > 2.1distribution).
> > >
> > > See http://sourceforge.net/projects/lucollector/ for the jar files and
> > the
> > > code.
> > >
> > > If you find it interesting to complement the Lucene project, tell me
> how
> > > to put it in the contribution area.
> > >
> > > Regards,
> > > José L. Oramas
> > >
> >
>
>
>
> --
> Regards,
> Mohammad
>

Re: A solution to HitCollector-based searches problems

Posted by Mohammad Norouzi <mn...@gmail.com>.
Hi Oramas
if I use that jar file, it conflicts with lucene-core.jar file. for exampl,
IndexSearcher class that you defined is different from the original one. Do
I have to remove the lucene-core jar file?
if yes, how about the other original classes

On 3/8/07, oramas martín <jl...@gmail.com> wrote:
>
> Hello,
>
> I have just added some search implementation samples based on this
> collector
> solution, to easy the use and understanding or it:
>
>     - KeywordSearch: Extract the terms (and frequency) found in a list of
> fields
>                      from the results of a query/filter search
>
>     - GoogleSearch: Return an ordered search result grouped a la Google,
> based
>                     on the terms found in a list of fields
>
>     - GetFieldNamesOp: Operation to mimic the getFieldNames method of
>                        IndexReader but using a searcher. With it, it is
> possible
>                        to explore the fields of remote indexes.
>
> See http://sourceforge.net/projects/lucollector/ for the source code (
> lu-collector-src-sampleop-0.8.zip).
>
> Regards,
> José L. Oramas
>
> On 2/26/07, oramas martín <jl...@gmail.com> wrote:
> >
> >
> > Hello,
> >
> > As you probably know, the HitCollector-based search API is not meant to
> > work remotely, because it will generate a RPC-callback for every
> non-zero
> > score.
> >
> > There is another problem with MultiSearcher-HitCollector-based search
> > which knows nothing about mix HitCollector based searches (not to say it
> has
> > hardcode the way to mix TopDocs for the score and for the Sort
> searches).
> > Also the ParallelMultiSearcher inherits this problems and is unable to
> > parallelize the HitCollector-based searcher.
> >
> > A final problem with the HitCollector-based search is related to the
> lost
> > of a limit in the results, as the Hits class implements thought the
> > getMoreDocs() function, and lazy loading and caching of documents it
> does.
> >
> >
> > To solve those problems it is necessary a factory (HitCollectorSource)
> > able to generate collectors for single (SingleHitCollector) an multi
> > (MultiHitCollector) searches, and a new search method in the
> > Searchable interface for it. To avoid modifications to the lucene core,
> the
> > later requirement is NOT IMPLEMENTED in the library I have just created.
> > Instead, an ugly solution, a wrapper for those searchers
> > (SearcherHCSourceWrapper) and a Filter wrapper
> (FilterHitCollectorSource) to
> > carry the factory-based searches, is provided.
> >
> > Each collector is based in a two steps sequence, one for collecting hits
> > or subsearcher results, and another for generating the final result.
> >
> > Also, just in case you don't want to add a wrapper to each searcher of
> > your project, there is an adapted version of IndexSearcher,
> MultiSearcher
> > and ParallelMultiSearcher (only for version 2.1) modified exactly the
> same
> > way the wrapper class SearcherHCSourceWrapper does. Just put them in
> your
> > class-path (before the Lucene core jar) and you will be using the new
> > collector interfaces without modifying your code.
> >
> > There are some unit testing (copied and adapted from the Lucene
> 2.1distribution).
> >
> > See http://sourceforge.net/projects/lucollector/ for the jar files and
> the
> > code.
> >
> > If you find it interesting to complement the Lucene project, tell me how
> > to put it in the contribution area.
> >
> > Regards,
> > José L. Oramas
> >
>



-- 
Regards,
Mohammad