
Posted to solr-user@lucene.apache.org by abhay kumar <ab...@gmail.com> on 2009/09/16 08:25:56 UTC

Re: Retrieving a field from all result documents & a couple more queries

Hi,

1) Solr has various types of caches, and we can configure how many entries
each cache can hold at a time. For the queryResultCache there is also a window:
       e.g. if queryResultWindowSize=50,
           the first 50 results of a query will be cached in the queryResultCache.
           If the user makes a new request for results beyond those 50
           documents, a new search is run on the server, and the server
           retrieves the next 50 results into the cache.
       http://wiki.apache.org/solr/SolrCaching
       Yes, Solr looks in a cache (the documentCache, which holds stored
       fields) to retrieve the fields to be returned.

2) Yes, we can use different tokenizers or filters for indexing and
searching. We need not create a different fieldType; instead, we configure
separate index and query analyzer sections within the same fieldType.

   e.g.

        <fieldType name="textSpell" class="solr.TextField"
                   positionIncrementGap="100" stored="false" multiValued="true">
          <analyzer type="index">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <!--<filter class="solr.SynonymFilterFactory"
                synonyms="Synonyms.txt" ignoreCase="true" expand="false"/>-->
            <filter class="solr.StopFilterFactory" ignoreCase="true"
                    words="stopwords.txt"/>
            <filter class="solr.StandardFilterFactory"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
          </analyzer>
          <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.StandardFilterFactory"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
          </analyzer>
        </fieldType>
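Point 1 above describes windowed caching of query results. The Java sketch below is a toy model, not Solr's implementation: the WindowedResultCache class and its backend function are hypothetical, and it assumes that a cache miss fetches results rounded up to a whole number of windows. It shows why paging inside the first window costs no extra search while the first page past it does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

/**
 * Toy model (not Solr code) of a query result cache that fetches results in
 * whole windows, loosely mirroring what queryResultWindowSize controls.
 */
class WindowedResultCache {
    private final int windowSize;
    // Hypothetical backend: (query, n) -> top n doc ids for the query.
    private final BiFunction<String, Integer, List<Integer>> backend;
    private final Map<String, List<Integer>> results = new HashMap<>();
    private final Map<String, Integer> fetchedDepth = new HashMap<>();
    private int backendCalls = 0;

    WindowedResultCache(int windowSize, BiFunction<String, Integer, List<Integer>> backend) {
        this.windowSize = windowSize;
        this.backend = backend;
    }

    /** Returns the doc ids in [start, start + rows) for the query. */
    List<Integer> page(String query, int start, int rows) {
        int needed = start + rows;
        Integer depth = fetchedDepth.get(query);
        if (depth == null || depth < needed) {
            // Cache miss: round the request up to a whole number of windows.
            int fetch = ((needed + windowSize - 1) / windowSize) * windowSize;
            results.put(query, backend.apply(query, fetch));
            fetchedDepth.put(query, fetch);
            backendCalls++;
        }
        List<Integer> all = results.get(query);
        int from = Math.min(start, all.size());
        int to = Math.min(needed, all.size());
        return new ArrayList<>(all.subList(from, to));
    }

    int backendCalls() {
        return backendCalls;
    }
}
```

For example, with a window of 50, calls to page("solr", 0, 10) and page("solr", 10, 10) cause a single backend search, while page("solr", 50, 10) causes a second one that fetches 100 results.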

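In the textSpell example in point 2, the stop filter runs only in the index analyzer. The following plain-Java sketch is a toy model, not Lucene code (the tokenizer regex and the stopword list are simplified assumptions, and AnalyzerChains is a hypothetical name). It shows one consequence, assuming every query term is required (AND semantics): a query containing a stopword cannot match, because that term is kept at query time but was never indexed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy model (plain Java, not Lucene) of different index and query analyzer chains. */
class AnalyzerChains {
    // Hypothetical stand-in for the contents of stopwords.txt.
    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the"));

    // Crude stand-in for StandardTokenizerFactory: split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index chain: tokenizer -> lowercase -> stop filter.
    static List<String> indexAnalyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : tokenize(text)) {
            String lower = t.toLowerCase(Locale.ROOT);
            if (!STOPWORDS.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }

    // Query chain: tokenizer -> lowercase only (no stop filter).
    static List<String> queryAnalyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : tokenize(text)) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // True if every query-time token was indexed for the document.
    static boolean matchesAll(String document, String query) {
        return indexAnalyze(document).containsAll(queryAnalyze(query));
    }
}
```

Adding the same StopFilterFactory line to the query analyzer would make the two chains agree on stopwords.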


Regards,
Abhay

On Tue, Sep 15, 2009 at 6:41 PM, Shashikant Kore <sh...@gmail.com> wrote:

> Hi,
>
> I am familiar with Lucene and trying out Solr.
>
> I have an index which was created outside Solr. The index is fairly
> simple, with two fields: document_id & content. The query result needs
> to return all the document IDs. The result need not be ordered by
> score. For this, in Lucene, I use a custom hit collector with search to
> get results quickly. The index has a few million documents, and queries
> returning hundreds of thousands of documents are not uncommon, so
> speed is crucial here.
>
> Since retrieving the document_id for each document is slow, I am using
> the FieldCache to store the values of document_id. For all the results
> collected (in a bitset) with the hit collector, the document_id field is
> retrieved from the FieldCache.
>
> 1. How can I effectively disable scoring? I have read that
> ConstantScoreQuery is quite fast, but from the code, I see that it is
> used only for wildcard queries. How can I use ConstantScoreQuery for
> all queries (boolean, term, phrase, ...)? Also, is
> ConstantScoreQuery as fast as a custom hit collector?
>
> 2. How can Solr take advantage of the FieldCache while returning the
> field document_id? The documentation says the FieldCache can be
> explicitly autowarmed with Solr. If the FieldCache is available and
> initialized at the beginning, will Solr look into the cache to
> retrieve the fields to be returned?
>
> 3. If there is an additional field, stemmed_content, on which search
> needs to use a different analyzer, I suppose that could be specified by
> the fieldType attribute in the schema.
>
> Thank you,
>
> --shashi
>

Re: Retrieving a field from all result documents & a couple more queries

Posted by Chris Hostetter <ho...@fucit.org>.

: As I mentioned previously, I prefer to do this with as little java
: code as possible. That's the motivation for me to take a look at solr.

I understand, but as I already said, "there is no pure configuration way to
obtain the same logic you could get from a custom HitCollector".

You can get the same behavior you currently have, with the same
existing efficiencies, plus take advantage of the Solr filterCache, by
writing a custom RequestHandler (or SearchComponent) that would be
about 5 lines long: get the DocSet from the searcher for your parsed
query (you can reuse the existing Solr QueryParser framework and
utilities), then iterate over the DocSet and add each FieldCache value to
the response.
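The handler logic described above reduces to a loop over the matching documents. The following plain-Java sketch is not Solr code: java.util.BitSet stands in for Solr's DocSet, an int[] indexed by internal doc number stands in for the array returned by FieldCache.DEFAULT.getInts(reader, "document_id"), and DocSetFieldValues is a hypothetical name. In a real RequestHandler the loop would walk the DocSet returned by the searcher instead.

```java
import java.util.BitSet;

/** Sketch: map every matching doc number to its cached field value. */
class DocSetFieldValues {
    /**
     * matches    - doc numbers that matched the query (stand-in for a DocSet)
     * fieldCache - fieldCache[doc] is the document_id value of doc number doc
     */
    static int[] valuesFor(BitSet matches, int[] fieldCache) {
        int[] values = new int[matches.cardinality()];
        int i = 0;
        // No scoring and no sorting: just walk the set bits in doc-number order.
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            values[i++] = fieldCache[doc];
        }
        return values;
    }
}
```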

FWIW: I would encourage you to try using Solr as is, w/o any custom code
or messing with the FieldCache: just set "fl=yourField" and see if the
performance is satisfactory to you. It will still do scoring, but you
might be surprised how fast stored fields can be returned (under the covers
Solr uses a FieldSelector containing just the "fl" fields).
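Trying Solr as is needs no server-side Java, only request parameters. The snippet below is a sketch that just builds the request URL; it assumes the example server's address (http://localhost:8983/solr), the /select path, a field named document_id, and a hypothetical SelectUrl helper. Because Solr returns only 10 rows by default, rows has to be set at least as high as the expected number of hits.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that builds a Solr select URL returning only the fl fields. */
class SelectUrl {
    // Form-encodes a parameter value as UTF-8 (spaces become '+').
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM supports UTF-8, so this is not expected to happen.
            throw new IllegalStateException(e);
        }
    }

    // Builds <baseUrl>/select?q=...&fl=...&rows=...
    static String build(String baseUrl, String query, String fl, int rows) {
        return baseUrl + "/select?q=" + encode(query)
                + "&fl=" + encode(fl)
                + "&rows=" + rows;
    }
}
```

Fetching the resulting URL with any HTTP client should return only the requested stored field for each hit, which is an easy way to measure whether plain Solr is already fast enough.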



-Hoss

Re: Retrieving a field from all result documents & a couple more queries

Posted by Shashikant Kore <sh...@gmail.com>.

Hoss,

As I mentioned previously, I prefer to do this with as little java
code as possible. That's the motivation for me to take a look at solr.

Here is the code snippet.

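import org.apache.lucene.search.HitCollector;
import org.apache.lucene.util.OpenBitSet;

// Declared final so the anonymous HitCollector can use it (required before Java 8).
final OpenBitSet resultBitset = new OpenBitSet(this.searcher.maxDoc());

this.searcher.search(query, new HitCollector() {
    @Override
    public void collect(int docID, float score) {
        // The score is ignored; we only record that docID matched.
        resultBitset.set(docID);
    }
});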

Then I retrieve the document_id values from the FieldCache and look up the
entries for the doc numbers set in resultBitset.

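import org.apache.lucene.search.FieldCache;
// docIDs[doc] holds the document_id value for Lucene doc number doc.
int[] docIDs = FieldCache.DEFAULT.getInts(this.luceneIndex.reader,
        FIELD_DOCUMENT_ID);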

I need to do this as I need all the matching results, but order is not
important (for this search). In the index, the content field has term
vectors, which I can't drop. There are other types of searches
where relevance ranking is required.

Can I achieve the same with Solr?

Thanks,

--shashi

On Fri, Sep 18, 2009 at 3:21 AM, Chris Hostetter
<ho...@fucit.org> wrote:
>
> : You will need to get SolrIndexSearcher.java and modify following:-
> :
> : public static final int GET_SCORES             =       0x01;
>
> No.  Do not do that.  There is no reason for anyone to EVER modify that
> line of code. Absolutely NONE!!!!
>
> If you've made that change to your version of Solr, please start a new
> thread on solr-user explaining your goal, and what things you tried before
> ultimately making that change, because I guarantee you that if you are
> willing to modify java files to change that line, there will be a more
> general purpose reusable way to solve your goal besides that (which won't
> silently break a lot of other functionality).
>
> : > No, I don't wish to put a custom Similarity.  Rather, I want an
> : > equivalent of HitCollector where I can bypass the scoring altogether.
> : > And I prefer to do it by changing the configuration.
>
> ...there is no pure configuration way to obtain the same logic you could
> get from a custom HitCollector.  You haven't elaborated on what exactly
> your HitCollector looked like, but so far you've mentioned that it
> ignored the scores, and used the FieldCache to get a field value w/o
> dealing with stored fields -- you can achieve something roughly
> functionally similar by writing a custom RequestHandler that uses
> SolrIndexSearcher.getDocSet (which skips scoring and sorting) and then
> iterates over that DocSet and fetches the values you want from the
> FieldCache.
>
> Or you could write a RequestHandler that uses your HitCollector as is --
> but then you aren't really leveraging any value from Solr at all.  The
> previous suggestion has the added value of utilizing Solr's filterCache for
> frequent queries (which can be really handy if your queries can be
> easily broken apart into pieces and dealt with using DocSet
> union/intersection operations -- like q/fq are dealt with in
> SearchHandler)
>
>
> -Hoss
>
>

Re: Retrieving a field from all result documents & a couple more queries

Posted by Chris Hostetter <ho...@fucit.org>.

: You will need to get SolrIndexSearcher.java and modify following:-
: 
: public static final int GET_SCORES             =       0x01;

No.  Do not do that.  There is no reason for anyone to EVER modify that
line of code. Absolutely NONE!!!!

If you've made that change to your version of Solr, please start a new
thread on solr-user explaining your goal, and what things you tried before
ultimately making that change, because I guarantee you that if you are
willing to modify java files to change that line, there will be a more
general purpose reusable way to solve your goal besides that (which won't
silently break a lot of other functionality).

: > No, I don't wish to put a custom Similarity.  Rather, I want an
: > equivalent of HitCollector where I can bypass the scoring altogether.
: > And I prefer to do it by changing the configuration.

...there is no pure configuration way to obtain the same logic you could 
get from a custom HitCollector.  You haven't elaborated on what exactly 
your HitCollector looked like, but so far you've mentioned that it 
ignored the scores, and used the FieldCache to get a field value w/o 
dealing with stored fields -- you can achieve something roughly 
functionally similar by writing a custom RequestHandler that uses 
SolrIndexSearcher.getDocSet (which skips scoring and sorting) and then 
iterates over that DocSet and fetches the values you want from the
FieldCache.

Or you could write a RequestHandler that uses your HitCollector as is --
but then you aren't really leveraging any value from Solr at all.  The
previous suggestion has the added value of utilizing Solr's filterCache for
frequent queries (which can be really handy if your queries can be 
easily broken apart into pieces and dealt with using DocSet 
union/intersection operations -- like q/fq are dealt with in 
SearchHandler) 
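The filterCache benefit mentioned above comes from caching one doc set per query piece and combining pieces with set operations. The following Java sketch is a toy model: ToyFilterCache and its backend function are hypothetical and not Solr's API, and java.util.BitSet plays the role of a DocSet.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy model (not Solr code): one cached doc set per clause, combined with and/or. */
class ToyFilterCache {
    private final Map<String, BitSet> cache = new HashMap<>();
    // Hypothetical backend that computes the matching doc set for one clause.
    private final Function<String, BitSet> backend;
    private int misses = 0;

    ToyFilterCache(Function<String, BitSet> backend) {
        this.backend = backend;
    }

    BitSet docSet(String clause) {
        BitSet cached = cache.get(clause);
        if (cached == null) {
            cached = backend.apply(clause);
            cache.put(clause, cached);
            misses++;
        }
        // Return a copy so callers can combine sets without corrupting the cache.
        return (BitSet) cached.clone();
    }

    /** Docs matching every clause (intersection, like q plus several fq). */
    BitSet all(String... clauses) {
        BitSet result = null;
        for (String c : clauses) {
            BitSet s = docSet(c);
            if (result == null) {
                result = s;
            } else {
                result.and(s);
            }
        }
        return result == null ? new BitSet() : result;
    }

    /** Docs matching any clause (union). */
    BitSet any(String... clauses) {
        BitSet result = new BitSet();
        for (String c : clauses) {
            result.or(docSet(c));
        }
        return result;
    }

    int misses() {
        return misses;
    }
}
```

Once a clause's set is cached, reusing it in a different combination costs only a bitwise and/or, which is roughly why splitting queries into reusable pieces can pay off.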


-Hoss

Re: Retrieving a field from all result documents & a couple more queries

Posted by rajan chandi <ch...@gmail.com>.

You will need to get SolrIndexSearcher.java and modify the following:

public static final int GET_SCORES             =       0x01;


--Rajan

On Wed, Sep 16, 2009 at 6:58 PM, Shashikant Kore <sh...@gmail.com> wrote:

> No, I don't wish to put a custom Similarity.  Rather, I want an
> equivalent of HitCollector where I can bypass the scoring altogether.
> And I prefer to do it by changing the configuration.
>
> --shashi
>

Re: Retrieving a field from all result documents & a couple more queries

Posted by Shashikant Kore <sh...@gmail.com>.

No, I don't wish to put a custom Similarity.  Rather, I want an
equivalent of HitCollector where I can bypass the scoring altogether.
And I prefer to do it by changing the configuration.

--shashi

On Wed, Sep 16, 2009 at 6:36 PM, rajan chandi <ch...@gmail.com> wrote:
> You might be talking about modifying the Similarity object to change the
> scoring formula in Lucene!
>
>  searcher.setSimilarity(similarity);
>  writer.setSimilarity(similarity);
>
>
> This can very well be done in Solr, as SolrIndexWriter inherits from the
> Lucene IndexWriter class.
> You might want to download the Solr source code and take a look at
> SolrIndexWriter to begin with!
>
> It's in the package org.apache.solr.update.
>
> Thanks
> Rajan
>

Re: Retrieving a field from all result documents & a couple more queries

Posted by rajan chandi <ch...@gmail.com>.

You might be talking about modifying the Similarity object to change the
scoring formula in Lucene!

  searcher.setSimilarity(similarity);
  writer.setSimilarity(similarity);


This can very well be done in Solr, as SolrIndexWriter inherits from the
Lucene IndexWriter class.
You might want to download the Solr source code and take a look at
SolrIndexWriter to begin with!

It's in the package org.apache.solr.update.
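A custom Similarity changes the scoring formula, but a score is still computed for every hit, so it does not by itself bypass scoring. The following plain-Java sketch illustrates this with hypothetical ToySimilarity and ToyScorer types (not Lucene's classes); DefaultLikeSimilarity copies the tf and idf formulas of Lucene's DefaultSimilarity, and everything else in real Lucene scoring (norms, boosts, coord) is left out.

```java
/** Hypothetical, cut-down similarity interface (not Lucene's Similarity class). */
interface ToySimilarity {
    float tf(float freq);

    float idf(int docFreq, int numDocs);
}

/** Same tf and idf formulas as Lucene's DefaultSimilarity. */
class DefaultLikeSimilarity implements ToySimilarity {
    public float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }
}

/** Flattens every score to 1, but is still consulted for every hit. */
class ConstantSimilarity implements ToySimilarity {
    public float tf(float freq) {
        return 1f;
    }

    public float idf(int docFreq, int numDocs) {
        return 1f;
    }
}

class ToyScorer {
    int similarityCalls = 0;

    /** Scores each hit of a single-term query as tf * idf. */
    float[] score(int[] termFreqs, int docFreq, int numDocs, ToySimilarity sim) {
        float[] scores = new float[termFreqs.length];
        for (int i = 0; i < termFreqs.length; i++) {
            scores[i] = sim.tf(termFreqs[i]) * sim.idf(docFreq, numDocs);
            similarityCalls += 2; // one tf call and one idf call per hit
        }
        return scores;
    }
}
```

Even with ConstantSimilarity the per-hit loop runs unchanged; only skipping the scorer (for example by collecting doc numbers directly) avoids that work.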

Thanks
Rajan

On Wed, Sep 16, 2009 at 5:42 PM, Shashikant Kore <sh...@gmail.com> wrote:

> Thanks, Abhay.
>
> Can someone please throw light on how to disable scoring?
>
> --shashi
>

Re: Retrieving a field from all result documents & a couple more queries

Posted by Shashikant Kore <sh...@gmail.com>.

Thanks, Abhay.

Can someone please throw light on how to disable scoring?

--shashi

On Wed, Sep 16, 2009 at 11:55 AM, abhay kumar <ab...@gmail.com> wrote:
> Hi,
>
> 1) Solr has various types of caches, and we can configure how many entries
> each cache can hold at a time. For the queryResultCache there is also a window:
>       e.g. if queryResultWindowSize=50,
>           the first 50 results of a query will be cached in the queryResultCache.
>           If the user makes a new request for results beyond those 50
>           documents, a new search is run on the server, and the server
>           retrieves the next 50 results into the cache.
>       http://wiki.apache.org/solr/SolrCaching
>       Yes, Solr looks in a cache (the documentCache, which holds stored
>       fields) to retrieve the fields to be returned.
>
> 2) Yes, we can use different tokenizers or filters for indexing and
> searching. We need not create a different fieldType; instead, we configure
> separate index and query analyzer sections within the same fieldType.
>
>   e.g.
>
>        <fieldType name="textSpell" class="solr.TextField"
>                   positionIncrementGap="100" stored="false" multiValued="true">
>          <analyzer type="index">
>            <tokenizer class="solr.StandardTokenizerFactory"/>
>            <filter class="solr.LowerCaseFilterFactory"/>
>            <!--<filter class="solr.SynonymFilterFactory"
>                synonyms="Synonyms.txt" ignoreCase="true" expand="false"/>-->
>            <filter class="solr.StopFilterFactory" ignoreCase="true"
>                    words="stopwords.txt"/>
>            <filter class="solr.StandardFilterFactory"/>
>            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>          </analyzer>
>          <analyzer type="query">
>            <tokenizer class="solr.StandardTokenizerFactory"/>
>            <filter class="solr.LowerCaseFilterFactory"/>
>            <filter class="solr.StandardFilterFactory"/>
>            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>          </analyzer>
>        </fieldType>
>
>
>
> Regards,
> Abhay
>