Posted to solr-user@lucene.apache.org by mdz-munich <se...@bsb-muenchen.de> on 2010/12/13 14:23:46 UTC
Query-Expansion, copyFields, flexibility and size of Index (Solr-3.1-SNAPSHOT)
Hi all,
we want to do query expansion with synonyms and word forms at query time.
Assuming we want to query all fields (text & synonyms/word forms) with
different boosts using dismax, we need the following setup (simplified):
<fieldType name="text" class="solr.TextField" positionIncrementGap="0"
    sortMissingLast="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
  </analyzer>
</fieldType>
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="0"
    sortMissingLast="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="syn.txt"
        ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
Furthermore, we need two fields:
<field name="fulltext" type="text" indexed="true" stored="true"
multiValued="true" />
<field name="fulltext_syn" type="text_syn" indexed="true" stored="false"
multiValued="true" />
Last but not least, we have to copy the fulltext field into our
fulltext_syn field:
<copyField source="fulltext" dest="fulltext_syn" />
Now we can query both fields with "qt=dismax&q=searchterms&qf=fulltext^2.0
fulltext_syn^1.0" etc.
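For comparison, the same boosts can also be set once as defaults of the handler that qt=dismax selects in solrconfig.xml, instead of being sent with every request. A minimal sketch, assuming a SearchHandler registered under the name "dismax" and the field names from above:

```xml
<!-- solrconfig.xml: sketch of a dismax handler with the boosts as defaults -->
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- parse q with the DisMax query parser -->
    <str name="defType">dismax</str>
    <!-- search the original field and its synonym-expanded copy; original weighted higher -->
    <str name="qf">fulltext^2.0 fulltext_syn^1.0</str>
  </lst>
</requestHandler>
```

Parameters sent with a request still override these defaults. When qf is passed on the URL instead, the space between the two entries has to be URL-encoded, e.g. qf=fulltext%5E2.0+fulltext_syn%5E1.0.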
That seems to work out very well. But now comes the dark side of the force:
We quickly realized that every copyField instruction results in a full copy
of that field in the index, even if the index-time analyzers of both field
types (text & text_syn) have exactly the same setup. The result is a 10%
larger index and, furthermore, a less flexible application: for every
retrieval feature relying on query expansion or other special query-time
analysis, we have to copy that field into a new field and re-index all the
data.
We are thinking about something like this:
<fieldType name="text" class="solr.TextField" positionIncrementGap="0"
    sortMissingLast="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
  </analyzer>
  <analyzer type="query_syn">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="syn.txt"
        ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
A request would then look like:
"qt=dismax&q=searchterms&qf=fulltext.ana.query^2.0
fulltext.ana.query_syn^1.0" etc.
That would be much more flexible, precisely because we wouldn't have to
re-index all the data for every copyField instruction. Furthermore, it
would reduce storage consumption by about 10% per copied field.
Any ideas on that? Any other solutions?
Best regards,
Sebastian from Munich, Bavaria, Germany
--
View this message in context: http://lucene.472066.n3.nabble.com/Query-Expansion-copyFields-flexibility-and-size-of-Index-Solr-3-1-SNAPSHOT-tp2078573p2078573.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query-Expansion, copyFields, flexibility and size of Index (Solr-3.1-SNAPSHOT)
Posted by mdz-munich <se...@bsb-muenchen.de>.
Okay, I'll start guessing:
- Do we have to write a customized QueryParserPlugin?
- At which point does the RequestHandler/QueryParser/whatever decide which
query analyzer to use?
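If it does come down to a custom plugin: Solr's extension point for this is QParserPlugin, which is registered in solrconfig.xml and then selected per request. A minimal sketch; the name "synexpand" and the class com.example.SynonymExpandingQParserPlugin are hypothetical placeholders for code that would still have to be written:

```xml
<!-- solrconfig.xml: register a hypothetical custom query parser under the name "synexpand" -->
<queryParser name="synexpand" class="com.example.SynonymExpandingQParserPlugin"/>
```

A request could then select it with defType=synexpand, or inline via local params as q={!synexpand}searchterms.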
10% for every copied field is a lot for us; we're dealing with terabytes of
digitized book data. So we want to keep the index simple, small and flexible
and just add IR functionality at query time.
Greetings & thank you,
Sebastian
--
View this message in context: http://lucene.472066.n3.nabble.com/Query-Expansion-copyFields-flexibility-and-size-of-Index-Solr-3-1-SNAPSHOT-tp2078573p2085018.html