Posted to solr-user@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2011/11/03 21:09:11 UTC
Re: Dismax and phrases
Interesting, in the case where you use quotes...
: +<result name="response" numFound="6888" start="0" maxScore="3.0879765">
...
: </lst><str name="rawquerystring">"asuntojen hinnat"</str>
: <str name="querystring">"asuntojen hinnat"</str>
...there is one DisjunctionMaxQuery (expected) for the entire phrase,
but in the sub-clauses for each individual field the clauses coming from
your "_fi" fields are just building boolean "OR" queries of the terms from
your phrase (instead of building an actual phrase query)...
: <str name="parsedquery">+DisjunctionMaxQuery((table.title_t:"asuntojen
: hinnat"^2.0 | title_t:"asuntojen hinnat"^2.0 | ingress_t:"asuntojen hinnat" |
: (text_fi:asunto text_fi:hinta) | (table.description_fi:asunto
: table.description_fi:hinta) | table.description_t:"asuntojen hinnat" |
: graphic.title_t:"asuntojen hinnat"^2.0 | ((graphic.title_fi:asunto
: graphic.title_fi:hinta)^2.0) | ((table.title_fi:asunto
: table.title_fi:hinta)^2.0) | table.contents_t:"asuntojen hinnat" |
: text_t:"asuntojen hinnat" | (ingress_fi:asunto ingress_fi:hinta) |
: (table.contents_fi:asunto table.contents_fi:hinta) | ((title_fi:asunto
: title_fi:hinta)^2.0))~0.01) () type:tie^6.0 type:kuv^2.0 type:tau^2.0
: FunctionQuery((1.0/(3.16E-11*float(ms(const(1319437912691),date(date.modified_dt)))+1.0))^100.0)</str>
...is this perhaps a side effect of the new autoGeneratePhraseQueries
option? ... you are explicitly specifying a quoted phrase, but
maybe somewhere in the code path of the dismax parser that information is
getting lost?
can you post the details of your schema.xml? (ie: the "version" property
on the schema file, and the dynamicField/field + fieldType definitions for
all these fields)
In contrast, your unquoted example is working exactly as i'd expect. a
DisjunctionMaxQuery is built for each clause of the input, and the two
DisjunctionMaxQuery objects are then combined in a BooleanQuery where the
minNrShouldMatch property is set to "2"....
: +<result name="response" numFound="1065" start="0"
: maxScore="2.230382"></result>
...
: <str name="rawquerystring">asuntojen hinnat</str>
: <str name="querystring">asuntojen hinnat</str>
:
: <str name="parsedquery">+((DisjunctionMaxQuery((table.title_t:asuntojen^2.0 |
: title_t:asuntojen^2.0 | ingress_t:asuntojen | text_fi:asunto |
: table.description_fi:asunto | table.description_t:asuntojen |
: graphic.title_t:asuntojen^2.0 | graphic.title_fi:asunto^2.0 |
: table.title_fi:asunto^2.0 | table.contents_t:asuntojen | text_t:asuntojen |
: ingress_fi:asunto | table.contents_fi:asunto | title_fi:asunto^2.0)~0.01)
: DisjunctionMaxQuery((table.title_t:hinnat^2.0 | title_t:hinnat^2.0 |
: ingress_t:hinnat | text_fi:hinta | table.description_fi:hinta |
: table.description_t:hinnat | graphic.title_t:hinnat^2.0 |
: graphic.title_fi:hinta^2.0 | table.title_fi:hinta^2.0 |
: table.contents_t:hinnat | text_t:hinnat | ingress_fi:hinta |
: table.contents_fi:hinta | title_fi:hinta^2.0)~0.01))~2) () type:tie^6.0
: type:kuv^2.0 type:tau^2.0
: FunctionQuery((1.0/(3.16E-11*float(ms(const(1319438484878),date(date.modified_dt)))+1.0))^100.0)</str>
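As a rough illustration of the unquoted case described above, here is a minimal Python sketch of the matching logic only (no scoring, boosts, or tie breaker). It is a toy model, not Lucene code: the names `dismax_matches` and `boolean_matches` and the two-field documents are made up for the example. Each clause stands in for one DisjunctionMaxQuery (match if any field matches), and the outer function stands in for the BooleanQuery with its minimum-should-match of 2.

```python
from typing import Dict, List, Set

Doc = Dict[str, Set[str]]   # field name -> terms indexed in that field
Clause = Dict[str, str]     # field name -> analyzed query term for that field


def dismax_matches(doc: Doc, clause: Clause) -> bool:
    """Toy DisjunctionMaxQuery: matches if any per-field term query matches."""
    return any(term in doc.get(field, set()) for field, term in clause.items())


def boolean_matches(doc: Doc, clauses: List[Clause], min_should_match: int) -> bool:
    """Toy BooleanQuery of optional clauses with a minimum-should-match count."""
    matched = sum(1 for clause in clauses if dismax_matches(doc, clause))
    return matched >= min_should_match


# One clause per input word; each field sees the term its own analyzer produced.
clauses = [
    {"text_t": "asuntojen", "text_fi": "asunto"},
    {"text_t": "hinnat", "text_fi": "hinta"},
]

both_terms = {"text_fi": {"asunto", "hinta"}}
split_fields = {"text_t": {"asuntojen"}, "text_fi": {"hinta"}}
one_term = {"text_t": {"asuntojen"}}

print(boolean_matches(both_terms, clauses, 2))
print(boolean_matches(split_fields, clauses, 2))
print(boolean_matches(one_term, clauses, 2))
```

Note that in this model a document still matches when the two words are found in different fields, because each word gets its own per-field disjunction.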
-Hoss
Re: Dismax and phrases
Posted by Chris Hostetter <ho...@fucit.org>.
: ...is this perhaps a side effect of the new autoGeneratePhraseQueries
: option? ... you are explicitly specifying a quoted phrase, but
: maybe somewhere in the code path of the dismax parser that information is
: getting lost?
FWIW:
a) I just realized you said in your first message you were using Solr
1.4.1, which *definitely* predates the autoGeneratePhraseQueries option -
so i'm really at a loss to understand how you are getting that query
structure (definitely want to see your configs)
b) I did some quick testing with Solr 3.4 using the example configs, and
verified that regardless of how autoGeneratePhraseQueries is set on the
fieldType for the "name" field, this request...
http://localhost:8983/solr/select/?fl=name&debugQuery=true&q=%22samsung%20hard%20drive%22&defType=dismax&qf=name&qs=100
...always produces a dismax query wrapped around a phrase query.
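For anyone who wants to repeat that test, here is a small Python sketch that builds the same request URL from its individual parameters, so each one can be varied separately. The helper name `build_url` is just for illustration, and the host and port are the ones from the example URL; the block only builds the string and does not contact a server.

```python
from urllib.parse import quote, urlencode


def build_url(base: str, params: dict) -> str:
    """Join a base URL and query parameters, encoding spaces as %20."""
    return base + "?" + urlencode(params, quote_via=quote)


params = {
    "fl": "name",                  # fields to return
    "debugQuery": "true",          # include the parsed query in the response
    "q": '"samsung hard drive"',   # explicitly quoted phrase
    "defType": "dismax",           # use the dismax query parser
    "qf": "name",                  # field(s) dismax should search
    "qs": "100",                   # slop for explicit phrases in q
}

url = build_url("http://localhost:8983/solr/select/", params)
print(url)
```

With debugQuery=true, the "parsedquery" entry in the debug section of the response is the part to compare.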
-Hoss
Re: Dismax and phrases
Posted by Chris Hostetter <ho...@fucit.org>.
: I am starting to wonder whether the module giving Finnish language support
: (lingsoft) might be the cause?
It's extremely possible -- the details really matter when debugging
things like this.
Since i don't have any access to these custom plugins, i don't know what
they might be doing, or how they might be affecting the terms produced
during analysis to explain why you are getting the structure you are --
but one explanation might be if every term produced by them gets a
positionIncrement of "0" ... that would tell the query parser to treat
them as alternatives -- it's the same thing SynonymFilter does.
you'd have to look at the output from the analysis tool, feeding your
example input into the query analyzer to see what terms it produces (and
what attributes those terms have). if it is a position increment issue,
then you should see the same "OR" style query structure (instead of a
phrase query) even if you use the default "lucene" parser and give it a
quoted phrase...
text_fi:"asuntojen hinnat"
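To make the position increment idea concrete, here is a simplified Python model of how a query parser might turn analyzed tokens into a query. It is only a sketch, not Lucene's actual code: the function names and the (term, increment) tuples are invented for the example. Tokens with an increment of 0 are stacked on the previous position, and a phrase is only built when more than one position remains.

```python
from typing import List, Tuple

Token = Tuple[str, int]  # (term, position increment)


def group_by_position(tokens: List[Token]) -> List[List[str]]:
    """Stack tokens by position; an increment of 0 reuses the previous position."""
    positions: List[List[str]] = []
    for term, increment in tokens:
        if increment == 0 and positions:
            positions[-1].append(term)
        else:
            positions.append([term])
    return positions


def describe_query(field: str, tokens: List[Token]) -> str:
    """Render the query shape this toy parser would build for a quoted phrase."""
    positions = group_by_position(tokens)
    if len(positions) == 1:
        terms = positions[0]
        if len(terms) == 1:
            return f"{field}:{terms[0]}"
        # Several terms at a single position are treated as alternatives.
        return "(" + " ".join(f"{field}:{t}" for t in terms) + ")"
    # More than one position: build a phrase, with alternatives joined by "|".
    return f'{field}:"' + " ".join("|".join(p) for p in positions) + '"'


# Normal analysis: each term advances the position by one.
print(describe_query("text_t", [("asuntojen", 1), ("hinnat", 1)]))
# Every term after the first gets increment 0.
print(describe_query("text_fi", [("asunto", 1), ("hinta", 0)]))
```

In this model the second call collapses to a single position and comes out as a parenthesized group of optional terms rather than a phrase. Any filter that emits an increment of 0 for every token after the first would have this effect; solr.PositionFilterFactory, for instance, is documented as doing that by default.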
-Hoss
Re: Dismax and phrases
Posted by Hyttinen Lauri <la...@stat.fi>.
Hello,
I am starting to wonder whether the module giving Finnish language
support (lingsoft) might be the cause?
As I said earlier, I have inherited this project, so my understanding of
all the bells and whistles is a bit limited.
Some selected parts from the schema.xml file:
<schema name="example" version="1.2">
...
  <fieldType name="suomi" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1"
              generateNumberParts="1"
      />
      <filter class="lingSoft.LSFactory"/>
      <filter class="solr.PositionFilterFactory" />
    </analyzer>
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1"
              generateNumberParts="1"
      />
      <filter class="lingSoft.LSFactory"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1"
              generateNumberParts="1"
              catenateWords="0"
              preserveOriginal="1"
      />
    </analyzer>
  </fieldType>
...
  <field name="text_fi" type="suomi" indexed="true" stored="true"
         multiValued="true" required="false" />
...
  <dynamicField name="*_t" type="text" indexed="true" stored="true"
                multiValued="true"/>
...
  <!-- dynamic field for finnish language support with the lingsoft
       transformation -->
  <dynamicField name="*_fi" type="suomi" indexed="true" stored="true"
                multiValued="true" />
...
  <dynamicField name="ignored_*" type="ignored" multiValued="true"/>
  <dynamicField name="attr_*" type="textgen" indexed="true" stored="true"
                multiValued="true"/>
  <dynamicField name="random_*" type="random" />
  <dynamicField name="*" type="text" multiValued="true" index="true"
                stored="true" />
Best regards,
Lauri Hyttinen
--
Lauri Hyttinen
Information Service Designer
Tilastokeskus
Unit
Visiting address: Työpajankatu 13, 00580 Helsinki
Postal address: PL 3 A, 00022 Tilastokeskus
tel. 09 1734 0000
lauri.hyttinen@tilastokeskus.fi
www.tilastokeskus.fi