Posted to solr-user@lucene.apache.org by Claudio Martella <cl...@tis.bz.it> on 2009/10/01 13:21:47 UTC

Query filters/analyzers

Hello list.

So, I set up my schema.xml with different chains of analyzers and
filters for each field (i.e. I created the types text-en, text-de and
text-it). As I have to index documents in different languages, this works
well. But what defines the analyzers and filters used for the query?

Let's suppose I have my web app with an input form where you
fill in the query. I detect the language, so I can query the field
content-en, content-it or content-de according to the detection.
But how is the query going to be analyzed? Of course I want the query to
be analyzed according to the field I'm going to search in.
Where is this defined?

TIA

Claudio

-- 
Claudio Martella
Digital Technologies
Unit Research & Development - Engineer

TIS innovation park
Via Siemens 19 | Siemensstr. 19
39100 Bolzano | 39100 Bozen
Tel. +39 0471 068 123
Fax  +39 0471 068 129
claudio.martella@tis.bz.it http://www.tis.bz.it

Short information regarding use of personal data. According to Section 13 of Italian Legislative Decree no. 196 of 30 June 2003, we inform you that we process your personal data in order to fulfil contractual and fiscal obligations and also to send you information regarding our services and events. Your personal data are processed with and without electronic means and by respecting data subjects' rights, fundamental freedoms and dignity, particularly with regard to confidentiality, personal identity and the right to personal data protection. At any time and without formalities you can write an e-mail to privacy@tis.bz.it in order to object the processing of your personal data for the purpose of sending advertising materials and also to exercise the right to access personal data and other rights referred to in Section 7 of Decree 196/2003. The data controller is TIS Techno Innovation Alto Adige, Siemens Street n. 19, Bolzano. You can find the complete information on the web site www.tis.bz.it.



Re: Query filters/analyzers

Posted by Claudio Martella <cl...@tis.bz.it>.
Thanks, that's exactly the kind of answer I was looking for.




Re: Query filters/analyzers

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Fri, Oct 2, 2009 at 6:44 PM, Fergus McMenemie <fe...@twig.me.uk> wrote:

> > The copy is done before analysis. The original text is sent to the copyField,
> > which can choose to do analysis differently from the source field.
> >
> I have been wondering about this as well. The wiki is not explicit about
> what happens. Is this correct?
>
> "The original text is sent to the copyField, before any configured
> analyzers for the originating or destination field are invoked."

Yes, that is correct.

> If so, I will tweak the wiki!

Please do!

-- 
Regards,
Shalin Shekhar Mangar.

Re: Query filters/analyzers

Posted by Fergus McMenemie <fe...@twig.me.uk>.
>On Thu, Oct 1, 2009 at 7:59 PM, Claudio Martella <claudio.martella@tis.bz.it> wrote:
>
>> About the copyField issue in general: as it copies the content to the
>> other field, what is the point of defining analyzers for the destination
>> field? The source is already analyzed, so I guess that the RESULT of the
>> analysis is copied there.
>
>The copy is done before analysis. The original text is sent to the copyField,
>which can choose to do analysis differently from the source field.
>
I have been wondering about this as well. The wiki is not explicit about
what happens. Is this correct?

"The original text is sent to the copyField, before any configured
analyzers for the originating or destination field are invoked."

If so, I will tweak the wiki!

Regds Fergus.

Re: Query filters/analyzers

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Thu, Oct 1, 2009 at 7:59 PM, Claudio Martella <claudio.martella@tis.bz.it> wrote:

> About the copyField issue in general: as it copies the content to the
> other field, what is the point of defining analyzers for the destination
> field? The source is already analyzed, so I guess that the RESULT of the
> analysis is copied there.

The copy is done before analysis. The original text is sent to the copyField,
which can choose to do analysis differently from the source field.
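To make this concrete, here is a minimal sketch that reuses the names from this thread (content-it, all, text-ws) and assumes text-it is defined elsewhere; the definition of text-ws shown here is hypothetical. Because copyField hands over the original value of content-it, the all field runs only its own whitespace chain, so none of the Italian stemming or stopword removal is carried over. The elements are shown together for brevity; in schema.xml, fieldType belongs in <types>, field in <fields>, and copyField at the top level after </fields>.

```xml
<!-- Sketch: copyField copies the original, unanalyzed text.
     "all" then applies only its own (whitespace-only) index chain. -->
<fieldType name="text-ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<field name="content-it" type="text-it" indexed="true" stored="true"/>
<!-- multiValued so that further sources (e.g. title, keywords) can be copied in too -->
<field name="all" type="text-ws" indexed="true" stored="false" multiValued="true"/>

<copyField source="content-it" dest="all"/>
```

The same rule applies at query time: a query against all is analyzed with the text-ws query chain, not with the chain of any source field.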

-- 
Regards,
Shalin Shekhar Mangar.

Re: Query filters/analyzers

Posted by Claudio Martella <cl...@tis.bz.it>.
OK, one more question on this issue. I used to have an "all" field into
which I copied "title", "content" and "keywords" via copyField; it was of
fieldType "text", which had English-language-dependent analyzers/filters.
Now I can copyField all three "content-*" fields into it, as I know that
only one of the three will be filled per document. My problem, once
again, is that I have to define a fieldType for this "all" field, and it
should be language-independent.
The solution is once again either to create three "all" fields or to
create only one, defined as text-ws (no language-dependent analysis). But
in the latter case it would be out of sync with the "content-*" fields,
which are stemmed and stop-word filtered.
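A minimal sketch of the first option, using hypothetical field names all-en, all-de and all-it: each catch-all field shares the type of the matching content field, so its terms are stemmed and stop-filtered the same way, and the web app would query all-it when it detects Italian. Only the Italian variant is shown. Note that with this layout title and keywords would be copied into every all-* field, whatever the document's language.

```xml
<!-- Sketch: one catch-all field per language (hypothetical names all-en/all-de/all-it).
     multiValued is needed because several sources are copied into the same field. -->
<field name="all-it" type="text-it" indexed="true" stored="false" multiValued="true"/>

<copyField source="title"      dest="all-it"/>
<copyField source="keywords"   dest="all-it"/>
<copyField source="content-it" dest="all-it"/>

<!-- Same pattern for all-en (type text-en, from content-en)
     and all-de (type text-de, from content-de). -->
```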

About the copyField issue in general: as it copies the content to the
other field, what is the point of defining analyzers for the destination
field? The source is already analyzed, so I guess that the RESULT of the
analysis is copied there. In this case a text-ws type should be sufficient.
But then I guess the problem is again with the QUERY-time analysis. Right?




Re: Query filters/analyzers

Posted by Chantal Ackermann <ch...@btelligent.de>.
Hi Claudio,

in schema.xml, the <analyzer> element accepts a type attribute.
If you need different analyzer chains for indexing and querying,
configure it like this:

<fieldType name="channel_name" class="solr.TextField">
	<analyzer type="index">
	<!-- indexing analyzer chain defined here -->
	</analyzer>
	<analyzer type="query">
	<!-- query analyzer chain defined here -->
	</analyzer>
</fieldType>
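For illustration only, here is a sketch of what a filled-in Italian type could look like, using factory classes that ship with Solr 1.4; the stopword and synonym file names (stopwords_it.txt, synonyms_it.txt) are hypothetical and would have to exist in the conf directory. The only difference between the two chains here is that synonyms are expanded at query time.

```xml
<!-- Sketch: Italian text type with separate index and query chains.
     stopwords_it.txt and synonyms_it.txt are hypothetical file names. -->
<fieldType name="text-it" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_it.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Italian"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- query time only: expand synonyms before the remaining filters -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_it.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_it.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Italian"/>
  </analyzer>
</fieldType>
```

Keeping the lowercasing, stopword and stemming steps identical on both sides matters: if the query chain stems differently from the index chain, query terms may never match the indexed terms.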

If there is no difference, just remove one analyzer element and the type
attribute from the remaining one; that single chain is then used for both
indexing and querying.
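A sketch of that single-chain form, using the text-de type name from the question with a hypothetical filter chain: because the <analyzer> element has no type attribute, the same chain runs at index time and at query time.

```xml
<!-- Sketch: one analyzer without a type attribute serves both indexing and querying -->
<fieldType name="text-de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
  </analyzer>
</fieldType>
```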

After indexing, you can check in the schema browser (admin web frontend)
which analyzer chain is applied to a certain field for indexing and for querying.

When you have detected the input language, simply choose the correct 
field, and the configured analyzer chain for that field will be applied 
automatically.
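As a sketch, assuming the types from the question (text-en, text-de, text-it) are already defined, the per-language fields could be declared roughly as below; the indexed/stored settings are assumptions. A query such as q=content-it:... is then analyzed with the query chain of text-it, because the type of the queried field selects the analyzer.

```xml
<!-- Sketch: each language field is bound to its language-specific type.
     The type of the field being queried decides which query-time chain runs. -->
<fields>
  <field name="content-en" type="text-en" indexed="true" stored="true"/>
  <field name="content-de" type="text-de" indexed="true" stored="true"/>
  <field name="content-it" type="text-it" indexed="true" stored="true"/>
</fields>
```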

E.g., if the input is Italian:
q=content-it:input

content-it is of type text-it, which has the Italian analyzers configured
for both index and query, so the Italian analyzers will also be applied
to the input.

Cheers,
Chantal
