Posted to solr-user@lucene.apache.org by dbejean <do...@eolya.fr> on 2010/03/08 11:14:16 UTC

More contextual information in analyser

Hello,

If I write a custom analyzer that accepts a specific attribute in the
constructor

public MyCustomAnalyzer(String myAttribute);

Is there a way to dynamically send a value for this attribute from Solr at
index time in the XML message?

<add>
  <doc>
    <field name="content" myattribute="...">.....</field>
  </doc>
</add>

Obviously, in Solr's schema.xml, the "content" field is associated with my
custom analyzer.

Thank you.

Dominique

-- 
View this message in context: http://old.nabble.com/More-contextual-information-in-analyser-tp27819298p27819298.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: More contextual information in analyzers

Posted by dbejean <do...@eolya.fr>.
So the way I built my analyzer is the right one. Thank you.





Re: More contextual information in analyzers

Posted by Chris Hostetter <ho...@fucit.org>.
: If I write a custom analyzer that accepts a specific attribute in the
: constructor
: 
: public MyCustomAnalyzer(String myAttribute);
: 
: Is there a way to dynamically send a value for this attribute from Solr at
: index time in the XML message?
: 
: <add>
:   <doc>
:     <field name="content" myattribute="...">.....</field>

Fundamentally, there are two problems with trying to add functionality like 
this to Solr...

1) The XML update syntax is just *one* of several different pathways by which 
data can make it into Solr, and well before it reaches your custom 
analyzer, it's converted into what is essentially just a list of triplets 
(fieldName, fieldValue, boost).  So it would be hard to support 
additional metadata attributes associated with field values in a way that 
works across all of those pathways.

2) In Solr (and in Lucene in general) you don't get a separate Analyzer 
instance per field/value pair -- one Analyzer is reused over and over for 
every field=>value in a doc (and in fact, the same Analyzer is used over 
and over for every document as well).

This is why people typically encode their "attributes" in the value, and 
then write their Tokenizers in such a way that they decode that info and 
store it as a Payload on the terms -- because even if you bypassed Solr's 
pipeline for adding documents directly from some custom RequestHandler 
that knew about your extended XML syntax, there wouldn't be any way to pass 
that metadata to the (long-lived) Analyzer instance.
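
A minimal sketch of the "encode it in the value, decode it in the tokenizer"
idea, in plain Java. It deliberately does not use Lucene's Tokenizer or
Payload classes: PayloadToken, PrefixDecodingTokenizer, and the
"attribute|text" encoding are all hypothetical names and conventions chosen
for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical token type for illustration; this is not a Lucene class.
final class PayloadToken {
    final String term;
    final byte[] payload; // metadata carried alongside the term

    PayloadToken(String term, byte[] payload) {
        this.term = term;
        this.payload = payload;
    }
}

// Hypothetical tokenizer sketch: the instance holds no per-value state,
// so one instance can be reused for every field value of every document.
final class PrefixDecodingTokenizer {
    // Assumed encoding: "<attribute>|<text>", e.g. "fr|Bonjour le monde".
    private static final char SEPARATOR = '|';

    List<PayloadToken> tokenize(String rawValue) {
        int sep = rawValue.indexOf(SEPARATOR);
        // No separator means no attribute was encoded in this value.
        String attribute = sep < 0 ? "" : rawValue.substring(0, sep);
        String text = sep < 0 ? rawValue : rawValue.substring(sep + 1);
        byte[] payload = attribute.getBytes(StandardCharsets.UTF_8);

        List<PayloadToken> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                tokens.add(new PayloadToken(word, payload));
            }
        }
        return tokens;
    }
}

final class PayloadSketchDemo {
    public static void main(String[] args) {
        // One shared, long-lived instance handles values with different attributes.
        PrefixDecodingTokenizer tokenizer = new PrefixDecodingTokenizer();
        for (String value : new String[] {"fr|Bonjour le monde", "en|Hello world"}) {
            for (PayloadToken t : tokenizer.tokenize(value)) {
                System.out.println(t.term + " -> " + new String(t.payload, StandardCharsets.UTF_8));
            }
        }
    }
}
```

The point of the design is that the attribute travels inside the value
itself, so nothing has to be handed to the analyzer's constructor.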



-Hoss


Re: More contextual information in analyser

Posted by dbejean <do...@eolya.fr>.
It is true that I also need this metadata at query time. For the moment, I put
this extra information at the beginning of the data to be indexed and at
the beginning of the query. It works, but I really don't like it. In my
case, I need the language of the data to be indexed and the language of the
query.

The goal is to dynamically use the correct chain of tokenizers and filters
according to the language and so use only one field in my index for all
languages.
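
A rough sketch of that routing, in plain Java rather than real Solr/Lucene
classes. LanguageRoutingSketch, the "lang|text" prefix convention, and the
toy stop-word "chains" are all assumptions made for illustration; a real
setup would plug in actual tokenizer and filter chains per language.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of per-language routing inside a single analyzer.
// The chains are toy stand-ins, not real Solr/Lucene filter chains.
final class LanguageRoutingSketch {
    private static final Map<String, Function<String, List<String>>> CHAINS = new HashMap<>();

    static {
        CHAINS.put("en", text -> lowercaseAndFilter(text, Set.of("the", "a", "of")));
        CHAINS.put("fr", text -> lowercaseAndFilter(text, Set.of("le", "la", "de")));
    }

    // Toy "chain": whitespace split, lowercase, drop the given stop words.
    private static List<String> lowercaseAndFilter(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            String term = word.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !stopWords.contains(term)) {
                tokens.add(term);
            }
        }
        return tokens;
    }

    // Assumed convention: the value (or the query) starts with "<lang>|".
    static List<String> analyze(String prefixedValue) {
        int sep = prefixedValue.indexOf('|');
        String lang = sep < 0 ? "" : prefixedValue.substring(0, sep);
        String text = sep < 0 ? prefixedValue : prefixedValue.substring(sep + 1);
        // Unknown or missing language falls back to a chain with no stop words.
        Function<String, List<String>> fallback = t -> lowercaseAndFilter(t, Set.of());
        return CHAINS.getOrDefault(lang, fallback).apply(text);
    }

    public static void main(String[] args) {
        System.out.println(analyze("en|The Art of War"));
        System.out.println(analyze("fr|Le Monde"));
    }
}
```

Because the same analyze method is applied to the prefixed query string, the
index-time and query-time chains stay in step for a given language.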



Lance Norskog-2 wrote:
> 
> This is an interesting idea. There are other projects to make the
> analyzer/filter chain more "porous", or open to outside interaction.
> 
> A big problem is that queries are analyzed, too. If you want to give
> the same metadata to the analyzer when doing a query against the
> field, things get tough. You would need a special query parser to
> implement your own syntax to do that. However, the analyzer chain in
> the query phase does not receive the parsed query, so you have to in
> some way change this.
> 



Re: More contextual information in analyser

Posted by Lance Norskog <go...@gmail.com>.
Yes, payloads should do this.

On Mon, Mar 8, 2010 at 8:29 PM, Jon Baer <jo...@gmail.com> wrote:
> Isn't this what Lucene/Solr payloads are theoretically for?
>
> ie: http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
>
> - Jon
>



-- 
Lance Norskog
goksron@gmail.com

Re: More contextual information in analyser

Posted by Jon Baer <jo...@gmail.com>.
Isn't this what Lucene/Solr payloads are theoretically for?

ie: http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/

- Jon

On Mar 8, 2010, at 11:15 PM, Lance Norskog wrote:

> This is an interesting idea. There are other projects to make the
> analyzer/filter chain more "porous", or open to outside interaction.
> 
> A big problem is that queries are analyzed, too. If you want to give
> the same metadata to the analyzer when doing a query against the
> field, things get tough. You would need a special query parser to
> implement your own syntax to do that. However, the analyzer chain in
> the query phase does not receive the parsed query, so you have to in
> some way change this.
> 


Re: More contextual information in analyser

Posted by Lance Norskog <go...@gmail.com>.
This is an interesting idea. There are other projects to make the
analyzer/filter chain more "porous", or open to outside interaction.

A big problem is that queries are analyzed, too. If you want to give
the same metadata to the analyzer when doing a query against the
field, things get tough. You would need a special query parser to
implement your own syntax to do that. However, the analyzer chain in
the query phase does not receive the parsed query, so you would have to
change this in some way.
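
To make the "own syntax" part concrete, here is a small plain-Java sketch of
pulling metadata out of the raw query string before it is analyzed. The
"[lang=xx]" marker and the QueryMetadataSketch class are invented for this
example; they are not Solr query syntax or a Solr API, and wiring the
extracted value into a real query parser is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing step for a custom query parser.
final class QueryMetadataSketch {
    // Assumed syntax: an optional "[lang=xx]" marker at the start of the query.
    private static final Pattern MARKER =
            Pattern.compile("^\\[lang=([a-z]{2})\\]\\s*(.*)$", Pattern.DOTALL);

    // Returns {language, remainingQuery}; language is "" when no marker is present.
    static String[] split(String rawQuery) {
        Matcher m = MARKER.matcher(rawQuery);
        if (m.matches()) {
            return new String[] {m.group(1), m.group(2)};
        }
        return new String[] {"", rawQuery};
    }

    public static void main(String[] args) {
        String[] parts = split("[lang=fr] bonjour le monde");
        System.out.println("language=" + parts[0] + ", query=" + parts[1]);
    }
}
```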

On Mon, Mar 8, 2010 at 2:14 AM, dbejean <do...@eolya.fr> wrote:
>
> Hello,
>
> If I write a custom analyzer that accepts a specific attribute in the
> constructor
>
> public MyCustomAnalyzer(String myAttribute);
>
> Is there a way to dynamically send a value for this attribute from Solr at
> index time in the XML message?
>
> <add>
>  <doc>
>    <field name="content" myattribute="...">.....</field>
>
>
> Obviously, in Solr's schema.xml, the "content" field is associated with my custom
> analyzer.
>
> Thank you.
>
> Dominique



-- 
Lance Norskog
goksron@gmail.com