Posted to solr-user@lucene.apache.org by "Villemos, Gert" <ge...@logica.com> on 2009/09/05 22:21:01 UTC

Concept Expansion

We would like to support concept expansion in searches, i.e. when a user searches for 'software' then the system should also search for keywords / phrases such as program, computer, system, package and class.
 
I imagine that the right way of doing this is a request handler that expands a query into its conceptually similar entries and aggregates the results. A simple change in the filter from:
 
q:software => q:software OR program OR computer OR system OR package 
 
would most likely do the job.
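
As a rough illustration of the rewrite described above, here is a minimal Python sketch (not Solr code). The concept table and the function name are assumptions made up for the example; the thread does not say how the concepts would be stored. The expansion is wrapped in parentheses so that it stays a single unit if it is combined with other clauses.

```python
# Hypothetical concept table; the storage format is an assumption.
CONCEPTS = {
    "software": ["program", "computer", "system", "package"],
}


def expand_term(term, concepts=CONCEPTS):
    """Return the term OR-ed with its related concepts, or the term unchanged."""
    related = concepts.get(term.lower(), [])
    if not related:
        return term
    # Parentheses keep the alternatives grouped as one clause.
    return "(" + " OR ".join([term] + related) + ")"


print(expand_term("software"))
# (software OR program OR computer OR system OR package)
print(expand_term("lucene"))
# lucene
```

Terms without an entry in the table pass through untouched, so the rewrite is safe to apply to every query.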
 
Does such a request handler already exist (... looking at the list on the wiki and in the javadocs the answer seems to be no, but maybe it's maintained externally)?
 
And is this the right way to go at all?
 
Thanks,
Gert.


Please help Logica to respect the environment by not printing this email  / Pour contribuer comme Logica au respect de l'environnement, merci de ne pas imprimer ce mail /  Bitte drucken Sie diese Nachricht nicht aus und helfen Sie so Logica dabei, die Umwelt zu schützen. /  Por favor ajude a Logica a respeitar o ambiente nao imprimindo este correio electronico.



This e-mail and any attachment is for authorised use by the intended recipient(s) only. It may contain proprietary material, confidential information and/or be subject to legal privilege. It should not be copied, disclosed to, retained or used by, any other party. If you are not an intended recipient then please promptly delete this e-mail and any attachment and all copies and inform the sender. Thank you.


Re: AW: AW: Concept Expansion

Posted by gdeconto <ge...@topproducer.com>.
i had a similar question in my post 
http://www.nabble.com/forum/ViewPost.jtp?post=25752898&framed=y

Since queries can be quite complex, how would we parse the q string in a
custom QParserPlugin so that we could identify and expand specific terms
(i.e. is there an existing method for this)?
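
The thread does not name an existing method for this. To make the difficulty concrete, here is a rough Python sketch (not Solr's parser) that splits a query string into tokens and expands only plain terms, leaving operators, parentheses, quoted phrases and field prefixes alone. The concept table, the token pattern and the function name are all assumptions for illustration; a real plugin would probably be better off working on the parsed query objects than on the raw string.

```python
import re

# Hypothetical concept table.
CONCEPTS = {"software": ["program", "package"]}

# A token is a quoted phrase, a single parenthesis, or a run of other characters.
TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()]+')
OPERATORS = {"AND", "OR", "NOT"}


def expand_query(q, concepts=CONCEPTS):
    """Expand known terms in q; operators, grouping and phrases are kept as-is."""
    out = []
    for tok in TOKEN.findall(q):
        if tok in OPERATORS or tok in ("(", ")") or tok.startswith('"'):
            out.append(tok)
            continue
        # Split an optional "field:" prefix from the term itself.
        field, sep, term = tok.rpartition(":")
        related = concepts.get(term.lower())
        if related:
            tok = field + sep + "(" + " OR ".join([term] + related) + ")"
        out.append(tok)
    return " ".join(out)


print(expand_query('title:software AND (java OR "open source")'))
# title:(software OR program OR package) AND ( java OR "open source" )
```

This ignores boosts, ranges, wildcards and escaped characters, which is exactly why hand-parsing the q string gets fragile quickly.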



polx wrote:
> 
> 
> On 05 Sep 2009, at 23:26, Villemos, Gert wrote:
> 
>> - As part of the construction the plugin parses the q string and
>> extracts the parameters, adding them as TermQuery(s) to the parser
>  
> 

-- 
View this message in context: http://www.nabble.com/TermsComponent-tp25302503p25754730.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: AW: AW: Concept Expansion

Posted by Paul Libbrecht <pa...@activemath.org>.
On 05 Sep 2009, at 23:26, Villemos, Gert wrote:

> - The QParserPlugin is a factory for the actual QParser parser, i.e.  
> based on the query string and other parameters a parser is  
> instantiated and setup.

right.

> - As part of the construction the plugin parses the q string and
> extracts the parameters, adding them as TermQuery(s) to the parser.

I think that's correct.

> - A 'concept expansion' extension could simply be a QParserPlugin  
> specialization, which as part of the 'createParser' method expands  
> the terms in the q string, i.e. 'replace' the input 'q=software'  
> with 'q=software OR program OR computer OR system OR package'.

Exactly.
Having full control over the query classes is also a nice luxury,
e.g. you can build fine-grained queries without worrying about
escaping, which you would have to do if you passed a string through
yet another query parser down the chain.

paul

AW: AW: Concept Expansion

Posted by "Villemos, Gert" <ge...@logica.com>.
Paul,
 
Thanks for the answer. Documentation on QParserPlugin concepts seems to be limited (well, at least my search didn't find it, and the Javadoc doesn't provide much of an explanation).
 
Do I understand the concepts / your suggestion correctly?
 
- The QParserPlugin is a factory for the actual QParser parser, i.e. based on the query string and other parameters a parser is instantiated and set up.
- As part of the construction the plugin parses the q string and extracts the parameters, adding them as TermQuery(s) to the parser.
- A 'concept expansion' extension could simply be a QParserPlugin specialization, which as part of the 'createParser' method expands the terms in the q string, i.e. 'replace' the input 'q=software' with 'q=software OR program OR computer OR system OR package'.
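
The three points above describe a factory: a plugin that builds a parser per request, with the expansion happening when the parser is created or run. A minimal Python analogue of that structure (not the Solr API) might look like this. The class names, the `expand` request parameter and the whitespace-only term splitting are assumptions for illustration; only `create_parser` is modelled on the 'createParser' method named above.

```python
class ExpandingParser:
    """Stands in for the QParser: expands known terms when parsing."""

    def __init__(self, qstr, concepts, enabled=True):
        self.qstr = qstr
        self.concepts = concepts
        self.enabled = enabled

    def parse(self):
        if not self.enabled:
            return self.qstr
        terms = []
        for term in self.qstr.split():
            related = self.concepts.get(term.lower(), [])
            if related:
                term = "(" + " OR ".join([term] + related) + ")"
            terms.append(term)
        return " ".join(terms)


class ConceptExpansionPlugin:
    """Stands in for the QParserPlugin: a factory for parsers."""

    def __init__(self, concepts):
        self.concepts = concepts

    def create_parser(self, qstr, params=None):
        params = params or {}
        # Hypothetical request parameter to switch expansion off per query.
        enabled = params.get("expand", "true") == "true"
        return ExpandingParser(qstr, self.concepts, enabled)


plugin = ConceptExpansionPlugin({"software": ["program", "package"]})
print(plugin.create_parser("free software").parse())
# free (software OR program OR package)
print(plugin.create_parser("free software", {"expand": "false"}).parse())
# free software
```

Because the factory sees the request parameters, this design leaves room for turning expansion on or off per query, under the assumption that such a parameter is passed through.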
 
Cheers,
Gert.
 
 
 

________________________________

From: Paul Libbrecht [mailto:paul@activemath.org]
Sent: Sat 05.09.2009 23:03
To: solr-user@lucene.apache.org
Subject: Re: AW: Concept Expansion



Gert,

we're doing a similar process on i2geo search, including simple 
language expansion (one word is queried in several fields of each 
language), and, though I haven't made it yet but will soon, I've been 
suggested to do it as qparser plugin.

paul


On 05 Sep 2009, at 22:47, Villemos, Gert wrote:

> [Sorry, post submitted as HTML. Proper format below;]
>
>
> We would like to support concept expansion in searches, i.e. when a
> user searches for 'software' then the system should also search for
> keywords / phrases such as program, computer, system, package and
> class.
>
> I imagine that the right way of doing this is a request handler
> that expands a query into its conceptually similar entries and
> aggregates the results. A simple change in the filter from:
>
> q:software => q:software OR program OR computer OR system OR package
>
> would most likely do the job.
>
> Does such a request handler already exist (... looking at the list
> on the wiki and in the javadocs the answer seems to be no, but maybe
> it's maintained externally)?
>
> And is this the right way to go at all?
>
> Thanks,
> Gert.
>
>







Re: AW: Concept Expansion

Posted by Paul Libbrecht <pa...@activemath.org>.
Gert,

we're doing a similar process on i2geo search, including simple  
language expansion (one word is queried in several fields of each  
language), and, though I haven't made it yet but will soon, I've been  
suggested to do it as qparser plugin.

paul


On 05 Sep 2009, at 22:47, Villemos, Gert wrote:

> [Sorry, post submitted as HTML. Proper format below;]
>
>
> We would like to support concept expansion in searches, i.e. when a
> user searches for 'software' then the system should also search for
> keywords / phrases such as program, computer, system, package and
> class.
>
> I imagine that the right way of doing this is a request handler
> that expands a query into its conceptually similar entries and
> aggregates the results. A simple change in the filter from:
>
> q:software => q:software OR program OR computer OR system OR package
>
> would most likely do the job.
>
> Does such a request handler already exist (... looking at the list
> on the wiki and in the javadocs the answer seems to be no, but maybe
> it's maintained externally)?
>
> And is this the right way to go at all?
>
> Thanks,
> Gert.
>
>


Re: AW: Concept Expansion

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
Villemos, Gert wrote:
> Well, this is very interesting.
>  
> Looking at the documentation provided in the link it seems like the synonym definitions must be in a file. We would define the concept expansions in another format. My question is thus: is it possible to perform a synonym replacement based not on the file but on another mechanism?
>  
> I guess no. The answer would thus be to create a new TokenFilter and corresponding factory, and implement it to access our format. Right?
>   
I've never tried it, but I think you can implement a TokenFilterFactory that
accesses your format,
creates a SynonymMap and passes it to SynonymFilter.
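
To picture the flow suggested here (read a custom format, build a synonym map, hand it to a filter), here is a small Python analogue. It is not Lucene code: the JSON input format and both function names are assumptions for illustration. The one detail borrowed from how synonym filtering works is that each synonym is emitted at the same position as the original token.

```python
import json


def load_synonym_map(text):
    """Build a term -> synonyms map from a JSON document (hypothetical format)."""
    raw = json.loads(text)
    return {term.lower(): [s.lower() for s in syns] for term, syns in raw.items()}


def synonym_filter(tokens, synonym_map):
    """Yield (position, token) pairs; synonyms share the original's position."""
    for pos, tok in enumerate(tokens):
        yield pos, tok
        for syn in synonym_map.get(tok.lower(), []):
            yield pos, syn


concepts = load_synonym_map('{"software": ["program", "package"]}')
print(list(synonym_filter(["free", "software"], concepts)))
# [(0, 'free'), (1, 'software'), (1, 'program'), (1, 'package')]
```

Only `load_synonym_map` depends on the storage format, so swapping the file for a database or a service would leave the filter itself unchanged.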

>  
> Would there be a way to enable / disable the expansion filter at runtime, e.g. through special parameters in the query string?
>  
>   
No. SynonymFilter is applied to specific fields, as defined in your schema.xml.

Koji



AW: Concept Expansion

Posted by "Villemos, Gert" <ge...@logica.com>.
Well, this is very interesting.
 
Looking at the documentation provided in the link it seems like the synonym definitions must be in a file. We would define the concept expansions in another format. My question is thus: is it possible to perform a synonym replacement based not on the file but on another mechanism?
 
I guess no. The answer would thus be to create a new TokenFilter and corresponding factory, and implement it to access our format. Right?
 
Would there be a way to enable / disable the expansion filter at runtime, e.g. through special parameters in the query string?
 
Cheers,
Gert.
 
 
 
 

________________________________

From: Shalin Shekhar Mangar [mailto:shalinmangar@gmail.com]
Sent: Sat 05.09.2009 23:23
To: solr-user@lucene.apache.org
Subject: Re: Concept Expansion



On Sun, Sep 6, 2009 at 2:17 AM, Villemos, Gert <ge...@logica.com> wrote:

>
> We would like to support concept expansion in searches, i.e. when a user
> searches for 'software' then the system should also search for keywords /
> phrases such as program, computer, system, package and class.
>
> I imagine that the right way of doing this is a request handler that
> expands a query into its conceptually similar entries and aggregates the
> results.
>

Have you looked at SynonymFilterFactory?

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46

--
Regards,
Shalin Shekhar Mangar.






Re: Concept Expansion

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Sun, Sep 6, 2009 at 2:17 AM, Villemos, Gert <ge...@logica.com> wrote:

>
> We would like to support concept expansion in searches, i.e. when a user
> searches for 'software' then the system should also search for keywords /
> phrases such as program, computer, system, package and class.
>
> I imagine that the right way of doing this is a request handler that
> expands a query into its conceptually similar entries and aggregates the
> results.
>

Have you looked at SynonymFilterFactory?

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46
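
For readers who keep their concepts in another form, a short Python sketch of turning one concept entry into a rule line for the synonyms file may help. The helper name and the sample data are assumptions; the two rule shapes (a comma-separated list of equivalents, and an explicit `=>` mapping) are the ones the synonyms file format accepts as far as I know.

```python
def synonym_rule(term, related, one_way=True):
    """Format one concept entry as a line for a synonyms file."""
    if one_way:
        # Explicit mapping: only `term` is expanded, and it keeps itself.
        return "%s => %s" % (term, ", ".join([term] + related))
    # Equivalence list: every listed word is treated as a synonym of the others.
    return ", ".join([term] + related)


print(synonym_rule("software", ["program", "computer", "system", "package"]))
# software => software, program, computer, system, package
print(synonym_rule("software", ["program", "package"], one_way=False))
# software, program, package
```

The one-way form matches the expansion asked about in this thread, where a search for 'software' should also match 'program' but not necessarily the other way round.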

-- 
Regards,
Shalin Shekhar Mangar.

AW: Concept Expansion

Posted by "Villemos, Gert" <ge...@logica.com>.
[Sorry, post submitted as HTML. Proper format below;]
 
 
We would like to support concept expansion in searches, i.e. when a user searches for 'software' then the system should also search for keywords / phrases such as program, computer, system, package and class.

I imagine that the right way of doing this is a request handler that expands a query into its conceptually similar entries and aggregates the results. A simple change in the filter from:

q:software => q:software OR program OR computer OR system OR package

would most likely do the job.

Does such a request handler already exist (... looking at the list on the wiki and in the javadocs the answer seems to be no, but maybe it's maintained externally)?

And is this the right way to go at all?

Thanks,
Gert.

 

________________________________

From: Villemos, Gert [mailto:gert.villemos@logica.com]
Sent: Sat 05.09.2009 22:21
To: solr-user@lucene.apache.org
Subject: Concept Expansion



We would like to support concept expansion in searches, i.e. when a user searches for 'software' then the system should also search for keywords / phrases such as program, computer, system, package and class.

I imagine that the right way of doing this is a request handler that expands a query into its conceptually similar entries and aggregates the results. A simple change in the filter from:

q:software => q:software OR program OR computer OR system OR package

would most likely do the job.

Does such a request handler already exist (... looking at the list on the wiki and in the javadocs the answer seems to be no, but maybe it's maintained externally)?

And is this the right way to go at all?

Thanks,
Gert.




Re: Concept Expansion

Posted by Chris Hostetter <ho...@fucit.org>.
: Subject: Concept Expansion
: References:
:     <84...@mail.gmail.com><c68e39170909
:     050858i5b6bc063o79c1eac06e1c58b4@mail.gmail.com>
:     <84...@mail.gmail.com>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/Thread_hijacking


-Hoss