Posted to solr-user@lucene.apache.org by Tiernan OToole <ls...@gmail.com> on 2011/11/11 11:09:56 UTC

getting solr to expand Acronym

Don't know if this is possible, but I need to ask anyway... Say we have a
list of acronyms in a database (CD, DVD, CPU) and also a list of their
not-so-short names (Compact Disk, Digital Versatile Disk, Central
Processing Unit), but they are not linked in any particular way (lots of
items, some with full names, some using acronyms). Is it possible for Solr
to figure out that CD is an acronym of Compact Disk? I know CD could also
mean Central Data, or anything that begins with C and D, but is there a way
to tell Solr to look for items that not only match CD, but also have words
next to each other that begin with C and D? Another example I can think of
is IBM: it could be International Business Machines, or Irish Business
Machines, or Irish Banking Machines...

So, would that be possible?

-- 
Tiernan O'Toole
blog.lotas-smartman.net
www.geekphotographer.com
www.tiernanotoole.ie

Re: getting solr to expand Acronym

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Tiernan,

I don't think you can do it through any Solr configs, though I imagine you could do it with a custom Token Filter that keeps track of the context.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
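
The matching step that such a filter would need can be sketched outside Solr. The snippet below is plain Python, not a Lucene TokenFilter, and the function name find_expansions is made up for illustration. It assumes whitespace-separated words and a case-insensitive comparison of first letters, and it only shows the "adjacent words whose initials spell the acronym" idea from the question.

```python
def find_expansions(text, acronym):
    """Return each run of adjacent words in `text` whose initials spell `acronym`.

    Hypothetical helper for illustration; not part of Solr or Lucene.
    """
    letters = acronym.lower()
    if not letters:
        return []
    # Strip common punctuation so "Machines," still counts as a word.
    words = [w.strip(".,;:!?()") for w in text.split()]
    words = [w for w in words if w]
    n = len(letters)
    matches = []
    # Slide a window of len(acronym) words across the text.
    for start in range(len(words) - n + 1):
        window = words[start:start + n]
        if all(w[0].lower() == c for w, c in zip(window, letters)):
            matches.append(" ".join(window))
    return matches


if __name__ == "__main__":
    print(find_expansions("Ships with a Compact Disk and a Central Data unit", "CD"))
```

As the question already notes, this will return every phrase with the right initials (both "Compact Disk" and "Central Data" for CD), so some ranking or manual review would still be needed to pick the intended expansion.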



Re: getting solr to expand Acronym

Posted by lboutros <bo...@gmail.com>.
Hi,

I'm not sure I see what you mean, but perhaps synonyms could solve your
problem?

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

Ludovic.

-----
Jouve
France.

Re: getting solr to expand Acronym

Posted by Tiernan OToole <ls...@gmail.com>.
Thanks for the replies... The problem with synonyms is that they would need
to be tracked: new words could be entered that would need to be added to
the list on a regular basis...

@Otis: As for the option of a custom TokenFilter, how would that work? I
have not coded anything into Solr or written any custom TokenFilters
myself... I am sure there's documentation on this, but how do you think
this should work?

Thanks.

--Tiernan
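
One way to ease the tracking concern might be to regenerate the synonym list from the database whenever items change. The sketch below is an assumption-laden illustration in plain Python: the function names initials and build_synonym_lines are hypothetical, the two input lists stand in for whatever the database returns, and the output uses the comma-separated equivalence format that solr.SynonymFilterFactory reads from a synonyms file.

```python
def initials(phrase):
    """Upper-cased first letters of each word, e.g. 'Compact Disk' -> 'CD'."""
    return "".join(word[0].upper() for word in phrase.split())


def build_synonym_lines(acronyms, full_names):
    """Pair each acronym with the full names whose initials match it.

    Hypothetical helper; returns one comma-separated line per acronym
    that has at least one candidate expansion.
    """
    by_initials = {}
    for name in full_names:
        by_initials.setdefault(initials(name), []).append(name)
    lines = []
    for acronym in acronyms:
        names = by_initials.get(acronym.upper(), [])
        if names:
            lines.append(", ".join([acronym] + names))
    return lines


if __name__ == "__main__":
    acronyms = ["CD", "DVD", "CPU"]
    names = ["Compact Disk", "Digital Versatile Disk", "Central Processing Unit"]
    for line in build_synonym_lines(acronyms, names):
        print(line)
```

Ambiguous acronyms end up with several candidates on one line (CD would pick up both "Compact Disk" and "Central Data" if both are in the list), so the generated file would probably still want a quick review before being deployed.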


On Fri, Nov 11, 2011 at 9:01 PM, Brandon Ramirez <
Brandon_Ramirez@elementk.com> wrote:

> Could this be simulated through synonyms?  Could you define "CD" as a
> synonym of "Compact Disc" or vice versa?  I'm not sure if that would work,
> just brainstorming here...
>
>
> Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848
> Software Engineer II | Element K | www.elementk.com




RE: getting solr to expand Acronym

Posted by Brandon Ramirez <Br...@elementk.com>.
Could this be simulated through synonyms?  Could you define "CD" as a synonym of "Compact Disc" or vice versa?  I'm not sure if that would work, just brainstorming here...


Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848 
Software Engineer II | Element K | www.elementk.com

