Posted to java-user@lucene.apache.org by Felipe Carvalho <fe...@gmail.com> on 2011/11/08 01:45:06 UTC

Phonetic search with Lucene 3.2

Hello,
  I'm using Lucene 3.2 in a phone book app, and phonetic search is a
requirement. I've googled "lucene phonetic search" but could not find
many references. I did find this article, but I'm not sure how up to
date it is: http://tech.javayogi.com/hello-world-lucene.html
  I couldn't find anything browsing Lucene's docs or the mail archives
either.
  I did find this improvement on Jira (
https://issues.apache.org/jira/browse/LUCENE-2413), but as far as I can
tell, phonetic capability is scheduled to be added to lucene-core only
in version 4.0.
  Can anyone point me to an example of phonetic indexing and searching?
Should I use Phonetix (
http://www.companywebstore.de/tangentum/mirror/en/products/phonetix/index.html)
for this?

Thanks a lot,
  Felipe

Re: Phonetic search with Lucene 3.2

Posted by Paul Libbrecht <pa...@hoplahup.net>.
Felipe,

I do not have a tutorial, but what you are describing is what I have been doing in ActiveMath.

If you want, I have a short paper that explains how it works there (http://www.hoplahup.net/paul_pubs/AccessRetrievalAM.html), and the software is open source (http://www.activemath.org/Software/).

No tutorials, however.
I believe Solr with dismax would be the easiest way to start.

paul

On 8 Nov 2011, at 14:42, Felipe Carvalho wrote:

> On Tue, Nov 8, 2011 at 10:06 AM, Erik Hatcher <er...@gmail.com> wrote:
> 
>> 
>> On Nov 8, 2011, at 03:58 , Felipe Carvalho wrote:
>> 
>>> One other question: I'm looking at the Lucene 3.4 javadocs (
>>> http://lucene.apache.org/java/3_4_0/api/core/index.html), but I can't
>>> find MetaphoneReplacementAnalyzer anywhere. Does anyone know if this
>>> class has been removed from lucene-core?
>> 
>> That class is in Lucene in Action's companion code, not Lucene itself.
>> Download it from http://www.manning.com/lucene
>> 
>>> My Lucene in Action edition is from 2004, so I'm guessing things have
>>> changed since then.
>> 
>> There's a second edition out now, well worth getting if I do say so myself
>> :)  (I've learned a lot from reading and re-reading it myself, to be honest
>> - thanks MikeM!)
>> 
>>>> Now suppose my document had a particular field I don't want to be
>>>> metaphoned at search time, for instance, "exactName". For example,
>>>> suppose I want to look for all documents whose contents phonetically
>>>> match "kool kat" and whose exactName matches "kat" but not "cat",
>>>> generating an expression like this: "exactName:kat AND contents:kool kat".
>>>>
>>>> Is it possible to do this? If so, how would I do it? Can I use specific
>>>> analyzers for each field?
>> 
>> Yes, quite possible, including boosting on exact matches if you want. Use
>> a BooleanQuery to wrap clauses parsed once with phonetic analysis and once
>> without, and of course index both versions of the field at indexing time too.
>> 
> 
> Would it be possible to point me to an example where this is done? The best
> example of a BooleanQuery I've found so far is this one:
> http://www.avajava.com/tutorials/lessons/how-do-i-combine-queries-with-a-boolean-query.html
> 
> But I couldn't find a boolean query using different analyzers for different
> fields of the document.
> 
> Thanks a lot!
> 
> 
>> 
>>       Erik
>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> 
>> 
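
A minimal plain-Java sketch of Erik's suggestion, assuming nothing from Lucene itself: it models an index in memory, analyzes each field with its own "analyzer" (exact lowercase terms for exactName, phonetic keys for contents), and intersects a required exact clause with a required phonetic clause, mirroring Felipe's "kool kat" example. The class TwoFieldSketch and the helper toyPhoneticKey are hypothetical; toyPhoneticKey is a crude stand-in, not Metaphone. In real Lucene 3.x code the two clauses would be separate Query objects added to a BooleanQuery with BooleanClause.Occur.MUST.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

/** Toy in-memory index: per-field analysis plus an AND of an exact and a phonetic clause. */
final class TwoFieldSketch {

    /** Hypothetical crude phonetic key, NOT Metaphone: C and Q become K, vowels after
     *  the first letter are dropped, and repeated letters collapse. */
    static String toyPhoneticKey(String word) {
        String w = word.toUpperCase(Locale.ROOT).replace('C', 'K').replace('Q', 'K');
        StringBuilder key = new StringBuilder();
        char prev = 0;
        for (int i = 0; i < w.length(); i++) {
            char c = w.charAt(i);
            if (!Character.isLetter(c) || (i > 0 && "AEIOU".indexOf(c) >= 0)) {
                continue;
            }
            if (c != prev) {
                key.append(c);
            }
            prev = c;
        }
        return key.toString();
    }

    static final Function<String, String> EXACT = s -> s.toLowerCase(Locale.ROOT);
    static final Function<String, String> PHONETIC = TwoFieldSketch::toyPhoneticKey;

    /** Each field gets its own per-token transform, applied after a whitespace split. */
    private final Map<String, Function<String, String>> analyzers =
            Map.of("exactName", EXACT, "contents", PHONETIC);
    /** field -> term -> ids of the documents containing that term. */
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    private List<String> analyze(String field, String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(analyzers.get(field).apply(token));
            }
        }
        return terms;
    }

    void add(int docId, String exactName, String contents) {
        index(docId, "exactName", exactName);
        index(docId, "contents", contents);
    }

    private void index(int docId, String field, String text) {
        for (String term : analyze(field, text)) {
            postings.computeIfAbsent(field, f -> new HashMap<>())
                    .computeIfAbsent(term, t -> new TreeSet<>())
                    .add(docId);
        }
    }

    /** Ids of documents whose field contains ALL analyzed terms (like a MUST clause). */
    Set<Integer> mustMatch(String field, String text) {
        Set<Integer> result = null;
        Map<String, Set<Integer>> terms = postings.getOrDefault(field, Map.of());
        for (String term : analyze(field, text)) {
            Set<Integer> docs = terms.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        if (result == null) {
            result = new TreeSet<>();
        }
        return result;
    }

    public static void main(String[] args) {
        TwoFieldSketch index = new TwoFieldSketch();
        index.add(0, "kat", "kool kat");
        index.add(1, "cat", "cool cat");
        index.add(2, "kat", "dog");

        // exactName:kat AND contents:(kool kat): exact on one field, phonetic on the other.
        Set<Integer> hits = index.mustMatch("exactName", "kat");
        hits.retainAll(index.mustMatch("contents", "kool kat"));
        System.out.println(hits);
        // Phonetic-only search: "cool cat" and "kool kat" share the keys KL and KT.
        System.out.println(index.mustMatch("contents", "cool cat"));
    }
}
```

Running main prints [0] for the combined query (doc 1 fails the exact clause, doc 2 the phonetic one) and [0, 1] for the phonetic-only search. Note that in Lucene's QueryParser syntax, contents:kool kat applies the field only to "kool"; contents:(kool kat) applies it to both terms.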




Re: Re: Phonetic search with Lucene 3.2

Posted by janwen <to...@163.com>.
paul:
   I visited the site you provided and explored the example, but I cannot understand the results of the following example. I thought
Phonetix was an analyzer for telephone numbers such as 010-134323434 or 400-2343-2342-3234. Could you tell me more about this? I need to index phone numbers in my project. Thanks.
public class PhoneticTest
{
    public static void main(String[] argv)
    {
        Metaphone metaphone = new Metaphone();

        String original = "Phonetix";
        String encoded  = metaphone.generateKey(original);

        System.out.println("original="+original+", encoded="+encoded);
    }
}
After compiling and running this example program you will get an output like this:
original=Phonetix, encoded=FNTK
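
On janwen's question: Metaphone, which Phonetix implements, encodes words by how they sound, so that similar-sounding names share a key; FNTK above is the phonetic key of the word "Phonetix", and the encoder is not a telephone-number analyzer. To show the idea in a few lines, the sketch below swaps in American Soundex, a simpler and older phonetic algorithm than Metaphone. It is not Phonetix code, its keys differ from Metaphone's, and the class name SoundexSketch is hypothetical.

```java
import java.util.Locale;

/** American Soundex, shown as a compact example of phonetic encoding. */
final class SoundexSketch {

    /** Soundex digit for a letter; '0' means "not coded" (vowels, Y, H, W). */
    static char code(char c) {
        switch (c) {
            case 'B': case 'F': case 'P': case 'V':
                return '1';
            case 'C': case 'G': case 'J': case 'K':
            case 'Q': case 'S': case 'X': case 'Z':
                return '2';
            case 'D': case 'T':
                return '3';
            case 'L':
                return '4';
            case 'M': case 'N':
                return '5';
            case 'R':
                return '6';
            default:
                return '0';
        }
    }

    /** Returns the 4-character Soundex key, or "" if the input has no ASCII letters. */
    static String soundex(String word) {
        String s = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder key = new StringBuilder().append(s.charAt(0));
        char lastCode = code(s.charAt(0));
        for (int i = 1; i < s.length() && key.length() < 4; i++) {
            char c = s.charAt(i);
            char d = code(c);
            if (d != '0' && d != lastCode) {
                key.append(d);
            }
            // Vowels reset lastCode (so a repeated code after a vowel is kept);
            // H and W do not, so equal codes on either side of them collapse.
            if (c != 'H' && c != 'W') {
                lastCode = d;
            }
        }
        while (key.length() < 4) {
            key.append('0');
        }
        return key.toString();
    }

    public static void main(String[] args) {
        System.out.println(soundex("Robert") + " " + soundex("Rupert"));
        System.out.println(soundex("Phonetix") + " " + soundex("Phonetics"));
        System.out.println("[" + soundex("010-134323434") + "]");
    }
}
```

Robert and Rupert both encode to R163, and Phonetix and Phonetics both to P532, while a phone number such as 010-134323434 contains no letters and encodes to an empty string, so a phonetic encoder is of no use for indexing phone numbers. The sketch also drops non-ASCII letters, which real encoders may handle differently.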




2011-11-10



janwen | China 
website : http://www.qianpin.com/




From: Paul Libbrecht <pa...@hoplahup.net>
Sent: 2011-11-09 18:38
Subject: Re: Phonetic search with Lucene 3.2
To: java-user@lucene.apache.org



We've been using 
    http://www.tangentum.biz/en/products/phonetix/ 
which does double-metaphone. 
Maybe that helps. 

paul 


On 9 Nov 2011, at 11:29, Felipe Carvalho wrote: 

> Using PerFieldAnalyzerWrapper seems to be working for what I need! 
>  
> On indexing: 
>  
>        PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(
>                new StandardAnalyzer(Version.LUCENE_33));
>        wrapper.addAnalyzer("nome", new MetaphoneReplacementAnalyzer());
>        IndexWriterConfig indexWriterConfig =
>                new IndexWriterConfig(Version.LUCENE_33, wrapper);
>        Directory directory = FSDirectory.open(new File(indexPath));
>        IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
>  
> On search: 
>  
>        Directory directory = FSDirectory.open(
>                new File(lastIndexDir(Calendar.getInstance())));
>        IndexSearcher is = new IndexSearcher(directory);
>        PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(
>                new StandardAnalyzer(Version.LUCENE_33));
>        wrapper.addAnalyzer("name", new MetaphoneReplacementAnalyzer());
>        QueryParser parser = new QueryParser(Version.LUCENE_33, "name", wrapper);
>        Query query = parser.parse(expression);
>        ScoreDoc[] hits = is.search(query, 1000).scoreDocs;
>  
> Does anyone know of any other phonetic analyzer implementations? I'm using
> MetaphoneReplacementAnalyzer from the LIA examples. 
>  
> I'm looking at lucene-contrib stuff at 
> http://lucene.apache.org/java/3_4_0/lucene-contrib/index.html but I can't 
> seem to find other phonetic analyzers. 
>  
> Thanks! 
>  
>  
> On Tue, Nov 8, 2011 at 12:19 PM, Erik Hatcher <er...@gmail.com>wrote: 
>  
>>  
>> On Nov 8, 2011, at 05:42 , Felipe Carvalho wrote: 
>>>> Yes, quite possible, including boosting on exact matches if you want. Use
>>>> a BooleanQuery to wrap clauses parsed once with phonetic analysis and once
>>>> without, and of course index both versions of the field at indexing time too.
>>>> 
>>> 
>>> Would it be possible to point me to an example where this is done? The best
>>> example of a BooleanQuery I've found so far is this one:
>>> http://www.avajava.com/tutorials/lessons/how-do-i-combine-queries-with-a-boolean-query.html
>>> 
>>> But I couldn't find a boolean query using different analyzers for
>>> different fields of the document.
>>  
>> You could use two different QueryParser instances with different
>> analyzers. Or use the PerFieldAnalyzerWrapper, though you'll still need two
>> instances in order to have a different default field for each expression.
>> Then use the techniques you saw in that article (or in Lucene in Action,
>> since you mentioned having that) to combine the Query objects into a
>> BooleanQuery.
>>  
>>       Erik 
>>  
>>  
>>  
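
The core idea behind PerFieldAnalyzerWrapper, modeled in plain Java with hypothetical names (PerFieldSketch, lowercase, fakePhonetic; none of them are Lucene classes): look up the analyzer registered for a field name and fall back to the default analyzer otherwise. It also illustrates a pitfall worth checking in Felipe's snippet: the field name given to addAnalyzer must be identical at indexing and search time. If one side registers "nome" and the other "name", the "name" field on the "nome" side is silently analyzed by the default analyzer, so index terms and query terms no longer line up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Toy model of per-field analyzer selection with a default fallback (not Lucene code). */
final class PerFieldSketch {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldSketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    /** Registers an analyzer for one exact field name. */
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    /** Uses the analyzer registered for the field, or the default one if none is. */
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    /** Hypothetical stand-in for a standard analyzer: lowercase, split on non-alphanumerics. */
    static List<String> lowercase(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Hypothetical stand-in for a phonetic analyzer: C becomes K, non-initial vowels dropped. */
    static List<String> fakePhonetic(String text) {
        List<String> keys = new ArrayList<>();
        for (String t : lowercase(text)) {
            String u = t.toUpperCase(Locale.ROOT).replace('C', 'K');
            keys.add(u.charAt(0) + u.substring(1).replaceAll("[AEIOU]", ""));
        }
        return keys;
    }

    public static void main(String[] args) {
        // Indexing side registers the phonetic analyzer under "nome" ...
        PerFieldSketch indexing = new PerFieldSketch(PerFieldSketch::lowercase);
        indexing.addAnalyzer("nome", PerFieldSketch::fakePhonetic);
        // ... while the search side registers it under "name".
        PerFieldSketch searching = new PerFieldSketch(PerFieldSketch::lowercase);
        searching.addAnalyzer("name", PerFieldSketch::fakePhonetic);

        // The same field, "name", is analyzed differently on the two sides.
        System.out.println(indexing.analyze("name", "Carol Kat"));
        System.out.println(searching.analyze("name", "Carol Kat"));
    }
}
```

The first line printed is [carol, kat] (default analyzer, since nothing is registered under "name" on the indexing side) and the second is [KRL, KT] (the stand-in phonetic analyzer), so a query for "Karol" would find nothing.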



Re: Phonetic search with Lucene 3.2

Posted by Paul Libbrecht <pa...@hoplahup.net>.
Indeed, that uses Lucene 2.9.2.

paul


On 9 Nov 2011, at 11:43, Felipe Carvalho wrote:

> Which version of Lucene are you using? I had tried it with Lucene 3.3 and
> had some problems. Did you have to do any customizations?


Re: Phonetic search with Lucene 3.2

Posted by Felipe Carvalho <fe...@gmail.com>.
Which version of Lucene are you using? I had tried it with Lucene 3.3 and
had some problems. Did you have to do any customizations?


Re: Phonetic search with Lucene 3.2

Posted by Paul Libbrecht <pa...@hoplahup.net>.
We've been using
	http://www.tangentum.biz/en/products/phonetix/
which does double-metaphone.
Maybe that helps.

paul




RE: Phonetic search with Lucene 3.2

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

You can still add the solr.jar file to your classpath and simply use that
filter. Lots of people are doing this for, e.g., WordDelimiterFilter.

But yes, in Lucene trunk this was factored out of Solr core and moved to a
new analyzer module.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
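
To picture what "that filter" does once solr.jar is on the classpath: a phonetic token filter either replaces each token with its phonetic key or, when the inject option of Solr's PhoneticFilterFactory is true (its default), adds the key alongside the original token, modeled here as sharing the original token's position. With inject, exact and phonetic matches can both hit a single field. The toy below models only that difference; InjectSketch, phoneticFilter and the upper-case, C-to-K "encoder" are hypothetical and are not Solr's classes or algorithms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Toy model of a phonetic token filter with an "inject" switch (not Solr code). */
final class InjectSketch {

    /** A token and its position in the stream. */
    static final class Token {
        final String text;
        final int position;

        Token(String text, int position) {
            this.text = text;
            this.position = position;
        }

        @Override
        public String toString() {
            return text + "@" + position;
        }
    }

    /**
     * Encodes each word. With inject=true the original token is kept and the key is
     * added at the same position (skipped if identical); with inject=false the key
     * replaces the original token.
     */
    static List<Token> phoneticFilter(List<String> words, UnaryOperator<String> encoder,
                                      boolean inject) {
        List<Token> out = new ArrayList<>();
        for (int pos = 0; pos < words.size(); pos++) {
            String original = words.get(pos);
            String key = encoder.apply(original);
            if (inject) {
                out.add(new Token(original, pos));
                if (!key.equals(original)) {
                    out.add(new Token(key, pos));
                }
            } else {
                out.add(new Token(key, pos));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in encoder, not a real phonetic algorithm.
        UnaryOperator<String> toyEncoder = s -> s.toUpperCase(Locale.ROOT).replace('C', 'K');
        List<String> words = List.of("cool", "cat");
        System.out.println(phoneticFilter(words, toyEncoder, true));
        System.out.println(phoneticFilter(words, toyEncoder, false));
    }
}
```

With inject set to true the output is [cool@0, KOOL@0, cat@1, KAT@1]; with false it is [KOOL@0, KAT@1]. Erik's two-field approach instead keeps exact and phonetic terms apart, which makes boosting exact matches straightforward.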


> -----Original Message-----
> From: Felipe Carvalho [mailto:felipe.carvalho@gmail.com]
> Sent: Wednesday, November 09, 2011 2:12 PM
> To: java-user@lucene.apache.org
> Subject: Re: Phonetic search with Lucene 3.2
> 
> Can I use Solr as a lib, like Lucene? My company is not willing to install
> a Solr server... =/
> 
> On Wed, Nov 9, 2011 at 9:35 AM, Erik Hatcher <er...@gmail.com>
> wrote:
> 
> > Solr has, for a long while, included a PhoneticFilter that can
> > leverage several different algorithms.  This was pulled down to
> > Lucene, but only for trunk/4.0.
> >
> > Maybe use Solr instead?!  ;)
> >
> >        Erik


Re: Phonetic search with Lucene 3.2

Posted by Peter Karich <pe...@yahoo.de>.
Well, using embedded Solr would be an option, although it is not recommended.

Or look into Elasticsearch:
http://www.elasticsearch.org/guide/reference/index-modules/analysis/phonetic-tokenfilter.html
http://www.elasticsearch.org/guide/reference/java-api/client.html

Regards,
Peter.

> Can I use Solr as a lib, like Lucene? My company is not willing to install
> a Solr server... =/


Re: Phonetic search with Lucene 3.2

Posted by Erik Hatcher <er...@gmail.com>.
On Nov 9, 2011, at 05:11 , Felipe Carvalho wrote:

> Can I use Solr as a lib, like Lucene? My company is not willing to install
> a Solr server... =/

That's too bad. What's the rationale for that decision? A large number of big, big companies are deploying on Solr quite happily. I just taught a Solr class here at ApacheCon with attendees from several well-known companies. One attendee has already scaled to billions of documents!

But yes, you can run Solr as a library using EmbeddedSolrServer. In that regard it works much like Lucene: there is an API to index and search documents, with configuration done through Solr's schema and related mechanisms.

	Erik




Re: Phonetic search with Lucene 3.2

Posted by Felipe Carvalho <fe...@gmail.com>.
Can I use Solr as a lib, like Lucene? My company is not willing to install
a Solr server... =/


Re: Phonetic search with Lucene 3.2

Posted by Erik Hatcher <er...@gmail.com>.
Solr has, for a long while, included a PhoneticFilter that can leverage several different algorithms.  This was pulled down to Lucene, but only for trunk/4.0. 

Maybe use Solr instead?!  ;) 

	Erik
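To make "several different algorithms" a little more concrete: a phonetic filter maps each token to a code so that similar-sounding spellings end up as the same indexed term. Below is a minimal, dependency-free Java sketch of American Soundex, one of the classic encodings such filters can be configured with. It is an illustration only, not Solr's or Lucene's implementation, and the class name SoundexSketch is hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper: a plain-Java American Soundex encoder, for illustration only. */
class SoundexSketch {

    // Maps a letter to its Soundex digit; '0' marks vowels (and Y), '-' marks H and W.
    static char code(char c) {
        switch (c) {
            case 'B': case 'F': case 'P': case 'V':
                return '1';
            case 'C': case 'G': case 'J': case 'K':
            case 'Q': case 'S': case 'X': case 'Z':
                return '2';
            case 'D': case 'T':
                return '3';
            case 'L':
                return '4';
            case 'M': case 'N':
                return '5';
            case 'R':
                return '6';
            case 'H': case 'W':
                return '-';
            default:
                return '0';
        }
    }

    static String soundex(String word) {
        String s = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder().append(s.charAt(0));
        // The first letter's digit still suppresses an identical digit right after it.
        char last = code(s.charAt(0));
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char d = code(s.charAt(i));
            if (d == '-') {
                continue;          // H and W are transparent: equal digits around them collapse
            }
            if (d != '0' && d != last) {
                out.append(d);     // a new consonant digit
            }
            last = d;              // a vowel resets 'last', so equal digits around it are both kept
        }
        while (out.length() < 4) {
            out.append('0');       // pad to the conventional four characters
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"Robert", "Rupert", "cool", "kool"}) {
            System.out.println(w + " -> " + soundex(w));
        }
    }
}
```

Running main prints R163 for both "Robert" and "Rupert", but C400 for "cool" and K400 for "kool": Soundex keeps the first letter unchanged, so it would not match the "kool kat" example from Lucene in Action, whereas Metaphone encodes both spellings as KL.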

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Phonetic search with Lucene 3.2

Posted by Felipe Carvalho <fe...@gmail.com>.
Using PerFieldAnalyzerWrapper seems to be working for what I need!

On indexing:

        // StandardAnalyzer for every field except "name", which gets the phonetic analyzer
        PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_33));
        wrapper.addAnalyzer("name", new MetaphoneReplacementAnalyzer());
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_33, wrapper);
        Directory directory = FSDirectory.open(new File(indexPath));
        IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);

On search:

        Directory directory = FSDirectory.open(new File(lastIndexDir(Calendar.getInstance())));
        IndexSearcher is = new IndexSearcher(directory);
        // The field name must match the one registered at indexing time
        PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_33));
        wrapper.addAnalyzer("name", new MetaphoneReplacementAnalyzer());
        // "name" is also the default field for query terms without an explicit field prefix
        QueryParser parser = new QueryParser(Version.LUCENE_33, "name", wrapper);
        Query query = parser.parse(expression);
        ScoreDoc[] hits = is.search(query, 1000).scoreDocs;
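The per-field dispatch that PerFieldAnalyzerWrapper performs can be illustrated without Lucene: look the field name up in a map and fall back to the default analyzer otherwise. The sketch below is a hypothetical, dependency-free model; PerFieldSketch, PerField and toyKey are made-up names, and toyKey is a crude stand-in for a real phonetic encoder such as Metaphone. It also shows why the field name given to addAnalyzer has to be spelled identically at index and query time: an unregistered name such as "nome" silently falls back to the default analyzer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical, Lucene-free illustration of per-field analyzer dispatch. */
class PerFieldSketch {

    // Stand-in for a standard analyzer: lowercase, then split on anything that is not a letter or digit.
    static List<String> standard(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Toy phonetic key (NOT Metaphone): C/Q -> K, drop non-initial vowels, collapse repeated letters.
    static String toyKey(String token) {
        String s = token.toUpperCase(Locale.ROOT).replace('C', 'K').replace('Q', 'K');
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (i > 0 && "AEIOU".indexOf(ch) >= 0) continue;
            if (key.length() > 0 && key.charAt(key.length() - 1) == ch) continue;
            key.append(ch);
        }
        return key.toString();
    }

    // Stand-in for a phonetic analyzer: standard tokens replaced by their toy keys.
    static List<String> phonetic(String text) {
        List<String> keys = new ArrayList<>();
        for (String t : standard(text)) {
            keys.add(toyKey(t));
        }
        return keys;
    }

    /** Per-field dispatch: an explicit analyzer for registered fields, the default for all others. */
    static final class PerField {
        private final Function<String, List<String>> defaultAnalyzer;
        private final Map<String, Function<String, List<String>>> byField = new HashMap<>();

        PerField(Function<String, List<String>> defaultAnalyzer) {
            this.defaultAnalyzer = defaultAnalyzer;
        }

        void addAnalyzer(String field, Function<String, List<String>> analyzer) {
            byField.put(field, analyzer);
        }

        List<String> analyze(String field, String text) {
            return byField.getOrDefault(field, defaultAnalyzer).apply(text);
        }
    }

    public static void main(String[] args) {
        PerField wrapper = new PerField(PerFieldSketch::standard);
        wrapper.addAnalyzer("name", PerFieldSketch::phonetic);
        System.out.println(wrapper.analyze("name", "Kool Kat"));  // [KL, KT]
        System.out.println(wrapper.analyze("name", "cool cat"));  // [KL, KT]
        System.out.println(wrapper.analyze("nome", "Kool Kat"));  // [kool, kat]: unregistered field
    }
}
```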

Does anyone know of any other phonetic analyzer implementations? I'm using
MetaphoneReplacementAnalyzer from the Lucene in Action (LIA) examples.

I've been looking through the lucene-contrib modules at
http://lucene.apache.org/java/3_4_0/lucene-contrib/index.html but can't
seem to find any other phonetic analyzers.

Thanks!


Re: Phonetic search with Lucene 3.2

Posted by Erik Hatcher <er...@gmail.com>.
On Nov 8, 2011, at 05:42 , Felipe Carvalho wrote:
>> Yes, quite possible, including boosting on exact matches if you want.  Use
>> a BooleanQuery to wrap clauses parsed once with phonetic analysis, and once
>> without, including fields at indexing time for both too of course.
>> 
> 
> Would it be possible to point to an example where this is done. The best
> example of a BooleanQuery I've found so far is this one:
> http://www.avajava.com/tutorials/lessons/how-do-i-combine-queries-with-a-boolean-query.html
> 
> But I couldn't find a boolean query using different analyzers for different
> fields of the attribute.

You could use two different QueryParser instances with different analyzers.  Or use the PerFieldAnalyzerWrapper, though you'll still need two instances in order to have a different default field for each expression.  Then use the techniques you saw in that article (or in Lucene in Action, since you mentioned having it) to combine the Query objects into a BooleanQuery.

	Erik
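The matching logic behind this suggestion (two parsers, each with its own default field and analyzer, combined in one BooleanQuery) can be modelled in plain Java for Felipe's exactName/contents example. This is a hypothetical sketch, not Lucene's API: Query, BooleanQuery, Occur and parse here are simplified look-alikes supporting only MUST and SHOULD clauses, and the phonetic analyzer is a crude toy rather than Metaphone.

```java
import java.util.*;
import java.util.function.Function;

/** Hypothetical, Lucene-free model: two differently analyzed query parts combined in one boolean query. */
class BooleanSketch {

    enum Occur { MUST, SHOULD }

    /** A query tests a document, modelled as field -> set of indexed terms. */
    interface Query {
        boolean matches(Map<String, Set<String>> doc);
    }

    static Query term(String field, String term) {
        return doc -> doc.getOrDefault(field, Collections.<String>emptySet()).contains(term);
    }

    static final class BooleanQuery implements Query {
        private final List<Query> clauses = new ArrayList<>();
        private final List<Occur> occurs = new ArrayList<>();

        BooleanQuery add(Query clause, Occur occur) {
            clauses.add(clause);
            occurs.add(occur);
            return this;
        }

        @Override
        public boolean matches(Map<String, Set<String>> doc) {
            boolean hasMust = false;
            boolean anyShould = false;
            for (int i = 0; i < clauses.size(); i++) {
                boolean hit = clauses.get(i).matches(doc);
                if (occurs.get(i) == Occur.MUST) {
                    if (!hit) return false;   // every MUST clause has to match
                    hasMust = true;
                } else if (hit) {
                    anyShould = true;
                }
            }
            // Without any MUST clause, at least one SHOULD clause has to match.
            return hasMust || anyShould;
        }
    }

    static List<String> standard(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Toy phonetic analyzer (NOT Metaphone): C/Q -> K, drop non-initial vowels, collapse repeats.
    static List<String> phonetic(String text) {
        List<String> keys = new ArrayList<>();
        for (String t : standard(text)) {
            String s = t.toUpperCase(Locale.ROOT).replace('C', 'K').replace('Q', 'K');
            StringBuilder key = new StringBuilder();
            for (int i = 0; i < s.length(); i++) {
                char ch = s.charAt(i);
                if (i > 0 && "AEIOU".indexOf(ch) >= 0) continue;
                if (key.length() > 0 && key.charAt(key.length() - 1) == ch) continue;
                key.append(ch);
            }
            keys.add(key.toString());
        }
        return keys;
    }

    /** Stand-in for a QueryParser with a fixed default field and analyzer: terms are OR'ed together. */
    static Query parse(String defaultField, Function<String, List<String>> analyzer, String text) {
        BooleanQuery q = new BooleanQuery();
        for (String t : analyzer.apply(text)) {
            q.add(term(defaultField, t), Occur.SHOULD);
        }
        return q;
    }

    static Map<String, Set<String>> doc(String exactName, String contents) {
        Map<String, Set<String>> d = new HashMap<>();
        d.put("exactName", new HashSet<>(standard(exactName)));  // indexed without phonetic analysis
        d.put("contents", new HashSet<>(phonetic(contents)));    // indexed with phonetic analysis
        return d;
    }

    public static void main(String[] args) {
        Query exact = parse("exactName", BooleanSketch::standard, "kat");
        Query soundsLike = parse("contents", BooleanSketch::phonetic, "kool kat");
        Query combined = new BooleanQuery().add(exact, Occur.MUST).add(soundsLike, Occur.MUST);

        List<Map<String, Set<String>>> docs = Arrays.asList(doc("kat", "cool cat"), doc("cat", "cool cat"));
        for (int i = 0; i < docs.size(); i++) {
            System.out.println("doc " + i + " matches: " + combined.matches(docs.get(i)));
        }
    }
}
```

Only the first document matches: both have phonetically matching contents, but only one has exactName "kat". In Lucene 3.x the same shape would come from adding the results of two QueryParser.parse() calls to a BooleanQuery with BooleanClause.Occur.MUST; the sketch only mirrors that structure.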


Re: Phonetic search with Lucene 3.2

Posted by Felipe Carvalho <fe...@gmail.com>.
On Tue, Nov 8, 2011 at 10:06 AM, Erik Hatcher <er...@gmail.com>wrote:

> >> Now suppose my document had a particular field I don't want to be
> >> metaphones one the search, for instance, "exactName". For example, suppose
> >> I want to look for all documents which contents phonetically match "kool
> >> kat" and exactName match "kat" but not "cat", generating an expression like
> >> this: "exactName:kat AND contents:kool kat".
> >>
> >> Is it possible to do this? If so, how would I do it? Can I use specific
> >> analyzers for each field?
>
> Yes, quite possible, including boosting on exact matches if you want.  Use
> a BooleanQuery to wrap clauses parsed once with phonetic analysis, and once
> without, including fields at indexing time for both too of course.
>

Could you point me to an example where this is done? The best
BooleanQuery example I've found so far is this one:
http://www.avajava.com/tutorials/lessons/how-do-i-combine-queries-with-a-boolean-query.html

But I couldn't find a BooleanQuery that uses different analyzers for
different fields.

Thanks a lot!


Re: Phonetic search with Lucene 3.2

Posted by Erik Hatcher <er...@gmail.com>.
Felipe -

Look at the other Lucene JARs available; I think lucene-analyzers is where it is:

   <http://search.maven.org/#search|ga|1|g%3A%22org.apache.lucene%22%20AND%20v%3A%223.4.0%22>

Personally, I'd download the Lucene 3.4 release from Apache and use the JARs from there.

	Erik


On Nov 8, 2011, at 04:16 , Felipe Carvalho wrote:

> Thanks, Erik!
> 
> I'm looking at lucene-all javadocs, and there are some interesting classes
> (specifically I'd like to use
> org.apache.lucene.analysis.br.BrazilianAnalyzer). I'm able to find
> lucene-core on http://search.maven.org/, but is there a lucene-all
> published on some maven repo? or should I get those contrib classes out of
> some other dependency?
> 
> Thanks!


Re: Phonetic search with Lucene 3.2

Posted by Felipe Carvalho <fe...@gmail.com>.
Sorry, that last one was a dumb question; I just found
org.apache.lucene:lucene-analyzers:3.3.0 in the Maven Central repository.

Thanks a lot for the help!


Re: Phonetic search with Lucene 3.2

Posted by Felipe Carvalho <fe...@gmail.com>.
Thanks, Erik!

I'm looking at the lucene-all javadocs, and there are some interesting classes
(specifically, I'd like to use
org.apache.lucene.analysis.br.BrazilianAnalyzer). I'm able to find
lucene-core on http://search.maven.org/, but is there a lucene-all artifact
published in some Maven repository, or should I get those contrib classes
from some other dependency?

Thanks!


Re: Phonetic search with Lucene 3.2

Posted by Erik Hatcher <er...@gmail.com>.
On Nov 8, 2011, at 03:58 , Felipe Carvalho wrote:

> One other question: I'm looking at Lucene 3.4 javadocs (
> http://lucene.apache.org/java/3_4_0/api/core/index.html) but I can't find
> MetaphoneReplacementAnalyzer anywhere. Does any one know if this class has
> been removed from lucene-core.

That class is in Lucene in Action's companion code, not Lucene itself.  Download it from http://www.manning.com/lucene

> My Lucene In Action edition is from 2004, so I'm guessing things kinda
> changed since then.

There's a second edition out now, well worth getting if I do say so myself :)  (I've learned a lot from reading and re-reading it myself, to be honest - thanks MikeM!)

>> Now suppose my document had a particular field I don't want to be
>> metaphones one the search, for instance, "exactName". For example, suppose
>> I want to look for all documents which contents phonetically match "kool
>> kat" and exactName match "kat" but not "cat", generating an expression like
>> this: "exactName:kat AND contents:kool kat".
>> 
>> Is it possible to do this? If so, how would I do it? Can I use specific
>> analyzers for each field?

Yes, quite possible, including boosting exact matches if you want.  Use a BooleanQuery to wrap clauses parsed once with phonetic analysis and once without; of course, you also need fields for both kinds of analysis at indexing time.

	Erik
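The "fields for both at indexing time, boost the exact match" idea can be sketched as follows: each name is indexed once as-is and once as a phonetic key, and a query adds a boosted clause on the exact field and a plain clause on the phonetic field. This is a Lucene-free illustration under stated assumptions: the field names name and name_phonetic, the boost of 2.0, the additive scoring and the toyKey encoder are all hypothetical, and real Lucene scoring also involves term statistics and norms.

```java
import java.util.*;

/** Hypothetical, Lucene-free sketch: one value indexed into an exact field and a phonetic field. */
class DualFieldSketch {

    // Toy phonetic key (NOT Metaphone): C/Q -> K, drop non-initial vowels, collapse repeated letters.
    static String toyKey(String token) {
        String s = token.toUpperCase(Locale.ROOT).replace('C', 'K').replace('Q', 'K');
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (i > 0 && "AEIOU".indexOf(ch) >= 0) continue;
            if (key.length() > 0 && key.charAt(key.length() - 1) == ch) continue;
            key.append(ch);
        }
        return key.toString();
    }

    // At indexing time the same single-word name goes into both fields, plus an untouched stored copy.
    static Map<String, String> index(String name) {
        Map<String, String> doc = new HashMap<>();
        doc.put("name", name.toLowerCase(Locale.ROOT));  // exact field: lowercased only
        doc.put("name_phonetic", toyKey(name));          // phonetic field
        doc.put("stored", name);
        return doc;
    }

    // Two SHOULD clauses: name:term (weight exactBoost) and name_phonetic:key (weight 1).
    // The score is simply the sum of the matching clauses' weights.
    static double score(Map<String, String> doc, String queryTerm, double exactBoost) {
        double score = 0.0;
        if (doc.get("name").equals(queryTerm.toLowerCase(Locale.ROOT))) {
            score += exactBoost;
        }
        if (doc.get("name_phonetic").equals(toyKey(queryTerm))) {
            score += 1.0;
        }
        return score;
    }

    // Returns the stored names of matching documents, best score first.
    static List<String> search(List<Map<String, String>> docs, String queryTerm, double exactBoost) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            if (score(doc, queryTerm, exactBoost) > 0) {
                hits.add(doc);
            }
        }
        hits.sort(Comparator.comparingDouble(
                (Map<String, String> d) -> score(d, queryTerm, exactBoost)).reversed());
        List<String> names = new ArrayList<>();
        for (Map<String, String> doc : hits) {
            names.add(doc.get("stored"));
        }
        return names;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = Arrays.asList(index("Cat"), index("Kat"), index("Dog"));
        System.out.println(search(docs, "kat", 2.0));  // [Kat, Cat]
        System.out.println(search(docs, "cat", 2.0));  // [Cat, Kat]
    }
}
```

With these assumptions both spellings are found for either query, but the one typed exactly ranks first, which is the effect the boost on the exact clause is meant to produce.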



Re: Phonetic search with Lucene 3.2

Posted by Felipe Carvalho <fe...@gmail.com>.
One other question: I'm looking at the Lucene 3.4 javadocs (
http://lucene.apache.org/java/3_4_0/api/core/index.html) but I can't find
MetaphoneReplacementAnalyzer anywhere. Does anyone know if this class has
been removed from lucene-core?

My Lucene in Action edition is from 2004, so I'm guessing things have
changed since then.

Thanks!


Re: Phonetic search with Lucene 3.2

Posted by Felipe Carvalho <fe...@gmail.com>.
Thanks for the reply, Paul!

I got this example from Lucene In Action:

public void testKoolKat() throws Exception {
    // Lucene 1.4-era API from the first edition of Lucene in Action: Field.Text, Hits and
    // the static QueryParser.parse() no longer exist in Lucene 3.x.
    RAMDirectory directory = new RAMDirectory();
    Analyzer analyzer = new MetaphoneReplacementAnalyzer();

    IndexWriter writer = new IndexWriter(directory, analyzer, true);  // true = create a new index
    Document doc = new Document();
    doc.add(Field.Text("contents", "cool cat"));
    writer.addDocument(doc);
    writer.close();

    IndexSearcher searcher = new IndexSearcher(directory);
    // The query text goes through the same metaphone analyzer as the indexed text
    Query query = QueryParser.parse("kool kat", "contents", analyzer);

    Hits hits = searcher.search(query);

    assertEquals(1, hits.length());
    assertEquals("cool cat", hits.doc(0).get("contents"));

    searcher.close();
}
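What this test relies on can be approximated without Lucene: the same replacement analysis runs at index and query time, so "kool kat" and "cool cat" reduce to identical terms. The sketch is hypothetical; toyKey is a deliberately crude stand-in for the Metaphone encoding the analyzer is named after, and the class and method names are made up. It also makes the cost visible: once only phonetic keys are indexed, "kat" and "cat" can no longer be told apart in that field.

```java
import java.util.*;

/** Hypothetical, Lucene-free approximation of the "metaphone replacement" idea in the KoolKat test. */
class ReplacementSketch {

    // Toy phonetic key (NOT Metaphone): C/Q -> K, drop non-initial vowels, collapse repeated letters.
    static String toyKey(String token) {
        String s = token.toUpperCase(Locale.ROOT).replace('C', 'K').replace('Q', 'K');
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (i > 0 && "AEIOU".indexOf(ch) >= 0) continue;
            if (key.length() > 0 && key.charAt(key.length() - 1) == ch) continue;
            key.append(ch);
        }
        return key.toString();
    }

    // Analyzer stand-in: lowercase, split on non-letters, then REPLACE each token by its key.
    // The original spelling is discarded; only keys reach the index.
    static List<String> analyze(String text) {
        List<String> keys = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                keys.add(toyKey(t));
            }
        }
        return keys;
    }

    // OR semantics, like QueryParser's default operator: any shared key counts as a hit.
    static boolean matches(List<String> indexedKeys, String queryText) {
        for (String key : analyze(queryText)) {
            if (indexedKeys.contains(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> indexed = analyze("cool cat");                // terms indexed for "contents"
        System.out.println(indexed);                               // [KL, KT]
        System.out.println(matches(indexed, "kool kat"));          // true
        System.out.println(analyze("kat").equals(analyze("cat"))); // true: the spelling difference is gone
    }
}
```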

Now suppose my document had a particular field that I don't want to be
metaphoned at search time, for instance "exactName". For example, suppose
I want to find all documents whose contents phonetically match "kool
kat" and whose exactName matches "kat" but not "cat", generating an
expression like this: "exactName:kat AND contents:(kool kat)".

Is it possible to do this? If so, how would I do it? Can I use specific
analyzers for each field?

Thanks,
  Felipe


Re: Phonetic search with Lucene 3.2

Posted by Paul Libbrecht <pa...@hoplahup.net>.
Felipe,

in Lucene in Action there's a little bit on that.
Basically it's just about using the right analyzer.

paul

