Posted to java-user@lucene.apache.org by Tom Conlon <to...@2ls.com> on 2007/10/01 00:47:47 UTC
RE: Escaping special characters
Hi,
In case this is of help to others:
Crux of problem:
I wanted numbers and characters such as # and + to be kept in search terms
rather than stripped by the analyzer.
Solution:
Implement a LowercaseWhitespaceAnalyzer and a
LowercaseWhitespaceTokenizer.
Tom
=======================================================================
Diagnostics:
StandardAnalyzer
----------------
Enter Querystring: (C++ AND C#) Searching for: +c +c
Enter Querystring: (C\+\+ AND C\#) Searching for: +c +c
Enter Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net"
Searching for: ("moss 2007" "sharepoint 2007") asp.net
SimpleAnalyzer
--------------
Enter Querystring: C++ Searching for: c
Enter Querystring: C# Searching for: c
Enter Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net"
Searching for: (moss or sharepoint) and "asp net"
WhitespaceAnalyzer
------------------
Enter Querystring: (C++ AND C#) Searching for: +C++ +C#
Enter Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net"
Searching for: ("moss 2007" or "sharepoint 2007") and asp.net
KeywordAnalyzer
---------------
Enter Querystring: (C++ AND C#) Searching for: +C++ +C#
Enter Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net"
Searching for: (moss 2007 or sharepoint 2007) and asp.net
StopAnalyzer
------------
Enter Querystring: (C\++ AND C\#) Searching for: +c +c
Enter Querystring: ("MOSS 2007" or "SHAREPOINT 2007") and "ASP.NET"
Searching for: (moss sharepoint) "asp net"
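As a rough illustration of the rule behind the solution above (split on whitespace only, then lowercase each token), here is a minimal stand-alone Java sketch. It does not use any Lucene classes, and it only tokenizes raw text; it does not parse query syntax such as AND or parentheses. The class name LowercaseWhitespaceSketch and the method tokenize are hypothetical and are not from the original post. In the Lucene versions of that era, one plausible way to get this behaviour would be a CharTokenizer subclass, but that is an assumption, since the post does not show the implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-alone sketch: NOT the poster's code and NOT a Lucene class.
final class LowercaseWhitespaceSketch {

    // Split on runs of whitespace and lowercase each token.
    // Characters such as '+', '#', '.' and digits are left untouched.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("C++ AND C#"));         // prints [c++, and, c#]
        System.out.println(tokenize("ASP.NET  MOSS 2007")); // prints [asp.net, moss, 2007]
    }
}
```

Compared with the diagnostics above, this keeps "c++" and "c#" distinct (unlike StandardAnalyzer, SimpleAnalyzer and StopAnalyzer, which reduce both to "c") while still giving case-insensitive matching (unlike WhitespaceAnalyzer and KeywordAnalyzer, which preserve case).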
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Poor performance - 2/3 ORs
Posted by Grant Ingersoll <gs...@apache.org>.
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
On Sep 30, 2007, at 6:55 PM, Tom Conlon wrote:
> Hi,
>
> Don't get me wrong - I think Lucene is great.
>
> However, the first site I am using it with has 15k docs and the
> performance for ORs seems slower than I'd expect.
>
> Any tips to improve this?
>
> Thanks,
> Tom
--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
Poor performance - 2/3 ORs
Posted by Tom Conlon <to...@2ls.com>.
Hi,
Don't get me wrong - I think Lucene is great.
However, the first site I am using it with has 15k docs and the
performance for ORs seems slower than I'd expect.
Any tips to improve this?
Thanks,
Tom