You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Tom Conlon <to...@2ls.com> on 2007/10/01 00:47:47 UTC

RE: Escaping special characters

Hi,

In case this is of help to others:

Crux of problem: 
I wanted numbers and characters such as # and + to be considered.

Solution:
implement a LowercaseWhitespaceAnalyzer and a
LowercaseWhitespaceTokenizer.

Tom
=======================================================================
Diagnostics:

StandardAnalyzer 
----------------
Enter Querystring: (C++ AND C#)      Searching for: +c +c
Enter Querystring: (C\+\+ AND C\#)   Searching for: +c +c
Enter Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net"
Searching for: ("moss 2007" "sharepoint 2007") asp.net

SimpleAnalyser
--------------
Enter Querystring: C++ Searching for: c
Enter Querystring: C#  Searching for: c
Enter Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net"
Searching for: (moss or sharepoint) and "asp net"

WhitespaceAnalyzer
------------------
Enter Querystring: (C++ AND C#)  Searching for: +C++ +C#
Enter Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net"
Searching for: ("moss 2007" or "sharepoint 2007") and asp.net

KeywordAnalyzer
---------------
Enter Querystring: (C++ AND C#) Searching for: +C++ +C#
Enter Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net"
Searching for: (moss 2007 or sharepoint 2007) and asp.net

StopAnalyzer
------------
Enter Querystring: (C\++ AND C\#)  Searching for: +c +c
Enter Querystring: ("MOSS 2007" or "SHAREPOINT 2007") and "ASP.NET"
Searching for: (moss sharepoint) "asp net"
 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Poor performance - 2/3 ORs

Posted by Grant Ingersoll <gs...@apache.org>.
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

On Sep 30, 2007, at 6:55 PM, Tom Conlon wrote:

> Hi,
>
> Don't get me wrong - I think lucene is great.
>
> However, the first site I am using it with has 15k docs and the
> performance for ORs seem longer than I'd expect.
>
> Any tips to improve this?
>
> Thanks,
> Tom
>
> -----Original Message-----
> From: Tom Conlon [mailto:tomc@2ls.com]
> Sent: 30 September 2007 23:48
> To: java-user@lucene.apache.org
> Subject: RE: Escaping special characters
>
> Hi,
>
> In case this is of help to others:
>
> Crux of problem:
> I wanted numbers and characters such as # and + to be considered.
>
> Solution:
> implement a LowercaseWhitespaceAnalyzer and a
> LowercaseWhitespaceTokenizer.
>
> Tom
> ====================================================================== 
> =
> Diagnostics:
>
> StandardAnalyzer
> ----------------
> Enter Querystring: (C++ AND C#)      Searching for: +c +c
> Enter Querystring: (C\+\+ AND C\#)   Searching for: +c +c
> Enter Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net"
> Searching for: ("moss 2007" "sharepoint 2007") asp.net
>
> SimpleAnalyser
> --------------
> Enter Querystring: C++ Searching for: c
> Enter Querystring: C#  Searching for: c
> Enter Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net"
> Searching for: (moss or sharepoint) and "asp net"
>
> WhitespaceAnalyzer
> ------------------
> Enter Querystring: (C++ AND C#)  Searching for: +C++ +C# Enter
> Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net"
> Searching for: ("moss 2007" or "sharepoint 2007") and asp.net
>
> KeywordAnalyzer
> ---------------
> Enter Querystring: (C++ AND C#) Searching for: +C++ +C# Enter
> Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net"
> Searching for: (moss 2007 or sharepoint 2007) and asp.net
>
> StopAnalyzer
> ------------
> Enter Querystring: (C\++ AND C\#)  Searching for: +c +c Enter
> Querystring: ("MOSS 2007" or "SHAREPOINT 2007") and "ASP.NET"
> Searching for: (moss sharepoint) "asp net"
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Poor performance - 2/3 ORs

Posted by Tom Conlon <to...@2ls.com>.
Hi,

Don't get me wrong - I think lucene is great.

However, the first site I am using it with has 15k docs and the
performance for ORs seem longer than I'd expect. 

Any tips to improve this? 

Thanks,
Tom 

-----Original Message-----
From: Tom Conlon [mailto:tomc@2ls.com] 
Sent: 30 September 2007 23:48
To: java-user@lucene.apache.org
Subject: RE: Escaping special characters

Hi,

In case this is of help to others:

Crux of problem: 
I wanted numbers and characters such as # and + to be considered.

Solution:
implement a LowercaseWhitespaceAnalyzer and a
LowercaseWhitespaceTokenizer.

Tom
=======================================================================
Diagnostics:

StandardAnalyzer
----------------
Enter Querystring: (C++ AND C#)      Searching for: +c +c
Enter Querystring: (C\+\+ AND C\#)   Searching for: +c +c
Enter Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net"
Searching for: ("moss 2007" "sharepoint 2007") asp.net

SimpleAnalyser
--------------
Enter Querystring: C++ Searching for: c
Enter Querystring: C#  Searching for: c
Enter Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net"
Searching for: (moss or sharepoint) and "asp net"

WhitespaceAnalyzer
------------------
Enter Querystring: (C++ AND C#)  Searching for: +C++ +C# Enter
Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net"
Searching for: ("moss 2007" or "sharepoint 2007") and asp.net

KeywordAnalyzer
---------------
Enter Querystring: (C++ AND C#) Searching for: +C++ +C# Enter
Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net"
Searching for: (moss 2007 or sharepoint 2007) and asp.net

StopAnalyzer
------------
Enter Querystring: (C\++ AND C\#)  Searching for: +c +c Enter
Querystring: ("MOSS 2007" or "SHAREPOINT 2007") and "ASP.NET"
Searching for: (moss sharepoint) "asp net"
 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org