
Posted to java-user@lucene.apache.org by Thomas Johnson <tj...@paperhost.com> on 2016/11/28 16:06:53 UTC

How to ignore a ,

We are using Lucene 5.0. Some of our documents are getting indexed with a comma attached to the end of a value. For example: "John Doe, bob smith, and jane go into a bar." Our analyzer is a WhitespaceTokenizer followed by a LowerCaseFilter. If we search for "Doe", nothing is found because the value in the index is "Doe,". Is there a way to get the reader to ignore the comma? Our current workaround is to have users put * at the end of their search. This is slow and also returns unwanted values such as "Does" when we search for "Doe*".

Thank you.

Thomas W. Johnson, Senior Programmer

Re: How to ignore a ,

Posted by Alan Woodward <al...@flax.co.uk>.

Using StandardTokenizer in place of WhitespaceTokenizer should fix this: besides splitting on whitespace, it drops punctuation such as the comma.
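
To make the difference concrete, here is a small JDK-only sketch. It does not call Lucene at all. The class and method names (CommaTokenDemo, whitespaceTokens, wordTokens) are hypothetical. whitespaceTokens mimics WhitespaceTokenizer plus LowerCaseFilter. wordTokens only approximates what StandardTokenizer plus LowerCaseFilter would emit, using java.text.BreakIterator as a stand-in, and the real tokenizer may differ on edge cases.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CommaTokenDemo {

    // Mimics WhitespaceTokenizer + LowerCaseFilter: split on whitespace only,
    // so trailing punctuation stays attached to the token ("doe,").
    public static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Approximation (an assumption, not Lucene code) of StandardTokenizer +
    // LowerCaseFilter: break on word boundaries and keep only the segments
    // that contain a letter or digit, which discards commas and periods.
    public static List<String> wordTokens(String text) {
        List<String> tokens = new ArrayList<>();
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        words.setText(text);
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(segment.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "John Doe, bob smith, and jane go into a bar.";
        // The whitespace version indexes "doe," so an exact lookup of "doe" misses.
        System.out.println("whitespace contains doe: " + whitespaceTokens(text).contains("doe"));
        // The word-boundary version indexes "doe", so the lookup hits without a wildcard.
        System.out.println("word-boundary contains doe: " + wordTokens(text).contains("doe"));
    }
}
```

In Lucene 5.x the corresponding change would be to construct a StandardTokenizer where the analyzer currently constructs a WhitespaceTokenizer. Documents that were already indexed would need to be reindexed, since their stored terms still carry the comma.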

Alan Woodward
www.flax.co.uk


> On 28 Nov 2016, at 16:06, Thomas Johnson <tj...@paperhost.com> wrote:
> 
> We are using Lucene 5.0. Some of our documents are getting indexed with a comma attached to the end of a value. For example: “John Doe, bob smith, and jane go into a bar.” Our analyzer is a WhitespaceTokenizer followed by a LowerCaseFilter. If we search for “Doe”, nothing is found because the value in the index is “Doe,”. Is there a way to get the reader to ignore the comma? Our current workaround is to have users put * at the end of their search. This is slow and also returns unwanted values such as “Does” when we search for “Doe*”.
>  
> Thank you.