Posted to solr-user@lucene.apache.org by KPK <ko...@gmail.com> on 2012/06/01 01:38:07 UTC

index special characters solr

Hi all,
Can somebody please tell me how I can build an index in Solr where one of my
fields contains special characters such as $ and %?
I would also like to search for those same characters in that particular field.

Any advice would be appreciated.

Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/index-special-characters-solr-tp3987157.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: index special characters solr

Posted by Jack Krupansky <ja...@basetechnology.com>.
Thanks. I'm sure someone else will have the same issue at some point.

-- Jack Krupansky

-----Original Message----- 
From: KPK
Sent: Tuesday, June 05, 2012 9:51 PM
To: solr-user@lucene.apache.org
Subject: Re: index special characters solr

Re: index special characters solr

Posted by KPK <ko...@gmail.com>.
Thanks, Jack, for your help!
I found my mistake: rather than classifying those special characters as
ALPHA, I had classified them as DIGIT. I had also left the same entry out of
the query analyzer. That was probably why I was not getting relevant results.

I spent a lot of time figuring this out, so I'll paste the part of
schema.xml that I changed, so that newcomers don't waste as much time
on this.
The field in which I wanted to search for keywords including special
characters is of type text. In that fieldType definition, modify the filter
class="solr.WordDelimiterFilterFactory" as follows:

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"
types="characters.txt" />

Do this in BOTH <analyzer type="index"> and <analyzer type="query">.

Then create a new file, characters.txt, in the same folder as schema.xml,
with this content:

$ => ALPHA
% => ALPHA


(I wanted $ and % to behave as alphabetic characters so that they could be
searched.)


Then restart Jetty/Tomcat.

This is how I solved the problem.
I hope this helps someone :)
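
For anyone who wants to see the change in context, here is a minimal sketch of a complete field type with the filter in both analyzers. The fieldType name "text_special", the whitespace tokenizer, and the lowercase filter are assumptions for illustration and are not taken from the schema discussed in this thread; only the WordDelimiterFilterFactory line comes from the post above.

```xml
<!-- Sketch only. "text_special" is a hypothetical name; adapt it to your schema. -->
<fieldType name="text_special" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Assumption: a whitespace tokenizer, so that $ and % are still part of
         the token when it reaches the word delimiter filter. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- characters.txt maps $ and % to ALPHA, so they are not treated as delimiters. -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"
            types="characters.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- Same chain as the index analyzer, so query terms are split the same way. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"
            types="characters.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Documents indexed before the analyzer change keep their old tokens, so they would need to be reindexed before searches on $ or % can match them.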



Re: index special characters solr

Posted by KPK <ko...@gmail.com>.
Thanks for your reply!

I tried the types option of WordDelimiterFilterFactory, passing it a text
file that maps % and $ to alphabetic characters. Even then, those characters
were not indexed and did not show up in search results.
Am I missing something?

Thanks,
Kushal


Re: index special characters solr

Posted by Jack Krupansky <ja...@basetechnology.com>.
Special characters are filtered out of (most) "text" fields, but are 
preserved in "string" fields. String fields might suit your needs, but are 
inconvenient for keyword searching.
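
As a rough illustration of the string-field option, a declaration might look like the sketch below. The field name "price_label" is hypothetical, and the "string" type is assumed to be defined as solr.StrField, as in the stock example schema.

```xml
<!-- Assumed definition of the "string" type, as in the stock example schema. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- Hypothetical field. The value is indexed verbatim as a single term, so
     $ and % are preserved, but a query has to match the whole value. -->
<field name="price_label" type="string" indexed="true" stored="true"/>
```

With such a field, a value like "50% off" is found only by a query for that exact value (or a wildcard over it), which is why it is awkward for keyword searching.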

You may be able to use the "types" option of the WordDelimiterFilterFactory 
to pass in a custom character type table that has the special characters 
treated as alphabetic characters. Otherwise, you may have to customize the 
code yourself.

-- Jack Krupansky

-----Original Message----- 
From: KPK
Sent: Thursday, May 31, 2012 7:38 PM
To: solr-user@lucene.apache.org
Subject: index special characters solr
