Posted to solr-user@lucene.apache.org by Sujatha Arun <su...@gmail.com> on 2009/01/06 10:25:55 UTC

Special characters

Hi,

Can anyone point me to an existing thread, if there is one, on indexing
special characters in Solr?

Regards
Sujatha

Re: Special characters

Posted by Sujatha Arun <su...@gmail.com>.
Thanks.

When I set URIEncoding="UTF-8" in Tomcat's server.xml file, some of the
special characters are indexed and searchable, while others are not.

e.g. Bernhard Schölkopf, János Kornai

These are indexed and searchable after the above change. In the browser,
however, some others display as junk characters. The browser's encoding is
UTF-8.
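
As a rough illustration of why the URIEncoding setting matters (a Python
sketch, not Tomcat's actual code): the same percent-encoded bytes decode to
different text depending on the charset the server assumes for the URI.
Tomcat releases of this era assume ISO-8859-1 unless told otherwise. The
helper name decode_query_value is made up for this example.

```python
from urllib.parse import quote, unquote


def decode_query_value(raw: str, charset: str) -> str:
    """Decode a percent-encoded query value using the charset the server
    assumes for the URI (hypothetical helper, for illustration only)."""
    return unquote(raw, encoding=charset)


# A UTF-8 client sends the o-umlaut in "Schölkopf" as the two bytes C3 B6.
sent = quote("Schölkopf", encoding="utf-8")

# Decoded as UTF-8, the two bytes become one character again.
as_utf8 = decode_query_value(sent, "utf-8")

# Decoded as ISO-8859-1, each byte becomes its own (junk) character.
as_latin1 = decode_query_value(sent, "iso-8859-1")

print(sent)
print(as_utf8)
print(as_latin1)
```

If the server decodes the request with the wrong charset, the term it
searches for no longer matches what was indexed, which is one way a query
can fail even though the document was indexed correctly.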

Regards
Sujatha
On Tue, Jan 6, 2009 at 3:28 PM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:

> Kumar's advice is sound. You must make sure you are actually indexing the
> special symbols.
>
> To query with special characters, make sure you URL-encode the parameters
> before sending them to Solr.
>
> Some symbols, such as '+', '-' and ':', have a special meaning in the Lucene
> query syntax; you will have to escape each of them by adding a backslash in
> front of it.
>
> On Tue, Jan 6, 2009 at 3:15 PM, Jana, Kumar Raja <kj...@ptc.com> wrote:
>
> > Filtering of special characters depends on the filters you use for the
> > fields in your schema.xml.
> >
> > If you are using WordDelimiterFilterFactory in your analyzer, the
> > special characters get removed while your field is processed. But
> > WordDelimiterFilterFactory does a lot more than just remove special
> > characters. If you can do without the other features the filter
> > provides, you can remove it from your schema.xml file. Otherwise, I
> > guess you will have to customize the WordDelimiterFilter.java class to
> > suit your purpose.
> >
> > -Kumar
> >
> >
> >
> > -----Original Message-----
> > From: Sujatha Arun [mailto:suja.arun@gmail.com]
> > Sent: Tuesday, January 06, 2009 3:05 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Special characters
> >
> > Hi,
> >
> > I would like to query terms containing special characters.
> >
> > Regards
> > Sujatha
> >
> >
> >
> >
> > On Tue, Jan 6, 2009 at 2:59 PM, Shalin Shekhar Mangar <
> > shalinmangar@gmail.com> wrote:
> >
> > > You forgot to tell us what you want to do with special characters.
> > >
> > > 1. Remove them from the documents while indexing?
> > > 2. Don't remove them while indexing?
> > > 3. Query with terms containing a special character?
> > >
> > > On Tue, Jan 6, 2009 at 2:55 PM, Sujatha Arun <su...@gmail.com>
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > Can anyone point me to the thread if it exists on indexing special
> > > > characters in solr.
> > > >
> > > > Regards
> > > > Sujatha
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Shalin Shekhar Mangar.
> > >
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

Re: Special characters

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Kumar's advice is sound. You must make sure you are actually indexing the
special symbols.

To query with special characters, make sure you URL-encode the parameters
before sending them to Solr.

Some symbols, such as '+', '-' and ':', have a special meaning in the Lucene
query syntax; you will have to escape each of them by adding a backslash in
front of it.
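
The two steps above (escape for the query syntax, then URL-encode) can be
sketched in Python roughly as follows. This is only an illustration under
assumptions: the base URL, the field name "author" and the function names
are made up, and the character set escaped here is the one listed in the
Lucene query syntax documentation, of which '+', '-' and ':' are a part.

```python
from urllib.parse import urlencode

# Characters treated as special by the Lucene query syntax (assumed list;
# '&' and '|' are escaped individually to cover '&&' and '||').
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\')


def escape_lucene(term: str) -> str:
    """Put a backslash in front of every query-syntax special character."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


def build_select_url(base_url: str, field: str, term: str) -> str:
    """Escape the term, then URL-encode the whole q parameter as UTF-8."""
    query = f"{field}:{escape_lucene(term)}"
    return base_url + "?" + urlencode({"q": query}, encoding="utf-8")


# Hypothetical Solr location and field name, for illustration only.
BASE = "http://localhost:8983/solr/select"

print(build_select_url(BASE, "author", "Schölkopf"))
print(build_select_url(BASE, "author", "C++"))
```

Note the order: the backslashes are added first, and the URL-encoding is
applied last, so that the backslashes themselves are transported safely.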




-- 
Regards,
Shalin Shekhar Mangar.

RE: Special characters

Posted by "Jana, Kumar Raja" <kj...@ptc.com>.
Filtering of special characters depends on the filters you use for the
fields in your schema.xml.

If you are using WordDelimiterFilterFactory in your analyzer, the
special characters get removed while your field is processed. But
WordDelimiterFilterFactory does a lot more than just remove special
characters. If you can do without the other features the filter
provides, you can remove it from your schema.xml file. Otherwise, I
guess you will have to customize the WordDelimiterFilter.java class to
suit your purpose.
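
To make the effect concrete, here is a rough Python imitation of only the
delimiter-splitting part of such a filter. It is a sketch under
assumptions, not the WordDelimiterFilter code: the real filter has further
rules (for example splitting on case changes and letter/digit boundaries)
and is configured per field type. The name split_on_delimiters is made up.

```python
import re
from typing import List


def split_on_delimiters(token: str) -> List[str]:
    """Split a token on runs of characters that are neither letters nor
    digits, dropping those characters (simplified imitation only)."""
    # [\W_] is Unicode-aware in Python 3, so accented letters such as
    # 'ö' count as letters and stay inside the word.
    return [part for part in re.split(r"[\W_]+", token) if part]


print(split_on_delimiters("wi-fi@home"))
print(split_on_delimiters("C++"))
print(split_on_delimiters("Schölkopf"))
```

Symbols such as '+', '-' and '@' never reach the index under this kind of
processing, so a query for them cannot match, however it is escaped.
Accented letters are not affected by this step.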

-Kumar




Re: Special characters

Posted by Sujatha Arun <su...@gmail.com>.
Hi,

I would like to query terms containing special characters.

Regards
Sujatha





Re: Special characters

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
You forgot to tell us what you want to do with special characters.

1. Remove them from the documents while indexing?
2. Don't remove them while indexing?
3. Query with terms containing a special character?




-- 
Regards,
Shalin Shekhar Mangar.