Posted to solr-user@lucene.apache.org by Bing Li <lb...@gmail.com> on 2011/01/18 19:30:37 UTC

Indexing and Searching Chinese with SolrNet

Dear all,

After reading some pages on the Web, I created the index with the following
schema.

......
<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.ChineseTokenizerFactory"/>
    </analyzer>
</fieldtype>
......

It must be correct, right? However, when sending a query through SolrNet, no
results are returned. Could you tell me what the reason is?

Thanks,
LB

Re: Indexing and Searching Chinese with SolrNet

Posted by Bing Li <lb...@gmail.com>.
Dear Jelsma,

After configuring the Tomcat URIEncoding, Chinese characters are now
processed correctly. I appreciate your help so much!

Best,
LB

On Wed, Jan 19, 2011 at 3:02 AM, Markus Jelsma
<ma...@openindex.io> wrote:

> Hi,
>
> Yes, but Tomcat might need to be configured to accept them; see the wiki for
> more information on this subject.
>
> http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
>
> Cheers,

Re: Indexing and Searching Chinese with SolrNet

Posted by Markus Jelsma <ma...@openindex.io>.
Hi,

Yes, but Tomcat might need to be configured to accept them; see the wiki for
more information on this subject.

http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
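
For reference, the setting that wiki section describes is the URIEncoding
attribute on the HTTP connector in Tomcat's conf/server.xml. A minimal sketch,
assuming a default connector (the port, timeout, and redirect values below are
illustrative and not taken from this thread):

```xml
<!-- conf/server.xml: HTTP connector. Only URIEncoding matters here;
     the other attribute values are assumed defaults. -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8"/>
```

Tomcat has to be restarted for the change to take effect. Without this
attribute, Tomcat 7 decodes the query string as ISO-8859-1 by default, so
UTF-8 encoded Chinese characters in a GET request can reach Solr garbled.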

Cheers,


Re: Indexing and Searching Chinese with SolrNet

Posted by Bing Li <lb...@gmail.com>.
Dear Jelsma,

My servlet container is Tomcat 7. I think it should accept Chinese
characters, but I am not sure how to configure it. In the Tomcat console,
I saw that the Chinese characters in the query are not displayed
correctly. However, they display fine in the Solr Admin page.

I am also not sure whether SolrNet supports Chinese. If not, how can I
interact with Solr on .NET?

Thanks so much!
LB


On Wed, Jan 19, 2011 at 2:34 AM, Markus Jelsma
<ma...@openindex.io> wrote:

> Why create two threads for the same problem? Anyway, is your servlet
> container capable of accepting UTF-8 in the URL? Also, is SolrNet capable
> of handling those characters? To confirm, try a tool like curl.

Re: Indexing and Searching Chinese with SolrNet

Posted by Markus Jelsma <ma...@openindex.io>.
Why create two threads for the same problem? Anyway, is your servlet
container capable of accepting UTF-8 in the URL? Also, is SolrNet capable of
handling those characters? To confirm, try a tool like curl.


Re: Indexing and Searching Chinese with SolrNet

Posted by Dennis Gearon <ge...@sbcglobal.net>.
Make sure your browser is set to UTF-8 encoding.



----- Original Message ----
From: Otis Gospodnetic <ot...@yahoo.com>
To: solr-user@lucene.apache.org; bing.li@asu.edu
Sent: Tue, January 18, 2011 10:39:16 AM
Subject: Re: Indexing and Searching Chinese with SolrNet

Bing Li,

Go to your Solr Admin page and use the Analysis functionality there to enter 
some Chinese text and see how it's getting analyzed at index and at search 
time.  This will tell you what is (or isn't) going on.
Here it looks like you just defined index-time analysis, so you should see your 
index-time analysis look very different from your query-time analysis.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/




Re: Indexing and Searching Chinese with SolrNet

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Bing Li,

Go to your Solr Admin page and use the Analysis functionality there to enter 
some Chinese text and see how it's getting analyzed at index and at search 
time.  This will tell you what is (or isn't) going on.
Here it looks like you just defined index-time analysis, so you should see your 
index-time analysis look very different from your query-time analysis.
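
One way to rule out a mismatch between the two is to declare both analyzers
explicitly in the field type. A sketch, assuming the same tokenizer is wanted
at index time and at query time (the query analyzer is added here for
illustration and was not part of the posted schema):

```xml
<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
    <!-- Analysis applied when documents are indexed (as posted) -->
    <analyzer type="index">
        <tokenizer class="solr.ChineseTokenizerFactory"/>
    </analyzer>
    <!-- Analysis applied to query text; same tokenizer assumed -->
    <analyzer type="query">
        <tokenizer class="solr.ChineseTokenizerFactory"/>
    </analyzer>
</fieldtype>
```

After changing the schema, restart Solr and compare both sides again on the
Analysis page.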

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


