Posted to user@nutch.apache.org by wu fuheng <wu...@gmail.com> on 2005/05/23 21:20:00 UTC

Can I build a CJK application based on Nutch?

Dear all,
I think Nutch is a good wrapper around Lucene and comes with a good crawler.
If I want to build a Chinese/Japanese/Korean (CJK) language search
application, should I start from Lucene or Nutch? How does Nutch
support CJK applications?
Sincerely yours,
Simon

Re: Can I build a CJK application based on Nutch?

Posted by Jack Tang <hi...@gmail.com>.
Sorry, I was wrong. It is still broken in svn. I tried to merge bi-gram
segmentation into NutchAnalysis.jj. It seems hard and will take a lot
of time. Can someone working on the CJK thread give me some advice?
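
For readers unfamiliar with the term: bi-gram segmentation indexes every
pair of adjacent CJK characters as a token, so no dictionary is needed.
The following is only a minimal standalone sketch of that idea in plain
Java. It is not the code in NutchAnalysis.jj or any Lucene class, and the
names CjkBigramSketch, isCjk and tokenize are made up for illustration. It
assumes that characters in the Han, Hiragana, Katakana and Hangul scripts
count as CJK, and that runs of other letters and digits are kept as whole
lowercased words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CjkBigramSketch {

    // Assumption: a code point is "CJK" if its Unicode script is
    // Han, Hiragana, Katakana or Hangul.
    static boolean isCjk(int cp) {
        Character.UnicodeScript s = Character.UnicodeScript.of(cp);
        return s == Character.UnicodeScript.HAN
            || s == Character.UnicodeScript.HIRAGANA
            || s == Character.UnicodeScript.KATAKANA
            || s == Character.UnicodeScript.HANGUL;
    }

    // Split text into overlapping CJK bigrams plus whole non-CJK words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            if (isCjk(cps[i])) {
                // Consume one maximal run of CJK characters.
                int start = i;
                while (i < cps.length && isCjk(cps[i])) {
                    i++;
                }
                if (i - start == 1) {
                    // An isolated CJK character is kept as a single token.
                    tokens.add(new String(cps, start, 1));
                } else {
                    // Emit every adjacent pair in the run.
                    for (int j = start; j + 1 < i; j++) {
                        tokens.add(new String(cps, j, 2));
                    }
                }
            } else if (Character.isLetterOrDigit(cps[i])) {
                // Consume one run of non-CJK letters and digits as a word.
                int start = i;
                while (i < cps.length
                        && Character.isLetterOrDigit(cps[i])
                        && !isCjk(cps[i])) {
                    i++;
                }
                tokens.add(new String(cps, start, i - start)
                        .toLowerCase(Locale.ROOT));
            } else {
                // Skip whitespace and punctuation.
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "Nutch" followed by the four Han characters zhong, wen, sou, suo
        // (written as Unicode escapes to keep this listing ASCII-only).
        for (String token : tokenize("Nutch \u4e2d\u6587\u641c\u7d22")) {
            System.out.println(token);
        }
    }
}
```

With this approach a run of four CJK characters yields three overlapping
two-character tokens. A query would typically be tokenized the same way,
so documents and queries match on shared bigrams.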


/Jack

On 6/2/05, Jack Tang <hi...@gmail.com> wrote:
> Hi Tian&Wu
> 
> I suppose Nutch now supports CJK bi-gram segmentation.
> 
> /Jack

Re: Can I build a CJK application based on Nutch?

Posted by Jack Tang <hi...@gmail.com>.
Hi Tian&Wu

I suppose Nutch now supports CJK bi-gram segmentation.

/Jack

On 5/25/05, Transbuerg Tian <ac...@gmail.com> wrote:
> Hi, wufuheng,
>
> First:
> If you are using Lucene or Nutch to index Chinese content,
> I recommend WebLucene; you can find more information at:
> http://www.chedong.com .
> Second:
> CJK sentence splitting is quite different. For Chinese, a very
> well-known tool is ICTCLAS, which you can find by searching Google.
>
> I have also written a Chinese sentence splitter, in both Java and C#.
>
> You can get it at: http://www.domolo.com/tec/index.htm
> or write to: xiaodingdong@gmail.com
>
> Hope this will help you.
> 
> transbuerg tian
> beijing,china
> http://www.domolo.com

Re: Can I build a CJK application based on Nutch?

Posted by Transbuerg Tian <ac...@gmail.com>.
Hi, wufuheng,

First:
If you are using Lucene or Nutch to index Chinese content,
I recommend WebLucene; you can find more information at:
http://www.chedong.com .
Second:
CJK sentence splitting is quite different. For Chinese, a very
well-known tool is ICTCLAS, which you can find by searching Google.

I have also written a Chinese sentence splitter, in both Java and C#.

You can get it at: http://www.domolo.com/tec/index.htm
or write to: xiaodingdong@gmail.com
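
As a rough illustration of dictionary-based segmentation, in contrast to
the bi-gram approach discussed elsewhere in this thread, here is a sketch
of forward maximum matching in Java. This is a simple textbook baseline
and should not be read as how ICTCLAS or the splitter linked above work
(neither is described in this thread). The class name MaxMatchSketch, the
segment method and the three-word dictionary are hypothetical. The sketch
also assumes all characters are in the Basic Multilingual Plane, so that
String indices line up with characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MaxMatchSketch {

    // Forward maximum matching: at each position take the longest
    // dictionary word (up to maxLen characters); if none matches,
    // fall back to a single character.
    static List<String> segment(String text, Set<String> dict, int maxLen) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxLen);
            // Shrink the candidate until it is a dictionary word
            // or only one character is left.
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            words.add(text.substring(i, end));
            i = end;
        }
        return words;
    }

    public static void main(String[] args) {
        // Hypothetical toy dictionary, written with Unicode escapes:
        // "Chinese", "search" and "search engine".
        Set<String> dict = new HashSet<>(Arrays.asList(
                "\u4e2d\u6587",
                "\u641c\u7d22",
                "\u641c\u7d22\u5f15\u64ce"));
        // Input: the six characters for "Chinese search engine".
        String text = "\u4e2d\u6587\u641c\u7d22\u5f15\u64ce";
        for (String word : segment(text, dict, 4)) {
            System.out.println(word);
        }
    }
}
```

The quality of this kind of segmentation depends heavily on the
dictionary, which is one reason bi-grams are attractive for indexing.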

Hope this will help you.

transbuerg tian
beijing,china
http://www.domolo.com




2005/5/24, wu fuheng <wu...@gmail.com>:
> 
> Dear all,
> I think Nutch is a good wrapper around Lucene and comes with a good crawler.
> If I want to build a Chinese/Japanese/Korean (CJK) language search
> application, should I start from Lucene or Nutch? How does Nutch
> support CJK applications?
> Sincerely yours,
> Simon
>

Re: Can I build a CJK application based on Nutch?

Posted by Stefan Groschupf <sg...@media-style.com>.
Hi,
I cannot answer your question, but this may be interesting reading
for you.
http://issues.apache.org/jira/browse/NUTCH-36

Stefan

On 23.05.2005 at 21:20, wu fuheng wrote:

> Dear all,
> I think Nutch is a good wrapper around Lucene and comes with a good crawler.
> If I want to build a Chinese/Japanese/Korean (CJK) language search
> application, should I start from Lucene or Nutch? How does Nutch
> support CJK applications?
> Sincerely yours,
> Simon
>
>

---------------------------------------------------------------
company:        http://www.media-style.com
forum:        http://www.text-mining.org
blog:            http://www.find23.net