Posted to dev@nutch.apache.org by "Kerang Lv (JIRA)" <ji...@apache.org> on 2005/09/22 14:57:27 UTC

[jira] Commented: (NUTCH-36) Chinese in Nutch

    [ http://issues.apache.org/jira/browse/NUTCH-36?page=comments#action_12330192 ] 

Kerang Lv commented on NUTCH-36:
--------------------------------

Enlightened by your last comment: the bi-gram segmentation could be done with the following rule in NutchAnalysis.jj:
| <SIGRAM: <CJK><CJK> >
  {
    // Push the second CJK character back so it also starts the next bi-gram.
    input_stream.backup(1);
  }
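
The effect of backing up one character after each two-character match can be sketched outside JavaCC. The following is only an illustration, not code from Nutch: the class name BigramSketch, the helper isCjk (a stand-in for the grammar's <CJK> class that covers only the CJK Unified Ideographs block), and the choice to skip anything that does not start a pair are all assumptions. What the real tokenizer does with a single leftover CJK character depends on the other rules in NutchAnalysis.jj.

```java
import java.util.ArrayList;
import java.util.List;

public class BigramSketch {
    // Hypothetical stand-in for the grammar's <CJK> character class;
    // it only recognizes the CJK Unified Ideographs block.
    static boolean isCjk(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS;
    }

    // Emits overlapping two-character tokens from runs of CJK characters.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos + 1 < text.length()) {
            if (isCjk(text.charAt(pos)) && isCjk(text.charAt(pos + 1))) {
                tokens.add(text.substring(pos, pos + 2));
                pos += 2; // the match consumed two characters ...
                pos -= 1; // ... and backing up by one re-reads the second
            } else {
                pos += 1; // assumption: anything that does not start a pair is skipped
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Four characters (U+4E2D U+6587 U+641C U+7D22) yield three
        // overlapping bi-grams: chars 1-2, chars 2-3, chars 3-4.
        System.out.println(bigrams("\u4e2d\u6587\u641c\u7d22"));
    }
}
```

Because each match gives one character back, every adjacent pair in a CJK run becomes a token, which is what makes a two-character query line up with a single indexed term.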


> Chinese in Nutch
> ----------------
>
>          Key: NUTCH-36
>          URL: http://issues.apache.org/jira/browse/NUTCH-36
>      Project: Nutch
>         Type: Improvement
>   Components: indexer, searcher
>  Environment: all
>     Reporter: Jack Tang
>     Priority: Minor
>  Attachments: &#26700
>
> Nutch currently supports Chinese in a very simple way: NutchAnalysis segments CJK terms word by word.
> So if I search for the Chinese term 'FooBar' (two Chinese words: 'Foo' and 'Bar'), the results in the web GUI highlight 'FooBar' as well as 'Foo' and 'Bar' on their own, whereas we expect Nutch to highlight only 'FooBar'.
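
The highlighting problem described in the issue can be reproduced with a toy model. This is a sketch under stated assumptions, not Nutch's summarizer: HighlightSketch, highlight, and unigrams are hypothetical names, the highlighter naively marks every occurrence of any query term, and the letters F and B stand in for the two Chinese words 'Foo' and 'Bar'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HighlightSketch {
    // Hypothetical naive highlighter: wraps every occurrence of any term in [ ].
    static String highlight(String text, List<String> terms) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < text.length()) {
            String hit = null;
            for (String term : terms) {
                if (text.startsWith(term, pos)) {
                    hit = term;
                    break;
                }
            }
            if (hit != null) {
                out.append('[').append(hit).append(']');
                pos += hit.length();
            } else {
                out.append(text.charAt(pos));
                pos += 1;
            }
        }
        return out.toString();
    }

    // One-at-a-time segmentation: every character becomes its own query term.
    static List<String> unigrams(String query) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < query.length(); i++) {
            terms.add(String.valueOf(query.charAt(i)));
        }
        return terms;
    }

    public static void main(String[] args) {
        String text = "xFy FB zBw";
        // Query "FB" split into F and B: both are marked wherever they occur.
        System.out.println(highlight(text, unigrams("FB")));      // x[F]y [F][B] z[B]w
        // Query "FB" kept as one two-character term: only the pair is marked.
        System.out.println(highlight(text, Arrays.asList("FB"))); // xFy [FB] zBw
    }
}
```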

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


Re: [jira] Commented: (NUTCH-36) Chinese in Nutch

Posted by Jack Tang <hi...@gmail.com>.
Hi Kerang

I have tested the query, and summary highlighting works without problems. It is
really amazing. This is the solution for Chinese bi-gram segmentation.

Regards
/Jack



--
Keep Discovering ... ...
http://www.jroller.com/page/jmars

Re: [jira] Commented: (NUTCH-36) Chinese in Nutch

Posted by Jack Tang <hi...@gmail.com>.
Hi Kerang

Pretty nice hack!
I will test highlighting in the query summary now.
See you.

/Jack
