Posted to dev@nutch.apache.org by "Kerang Lv (JIRA)" <ji...@apache.org> on 2005/09/22 14:57:27 UTC
[jira] Commented: (NUTCH-36) Chinese in Nutch
[ http://issues.apache.org/jira/browse/NUTCH-36?page=comments#action_12330192 ]
Kerang Lv commented on NUTCH-36:
--------------------------------
Enlightened by your last comment: the bi-gram segmentation could be done with the following in NutchAnalysis.jj:
| <SIGRAM: <CJK><CJK> >
    {
      input_stream.backup(1);
    }
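
To make the effect of the rule above easier to see, here is a minimal standalone sketch in plain Java. It is not Nutch code and not the generated token manager; the class name BigramSketch, the method names, and the use of Character.isIdeographic as a stand-in for the <CJK> character class are all assumptions made for illustration. The idea it mimics: match two CJK characters as one token, then give the second character back so that it also starts the next token, which yields overlapping bi-grams.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Nutch.
class BigramSketch {

    // Assumption: "CJK" means any ideographic character here. The real
    // <CJK> definition in NutchAnalysis.jj may cover different ranges.
    static boolean isCjk(char c) {
        return Character.isIdeographic(c);
    }

    // Emits overlapping two-character tokens from each run of CJK characters.
    // Works on UTF-16 chars, so it only handles BMP characters.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            boolean pair = pos + 1 < text.length()
                    && isCjk(text.charAt(pos))
                    && isCjk(text.charAt(pos + 1));
            if (pair) {
                tokens.add(text.substring(pos, pos + 2));
                pos += 2; // the match consumed two characters...
                pos -= 1; // ...and the backup(1) step hands the second one back
            } else {
                pos += 1; // no pair starts here: skip one character
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Four ideographs: U+4E2D U+6587 U+641C U+7D22
        System.out.println(bigrams("\u4e2d\u6587\u641c\u7d22"));
    }
}
```

For a run of four CJK characters this produces three tokens (characters 1-2, 2-3, and 3-4). In this sketch a CJK character that cannot form a pair is simply skipped; what the real grammar does with it depends on its other token rules.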
> Chinese in Nutch
> ----------------
>
> Key: NUTCH-36
> URL: http://issues.apache.org/jira/browse/NUTCH-36
> Project: Nutch
> Type: Improvement
> Components: indexer, searcher
> Environment: all
> Reporter: Jack Tang
> Priority: Minor
> Attachments: 桌
>
> Nutch now supports Chinese in a very simple way: NutchAnalysis segments CJK terms word by word.
> So if I search for the Chinese term 'FooBar' (two Chinese words: 'Foo' and 'Bar'), the results in the web GUI highlight 'FooBar' as well as 'Foo' and 'Bar' individually, whereas we expect Nutch to highlight only 'FooBar'.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
Re: [jira] Commented: (NUTCH-36) Chinese in Nutch
Posted by Jack Tang <hi...@gmail.com>.
Hi Kerang,
I have tested the query, and there is no problem with highlighting in the
summary. It is really amazing. It's the solution for Chinese bi-gram segmentation.
Regards
/Jack
On 9/22/05, Jack Tang <hi...@gmail.com> wrote:
> Hi Kerang
>
> Pretty nice hack!
> I will test highlighting in the query summary now...
> See you.
>
> /Jack
--
Keep Discovering ... ...
http://www.jroller.com/page/jmars
Re: [jira] Commented: (NUTCH-36) Chinese in Nutch
Posted by Jack Tang <hi...@gmail.com>.
Hi Kerang,
Pretty nice hack!
I will test highlighting in the query summary now...
See you.
/Jack
--
Keep Discovering ... ...
http://www.jroller.com/page/jmars