Posted to user@nutch.apache.org by 基勇 <25...@qq.com> on 2013/08/30 05:55:06 UTC

re: How nutch2.2 to parse rss?

Thank you for providing the information to me!


------------------ Original Message ------------------
From: "Tejas Patil"<te...@gmail.com>;
Sent: Friday, August 30, 2013, 11:52 AM
To: "user@nutch.apache.org"<us...@nutch.apache.org>;

Subject: Re: How nutch2.2 to parse rss?



The 1.x RSS plugin works as of this Jira issue (
https://issues.apache.org/jira/browse/NUTCH-1494). There is an open Jira issue (
https://issues.apache.org/jira/browse/NUTCH-1515) for its 2.x counterpart.



On Thu, Aug 29, 2013 at 8:10 PM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Yeah, there is work to be done here for sure. There must be an issue open
> for this?
>
> On Thursday, August 29, 2013, Jonathan.Wei <25...@qq.com> wrote:
> > Thanks!
> > I will try it!
> > But I have a feeling that it will not build either,
> > because some classes cannot be found in nutch2.2,
> > for example:
> > ParseData
> > ParseResult
> >
> >
> >
> >
> > Thank you!
> >
> >
> >
> >
> > ------------------ Original Message ------------------
> > From: "lewis john mcgibbney [via Lucene]"<ml-node+s472066n4087394h59@n3.nabble.com>;
> > Sent: Friday, August 30, 2013, 9:34 AM
> > To: "基勇"<25...@qq.com>;
> >
> > Subject: Re: How nutch2.2 to parse rss?
> >
> >
> >
> > Hi Jonathan,
> > This has been a long outstanding issue IIRC.
> > I have not used Nutch for feed crawling for a while if I am honest, and I
> > honestly can't recall when and if I have done it with 2.x.
> > You will see [0], that by default the plugin is not actually initialized.
> > So for starters you should uncomment the various targets within this file
> > [0] to get it working and to have it cleaned up etc.
> > You can then try building... but I have a feeling that it will not build.
> > Please check on our Jira for issues related to this... there may be patches
> > but I am not sure.
> > Kiran did some work a while back IIRC concerning getting the following plugins
> > to compile and run
> >
> >      <ant dir="feed" target="deploy"/>
> >      <ant dir="parse-ext" target="deploy"/>
> >      <ant dir="parse-swf" target="deploy"/>
> >      <ant dir="parse-zip" target="deploy"/>
> >
> > But there is more work to be done.
> > Please keep us updated on this one. Sorry for the late reply.
> >
> > [0] http://svn.apache.org/repos/asf/nutch/branches/2.x/src/plugin/build.xml
> >
> >
> > On Thu, Aug 29, 2013 at 1:29 AM, Jonathan.Wei <[hidden email]> wrote:
> >
> >> Hello everybody!
> >>      I want to use nutch2.2 to parse RSS.
> >>      But nutch2.x is different from nutch1.x, so I don't know how to parse
> >> RSS. Can you help me?
> >>
> >>
> >> I used the crawl command to fetch 24 URLs, but the result says "Aborting with 10
> >> hung threads."
> >> The log content is:
> >> 0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 466 0 kb/s, 13 URLs in 1 queues
> >> 0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 461 0 kb/s, 13 URLs in 1 queues
> >> 0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 455 0 kb/s, 13 URLs in 1 queues
> >> 0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 450 0 kb/s, 13 URLs in 1 queues
> >> 0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 444 0 kb/s, 13 URLs in 1 queues
> >> 0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 439 0 kb/s, 13 URLs in 1 queues
> >> 0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 434 0 kb/s, 13 URLs in 1 queues
> >> Aborting with 10 hung threads.
> >>
> >> What causes this?
> >>
> >> How can I fix it?
> >>
> >> Thank you!
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://lucene.472066.n3.nabble.com/How-nutch2-2-to-parse-rss-tp4087168.html
> >> Sent from the Nutch - User mailing list archive at Nabble.com.
> >>
> >
> >
> >
> > --
> > *Lewis*
> >
> >
> >
> >
> >
> >
>
> --
> *Lewis*
>