Posted to user@nutch.apache.org by "Jonathan.Wei" <25...@qq.com> on 2013/08/29 10:29:22 UTC

How nutch2.2 to parse rss?

Hello everybody!
     I want to use Nutch 2.2 to parse RSS feeds, but Nutch 2.x is different
from Nutch 1.x, so I don't know how to do it. Can you help me?


I used the crawl command to fetch 24 URLs, but it ended with "Aborting with 10
hung threads."
The log output is:
0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 466 0 kb/s, 13 URLs in 1 queues
0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 461 0 kb/s, 13 URLs in 1 queues
0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 455 0 kb/s, 13 URLs in 1 queues
0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 450 0 kb/s, 13 URLs in 1 queues
0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 444 0 kb/s, 13 URLs in 1 queues
0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 439 0 kb/s, 13 URLs in 1 queues
0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 434 0 kb/s, 13 URLs in 1 queues
Aborting with 10 hung threads.
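
Read field by field, these status lines show the fetcher making no progress: the page count stays at 11 in every report, both pages/s figures are 0, all ten threads are reported active, and 13 URLs are still waiting in a single queue. The Java sketch below is a hypothetical helper, not part of Nutch. The class, method and field names are our own, and the regular expression assumes exactly the line layout shown above. It parses such lines and flags a stall when the page count stops growing while URLs are still queued.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper, not part of Nutch: parses fetcher status lines such as
 * "0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 466 0 kb/s, 13 URLs in 1 queues"
 * and reports whether the fetched-page count stopped growing.
 */
public class FetcherStatusCheck {

    /** The counters taken from one status line (field names are our own labels). */
    static final class Status {
        final int spinWaiting, active, pages, errors, queuedUrls, queues;

        Status(int spinWaiting, int active, int pages, int errors, int queuedUrls, int queues) {
            this.spinWaiting = spinWaiting;
            this.active = active;
            this.pages = pages;
            this.errors = errors;
            this.queuedUrls = queuedUrls;
            this.queues = queues;
        }
    }

    // Assumption: the exact layout seen in the log above; other versions may print differently.
    // The two pages/s and two kb/s figures are skipped by the lazy ".*?".
    private static final Pattern LINE = Pattern.compile(
        "(\\d+)/(\\d+) spinwaiting/active, (\\d+) pages, (\\d+) errors, .*?, (\\d+) URLs in (\\d+) queues");

    static Status parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognized status line: " + line);
        }
        int[] v = new int[6];
        for (int i = 0; i < v.length; i++) {
            v[i] = Integer.parseInt(m.group(i + 1));
        }
        return new Status(v[0], v[1], v[2], v[3], v[4], v[5]);
    }

    /** True if every report shows the same page count while URLs are still queued. */
    static boolean isStalled(List<Status> reports) {
        if (reports.size() < 2) {
            return false; // a single report says nothing about progress
        }
        int firstPages = reports.get(0).pages;
        for (Status s : reports) {
            if (s.pages != firstPages || s.queuedUrls == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The seven status lines from the log in the question.
        String[] log = {
            "0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 466 0 kb/s, 13 URLs in 1 queues",
            "0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 461 0 kb/s, 13 URLs in 1 queues",
            "0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 455 0 kb/s, 13 URLs in 1 queues",
            "0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 450 0 kb/s, 13 URLs in 1 queues",
            "0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 444 0 kb/s, 13 URLs in 1 queues",
            "0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 439 0 kb/s, 13 URLs in 1 queues",
            "0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 434 0 kb/s, 13 URLs in 1 queues"
        };
        List<Status> reports = new ArrayList<>();
        for (String line : log) {
            reports.add(parse(line));
        }
        Status last = reports.get(reports.size() - 1);
        // Prints: stalled=true, pages=11, active threads=10, queued URLs=13 in 1 queue(s)
        System.out.println("stalled=" + isStalled(reports)
            + ", pages=" + last.pages
            + ", active threads=" + last.active
            + ", queued URLs=" + last.queuedUrls
            + " in " + last.queues + " queue(s)");
    }
}
```

The check is deliberately narrow. It only confirms that the fetcher stopped completing pages before aborting. It does not tell you why the threads hung.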

What causes this?

How can I fix it?

Thank you!




--
View this message in context: http://lucene.472066.n3.nabble.com/How-nutch2-2-to-parse-rss-tp4087168.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: How nutch2.2 to parse rss?

Posted by Tejas Patil <te...@gmail.com>.
AFAIK, the RSS plugin in 2.x hasn't been migrated. I mean, its code was copied
from the 1.x trunk and would need modifications to work with 2.x.
That's why it was disabled in the build file.



On Thu, Aug 29, 2013 at 6:34 PM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Hi Jonathan,
> This has been a long-outstanding issue, IIRC.
> To be honest, I have not used Nutch for feed crawling for a while, and I
> can't recall whether, or when, I have done it with 2.x.
> As you will see in [0], the plugin is not enabled by default.
> So, for starters, you should uncomment the relevant targets in this file
> [0] so that the plugin is built, deployed and cleaned along with the others.
> You can then try building... but I have a feeling that it will not build.
> Please check our Jira for related issues... there may be patches,
> but I am not sure.
> IIRC, Kiran did some work a while back on getting the following plugins
> to compile and run:
>
>      <ant dir="feed" target="deploy"/>
>      <ant dir="parse-ext" target="deploy"/>
>      <ant dir="parse-swf" target="deploy"/>
>      <ant dir="parse-zip" target="deploy"/>
>
> But there is more work to be done.
> Please keep us updated on this one. Sorry for the late reply.
>
> [0] http://svn.apache.org/repos/asf/nutch/branches/2.x/src/plugin/build.xml
>
>
> --
> *Lewis*
>

Re: How nutch2.2 to parse rss?

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Jonathan,
This has been a long-outstanding issue, IIRC.
To be honest, I have not used Nutch for feed crawling for a while, and I
can't recall whether, or when, I have done it with 2.x.
As you will see in [0], the plugin is not enabled by default.
So, for starters, you should uncomment the relevant targets in this file
[0] so that the plugin is built, deployed and cleaned along with the others.
You can then try building... but I have a feeling that it will not build.
Please check our Jira for related issues... there may be patches,
but I am not sure.
IIRC, Kiran did some work a while back on getting the following plugins
to compile and run:

     <ant dir="feed" target="deploy"/>
     <ant dir="parse-ext" target="deploy"/>
     <ant dir="parse-swf" target="deploy"/>
     <ant dir="parse-zip" target="deploy"/>

But there is more work to be done.
Please keep us updated on this one. Sorry for the late reply.

[0] http://svn.apache.org/repos/asf/nutch/branches/2.x/src/plugin/build.xml
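
Uncommenting the targets matters because Ant never sees an `<ant>` call that sits inside an XML comment. A commented-out `feed` entry therefore means the plugin is simply not built or deployed. As a quick sanity check, the Java sketch below lists the plugin directories whose `deploy` call is actually active, using only the JDK's DOM parser. It is a hypothetical helper, not part of Nutch. The sample build file in `main` is made up for illustration and is not the real file at [0]. Note that even when `feed` shows up as deployed, the plugin code may still need porting to 2.x before it works.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/**
 * Hypothetical helper, not part of Nutch: lists which plugin directories an
 * Ant build file actually deploys. XML comments are not elements, so an
 * ant call that has been commented out is invisible here, just as it is to Ant.
 */
public class PluginBuildCheck {

    /** Returns the "dir" attribute of every active ant element whose target is "deploy". */
    static List<String> deployedPlugins(String buildXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(buildXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse build file", e);
        }
        NodeList ants = doc.getElementsByTagName("ant");
        List<String> dirs = new ArrayList<>();
        for (int i = 0; i < ants.getLength(); i++) {
            Element ant = (Element) ants.item(i);
            if ("deploy".equals(ant.getAttribute("target"))) {
                dirs.add(ant.getAttribute("dir"));
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Made-up excerpt for illustration only; it is NOT the real Nutch 2.x build.xml.
        String sample =
              "<project name=\"plugins\">"
            + "  <target name=\"deploy\">"
            + "    <ant dir=\"parse-html\" target=\"deploy\"/>"
            + "    <!-- <ant dir=\"feed\" target=\"deploy\"/> -->"
            + "    <ant dir=\"parse-ext\" target=\"deploy\"/>"
            + "  </target>"
            + "</project>";
        List<String> deployed = deployedPlugins(sample);
        // Prints: deployed=[parse-html, parse-ext], feed enabled=false
        System.out.println("deployed=" + deployed + ", feed enabled=" + deployed.contains("feed"));
    }
}
```

To check a real checkout, you could pass `Files.readString(Path.of("src/plugin/build.xml"))` (Java 11+) to `deployedPlugins`. If `feed` is missing from the result, the plugin is not being built at all.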


-- 
*Lewis*