Posted to user@nutch.apache.org by "Jonathan.Wei" <25...@qq.com> on 2013/08/30 03:58:34 UTC

Re: How nutch2.2 to parse rss?

Thanks!
I will try it.
But I have a feeling that it will not build for me either,
because some classes cannot be found in Nutch 2.2, for example:
ParseData
ParseResult

Thank you!

------------------ Original message ------------------
From: "lewis john mcgibbney [via Lucene]"<ml...@n3.nabble.com>;
Sent: Friday, August 30, 2013, 9:34 AM
To: "基勇"<25...@qq.com>;

Subject: Re: How nutch2.2 to parse rss?



Hi Jonathan,
This has been a long-outstanding issue, IIRC.
I have not used Nutch for feed crawling for a while, and I honestly
can't recall whether I have ever done it with 2.x.
You will see in [0] that by default the plugin is not actually initialized.
So for starters you should uncomment the relevant targets within this file
[0] so that the plugin is built and also cleaned up, etc.
You can then try building... but I have a feeling that it will not build.
Please check our Jira for issues related to this... there may be patches,
but I am not sure.
Kiran did some work a while back, IIRC, on getting the following plugins
to compile and run:

     <ant dir="feed" target="deploy"/> 
     <ant dir="parse-ext" target="deploy"/> 
     <ant dir="parse-swf" target="deploy"/> 
     <ant dir="parse-zip" target="deploy"/> 

But there is more work to be done.
Please keep us updated on this one. Sorry for the late reply.

[0] http://svn.apache.org/repos/asf/nutch/branches/2.x/src/plugin/build.xml
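
As a rough illustration of the uncommenting step described above, the plugin build file might end up looking like the sketch below. The project name and the exact layout of the deploy and clean targets are assumptions, not copied from [0]; only the four deploy lines come from this message, so compare against the real file before editing.

```xml
<!-- Sketch of src/plugin/build.xml after uncommenting (layout is an assumption). -->
<project name="Nutch" default="deploy" basedir=".">

  <target name="deploy">
    <!-- These entries were commented out by default; re-enable them to build the plugins. -->
    <ant dir="feed" target="deploy"/>
    <ant dir="parse-ext" target="deploy"/>
    <ant dir="parse-swf" target="deploy"/>
    <ant dir="parse-zip" target="deploy"/>
  </target>

  <target name="clean">
    <!-- Assumed matching entries so that a clean build also cleans these plugins. -->
    <ant dir="feed" target="clean"/>
    <ant dir="parse-ext" target="clean"/>
    <ant dir="parse-swf" target="clean"/>
    <ant dir="parse-zip" target="clean"/>
  </target>

</project>
```

After saving the file, rerun the Ant build and watch for compile errors in the feed plugin, since it may still not build against 2.x.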


On Thu, Aug 29, 2013 at 1:29 AM, Jonathan.Wei <[hidden email]> wrote: 

> Hello everybody!
>      I want to use Nutch 2.2 to parse RSS.
>      But Nutch 2.x is different from Nutch 1.x, so I don't know how to parse
> RSS. Can you help me?
>
>
> I used the crawl command to fetch 24 URLs, but the result reports
> "Aborting with 10 hung threads."
> The log content is:
> 0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 466 0 kb/s, 13 URLs in 1 queues
> 0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 461 0 kb/s, 13 URLs in 1 queues
> 0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 455 0 kb/s, 13 URLs in 1 queues
> 0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 450 0 kb/s, 13 URLs in 1 queues
> 0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 444 0 kb/s, 13 URLs in 1 queues
> 0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 439 0 kb/s, 13 URLs in 1 queues
> 0/10 spinwaiting/active, 11 pages, 0 errors, 0.0 0 pages/s, 434 0 kb/s, 13 URLs in 1 queues
> Aborting with 10 hung threads. 
> 
> What causes this? 
> 
> How can I fix it?
> 
> Thank you! 
> 
> 
> 
> 
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-nutch2-2-to-parse-rss-tp4087168.html
> Sent from the Nutch - User mailing list archive at Nabble.com. 
> 



--  
*Lewis* 
 	 	 	 	
 	
 	



--
View this message in context: http://lucene.472066.n3.nabble.com/How-nutch2-2-to-parse-rss-tp4087399.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: How nutch2.2 to parse rss?

Posted by 基勇 <25...@qq.com>.
Thanks!
I understand.
It is very kind of you to help me!




------------------ Original message ------------------
From: "Lewis John Mcgibbney"<le...@gmail.com>;
Sent: Friday, August 30, 2013, 11:10 AM
To: "user@nutch.apache.org"<us...@nutch.apache.org>;

Subject: Re: How nutch2.2 to parse rss?



Yeah, there is work to be done here for sure. There must be an issue open
for this?


-- 
*Lewis*

re: How nutch2.2 to parse rss?

Posted by 基勇 <25...@qq.com>.
Thank you for providing the information to me!


------------------ Original message ------------------
From: "Tejas Patil"<te...@gmail.com>;
Sent: Friday, August 30, 2013, 11:52 AM
To: "user@nutch.apache.org"<us...@nutch.apache.org>;

Subject: Re: How nutch2.2 to parse rss?



The 1.x RSS plugin works after this Jira issue (
https://issues.apache.org/jira/browse/NUTCH-1494). There is an open Jira issue (
https://issues.apache.org/jira/browse/NUTCH-1515) for its 2.x counterpart.




Re: How nutch2.2 to parse rss?

Posted by Tejas Patil <te...@gmail.com>.
The 1.x RSS plugin works after this Jira issue (
https://issues.apache.org/jira/browse/NUTCH-1494). There is an open Jira issue (
https://issues.apache.org/jira/browse/NUTCH-1515) for its 2.x counterpart.




Re: How nutch2.2 to parse rss?

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Yeah, there is work to be done here for sure. There must be an issue open
for this?


-- 
*Lewis*