Posted to user@nutch.apache.org by anupamk <an...@usc.edu> on 2014/03/14 19:37:57 UTC
IOException while parsing
Hi,
I am fetching around 3000 links, and I am able to fetch them successfully. But
when I try to parse them I get an IOException as follows:
This is not at all helpful for troubleshooting the problem.
Has anyone else run into such a problem while parsing?
I am using Nutch-1.7.
I am guessing it's because Nutch tries to parse a truncated mp3 file and
fails. Am I right?
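One way to test the truncation guess is to compare the number of bytes actually stored for a document with the length the server declared in its Content-Length header. Nutch caps downloads with the `http.content.limit` property (65536 bytes by default; a negative value disables the cap), so a large mp3 is likely to be clipped unless that limit is raised. The sketch below assumes you have already pulled the raw bytes and the header value out of the segment by some means; `looks_truncated` is a hypothetical helper, not part of Nutch.

```python
from typing import Optional


def looks_truncated(fetched: bytes, declared_length: Optional[int],
                    content_limit: int = 65536) -> bool:
    """Heuristic check (hypothetical helper): was this document cut short?

    fetched         -- the raw bytes stored for the document
    declared_length -- the server's Content-Length header, or None if absent
    content_limit   -- assumed fetch cap, mirroring Nutch's default
                       http.content.limit; a negative value means no cap
    """
    if declared_length is not None:
        # Fewer bytes than the server promised means the body is incomplete.
        return len(fetched) < declared_length
    # No header to compare against: hitting the cap is the best hint of clipping.
    return content_limit >= 0 and len(fetched) >= content_limit
```

For example, `looks_truncated(data, 4_000_000)` on a 65536-byte body returns `True`, which would be consistent with the parser choking on an incomplete mp3.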
--
View this message in context: http://lucene.472066.n3.nabble.com/IOException-while-parsing-tp4123696.html
Sent from the Nutch - User mailing list archive at Nabble.com.
Re: IOException while parsing
Posted by anupamk <an...@usc.edu>.
I got around the problem a while back ... Just wanted to update
the forum with my workaround, in case anyone else is looking for a
solution.
Apparently memory was the root of the issue. I don't know the internals
of parse yet and have not looked at the code, but it seems to me that the
parser tries to spawn threads in proportion to the number of documents in the
parser's queue. Again, I'm not sure whether I am 100% correct; I am just
guessing this based on the error message in hadoop.log.
The way I got it to work was to split the segments into smaller segments and
fetch and parse each smaller segment one by one.
I split the segments using the option.
(source: http://wiki.apache.org/nutch/bin/nutch_mergesegs)
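The workaround boils down to a batching pattern: instead of handing the parser one large segment, cut the work into fixed-size slices and run each slice to completion before starting the next. In Nutch itself the splitting was done with the mergesegs tool; the Python sketch below only illustrates the pattern and does not call Nutch. The names `slices`, `run_in_slices`, and the `process` callback (a stand-in for "fetch, then parse, this smaller segment") are hypothetical.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def slices(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def run_in_slices(items: Sequence[T], size: int,
                  process: Callable[[List[T]], None]) -> int:
    """Run `process` on each chunk in turn, finishing one before the next.

    `process` is a hypothetical stand-in for fetching and parsing one
    smaller segment; only one chunk is handed to it at a time.
    Returns the number of chunks processed.
    """
    count = 0
    for chunk in slices(items, size):
        process(chunk)
        count += 1
    return count
```

With 3000 URLs and a slice size of 1000, `run_in_slices` calls `process` three times with 1000 URLs each, so the peak amount of work in flight is a third of the original.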
I would love to take a closer look at the parser soon and come back with a
better answer. But for now, this works and gets the job done.
--
View this message in context: http://lucene.472066.n3.nabble.com/IOException-while-parsing-tp4123696p4123739.html
Re: IOException while parsing
Posted by anupamk <an...@usc.edu>.
Hi John,
Thanks for the hint. I checked hadoop.log, and on further investigation the
only suspicious entry I found was the following warning:
Can this be the cause of the IOException?
If so, what might the remedy be?
--
View this message in context: http://lucene.472066.n3.nabble.com/IOException-while-parsing-tp4123696p4123720.html
Re: IOException while parsing
Posted by John Lafitte <jl...@brandextract.com>.
Hi,
That looks like the console output, but have you looked in logs/hadoop.log?
Usually you will get more detail on your error there, including a
stack trace.
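Because the interesting lines in logs/hadoop.log are usually a WARN or ERROR entry followed by its stack trace, a small filter can pull just those entries out of a long log. This is a minimal sketch that assumes each entry starts with a log4j ISO8601 timestamp such as `2014-03-14 13:37:57,123` followed by the level; adjust the pattern if your log4j layout differs. `problem_entries` is a hypothetical helper.

```python
import re
from typing import Iterable, List

# Assumed entry prefix: ISO8601 timestamp as written by log4j, then the level.
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")
LEVELS = ("WARN", "ERROR", "FATAL")


def problem_entries(lines: Iterable[str]) -> List[str]:
    """Group log lines into entries and keep those logged at WARN or above.

    Continuation lines (stack-trace frames such as "\tat ..." and
    "Caused by: ...") are attached to the entry that precedes them.
    """
    entries: List[List[str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if ENTRY_START.match(line) or not entries:
            entries.append([line])
        else:
            entries[-1].append(line)
    return [
        "\n".join(entry)
        for entry in entries
        if any(f" {level} " in entry[0] for level in LEVELS)
    ]
```

Typical use: `with open("logs/hadoop.log") as f:` then print each item of `problem_entries(f)`, which shows every warning or error together with its full stack trace.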