Posted to user@nutch.apache.org by Carl Cerecke <ca...@nzs.com> on 2007/07/25 01:52:32 UTC
NullPointerException fetching some sites with temp redirects
Hi,
I'm using Nutch 0.9, although I get the same error with a more recent nightly build.
I'm getting a NullPointerException (NPE) when fetching these two pages:
http://www.absoluteit.co.nz
and
http://defence.allmedia.co.nz
I tracked it down by adding a t.printStackTrace() call to the catch
(Throwable t) block of run() in Fetcher.java:
java.lang.NullPointerException
    at org.apache.hadoop.io.Text.encode(Text.java:375)
    at org.apache.hadoop.io.Text.encode(Text.java:356)
    at org.apache.hadoop.io.Text.writeString(Text.java:396)
    at org.apache.nutch.protocol.Content.writeCompressed(Content.java:146)
    at org.apache.hadoop.io.CompressedWritable.write(CompressedWritable.java:74)
    at org.apache.nutch.fetcher.FetcherOutput.write(FetcherOutput.java:56)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:315)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:343)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:191)
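For anyone wanting to reproduce this kind of diagnosis outside Nutch, here is a minimal stand-alone sketch of the same technique: catch Throwable in a worker's run() and keep the full stack trace instead of swallowing it. StackTraceDemo, work and worker are hypothetical names; this is not Nutch's Fetcher$FetcherThread, and the failing step is simulated.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical stand-in for a fetcher worker; NOT Nutch's Fetcher$FetcherThread.
final class StackTraceDemo {

    // Full text of the most recent failure's stack trace (empty if none).
    static volatile String lastTrace = "";

    // Simulates the failing step: dereferencing a null content type.
    static int work(String contentType) {
        return contentType.length(); // throws NullPointerException when null
    }

    static Runnable worker(String contentType) {
        return () -> {
            try {
                work(contentType);
            } catch (Throwable t) {
                // Same idea as adding t.printStackTrace() to the catch block in
                // Fetcher.run(): record the full trace instead of discarding it.
                StringWriter sw = new StringWriter();
                t.printStackTrace(new PrintWriter(sw, true));
                lastTrace = sw.toString();
                System.err.print(lastTrace);
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        Thread fetcher = new Thread(worker(null));
        fetcher.start();
        fetcher.join();
    }
}
```

Writing the trace into a StringWriter lets it go to a log file rather than only to stderr; in real fetcher code a logger call would probably be the more usual choice.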
I'm not sure where to go from here. Any suggestions?
Cheers,
Carl.
Re: NullPointerException fetching some sites with temp redirects
Posted by Carl Cerecke <ca...@nzs.com>.
Is anybody else getting NullPointerExceptions when fetching either of these
two sites (with 0.9 and the latest trunk)?
http://www.absoluteit.co.nz
http://defence.allmedia.co.nz
I am, but I'd be grateful if someone else could check whether they work,
so I can rule out a Nutch configuration issue.
Cheers,
Carl.
Re: NullPointerException fetching some sites with temp redirects
Posted by Carl Cerecke <ca...@nzs.com>.
Hi, I've included the relevant part of Content.java below. I will retry with the latest trunk shortly.
Content.java:137-149
137   protected final void writeCompressed(DataOutput out) throws IOException {
138     out.writeByte(VERSION);
139
140     Text.writeString(out, url);          // write url
141     Text.writeString(out, base);         // write base
142
143     out.writeInt(content.length);        // write content
144     out.write(content);
145
146     Text.writeString(out, contentType);  // write contentType
147
148     metadata.write(out);                 // write metadata
149   }
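Reading the listing against the trace: lines 140 and 141 write url and base without failing, and the NPE is raised from Text.writeString at line 146, which suggests that contentType is null for these redirected pages. The sketch below reproduces that failure mode with a simplified, hypothetical stand-in for Hadoop's Text.writeString (a plain int length prefix instead of Hadoop's variable-length one) and shows one possible defensive variant that writes null as an empty string. WriteStringDemo and writeStringOrEmpty are illustrative names; this is not a patch for Nutch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Simplified, hypothetical stand-in for Hadoop's Text.writeString: the real
// method writes a variable-length int prefix; this sketch uses a plain int.
final class WriteStringDemo {

    static void writeString(DataOutput out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8); // NPE here if s == null
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Hypothetical defensive variant: serialize null as an empty string.
    static void writeStringOrEmpty(DataOutput out, String s) throws IOException {
        writeString(out, s == null ? "" : s);
    }

    public static void main(String[] args) throws IOException {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        String contentType = null; // presumably what line 146 receives for these redirects
        writeStringOrEmpty(out, contentType); // succeeds: writes a zero length prefix
        try {
            writeString(out, contentType);
        } catch (NullPointerException e) {
            System.out.println("NPE, as at Content.java:146");
        }
    }
}
```

Substituting an empty string hides the symptom but loses the distinction between "no content type" and an empty one when the record is read back, so it is probably better to find out why the redirect produces a Content without a content type.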
I also noticed that in the output.collect call in Fetcher.java, a new
FetcherOutput is created with its third argument (the ParseImpl) set to null,
even though the Content argument is not null (it holds the contents of the
page being redirected to).
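The trace fails inside Content.writeCompressed rather than while a parse is being written, so the null ParseImpl may well be expected for a redirect and not the direct cause. One way to confirm which Content field is null before output.collect is a fail-fast check with a descriptive message, sketched here with java.util.Objects.requireNonNull. ContentFields and checkSerializable are hypothetical names (not Nutch's Content class), and the URLs are placeholders.

```java
import java.util.Objects;

// Hypothetical, simplified view of the string fields that
// Content.writeCompressed serializes; NOT Nutch's Content class.
final class ContentFields {
    final String url;
    final String base;
    final String contentType;

    ContentFields(String url, String base, String contentType) {
        this.url = url;
        this.base = base;
        this.contentType = contentType;
    }

    // Fails fast naming the first null field, instead of an anonymous
    // NPE deep inside serialization.
    ContentFields checkSerializable() {
        Objects.requireNonNull(url, "url is null");
        Objects.requireNonNull(base, "base is null");
        Objects.requireNonNull(contentType, "contentType is null for " + url);
        return this;
    }

    public static void main(String[] args) {
        // Placeholder values, not data from the sites in this thread.
        ContentFields redirected =
            new ContentFields("http://example.com/", "http://example.com/", null);
        try {
            redirected.checkSerializable();
        } catch (NullPointerException e) {
            System.out.println(e.getMessage());
        }
    }
}
```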
Cheers,
Carl.
Doğacan Güney wrote:
> Hi,
>
> Can you retry with the latest trunk? Not that I think it will solve
> your problem, but Content.java has changed recently, so I am not sure
> what is on line 146. If the problem recurs with the latest trunk, I can
> check exactly which line is failing. Alternatively, you can send that
> part of Content.java's code.
Re: NullPointerException fetching some sites with temp redirects
Posted by Carl Cerecke <ca...@nzs.com>.
Hi Doğacan,
Yes, I get the NullPointerException with the latest trunk, too.
Cheers,
Carl.
Re: NullPointerException fetching some sites with temp redirects
Posted by Doğacan Güney <do...@gmail.com>.
Hi,
Can you retry with the latest trunk? Not that I think it will solve
your problem, but Content.java has changed recently, so I am not sure
what is on line 146. If the problem recurs with the latest trunk, I can
check exactly which line is failing. Alternatively, you can send that
part of Content.java's code.
--
Doğacan Güney