Posted to user@nutch.apache.org by Koch Martina <Ko...@huberverlag.de> on 2009/02/10 15:47:01 UTC

"old" crawldb not readable with current trunk

Hi,

I just upgraded from trunk version 28.12.2008 to trunk version 04.02.2009.
Now, I'm trying to read my old crawldbs, e.g. by using the command "bin/nutch readdb <crawldb> -stats", but I always get the following error:

2009-02-10 15:41:05,541 DEBUG mapred.MapTask - Writing local split to /tmp/CRAWLNAME.default.xyz/mapred/local/localRunner/split.dta
2009-02-10 15:41:05,588 DEBUG mapred.TaskRunner - attempt_local_0001_m_000000_0 Progress/ping thread started
2009-02-10 15:41:05,588 INFO  mapred.MapTask - numReduceTasks: 1
2009-02-10 15:41:05,588 INFO  mapred.MapTask - io.sort.mb = 100
2009-02-10 15:41:05,698 INFO  mapred.MapTask - data buffer = 79691776/99614720
2009-02-10 15:41:05,698 INFO  mapred.MapTask - record buffer = 262144/327680
2009-02-10 15:41:05,713 DEBUG mapred.Counters - Creating group org.apache.hadoop.mapred.Task$Counter with bundle
2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding MAP_OUTPUT_BYTES
2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding MAP_OUTPUT_RECORDS
2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding COMBINE_INPUT_RECORDS
2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding COMBINE_OUTPUT_RECORDS
2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding MAP_INPUT_RECORDS
2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding MAP_INPUT_BYTES
2009-02-10 15:41:05,729 WARN  mapred.LocalJobRunner - job_local_0001
java.lang.RuntimeException: java.lang.NullPointerException
                at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:81)
                at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:164)
                at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:262)
                at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
                at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
                at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:1817)
                at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1790)
                at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
                at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
                at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:186)
                at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:170)
                at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
                at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
                at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
Caused by: java.lang.NullPointerException
                at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
                at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:73)
                ... 13 more

With the older version of the trunk I can read the crawldb without difficulty.

Are the old files no longer readable with the new trunk version because of the upgrade to Lucene 2.4?
Is there anything I can do to re-use my old data with the new version?

Kind regards,
Martina
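
For reference, the records in a crawldb are stored as Hadoop SequenceFile entries with Text keys (the URLs) and CrawlDatum values, so the failure can be reproduced outside of readdb with a few lines of code. Below is a minimal standalone sketch; the class name is a placeholder and the part-file path (typically something like crawldb/current/part-00000/data) is an assumption about the local layout. Run against the old crawldb it should fail inside CrawlDatum.readFields() with the same NullPointerException as in the log above, which points at the deserialization of the per-record metadata rather than at the readdb tool itself.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.util.NutchConfiguration;

public class CrawlDbReadCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = NutchConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    // First argument: one data file of the crawldb,
    // e.g. crawldb/current/part-00000/data (layout assumed).
    Path data = new Path(args[0]);
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, data, conf);
    Text url = new Text();
    CrawlDatum datum = new CrawlDatum();
    long count = 0;
    try {
      while (reader.next(url, datum)) {   // CrawlDatum.readFields() runs here
        count++;
      }
    } finally {
      reader.close();
    }
    System.out.println("successfully read " + count + " records from " + data);
  }
}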

Re: "old" crawldb not readable with current trunk

Posted by Doğacan Güney <do...@gmail.com>.
Hi Koch,

Sorry, I thought that would have fixed your problem.

How big is your crawldb? If it is small, would you mind sending it to
me so I can have a look?

On Wed, Feb 11, 2009 at 10:24 AM, Koch Martina <Ko...@huberverlag.de> wrote:
> Hi Doğacan,
>
> thanks for your reply!
>
> I applied the patch, but I still get the same error message.
> I also tried to merge the old crawldb into a new one and then run readdb on it, but even the merge step fails with the following error message:
>
> 2009-02-11 08:35:31,520 INFO  jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
> 2009-02-11 08:35:31,707 INFO  mapred.FileInputFormat - Total input paths to process : 1
> 2009-02-11 08:35:32,004 INFO  mapred.JobClient - Running job: job_local_0001
> 2009-02-11 08:35:32,004 INFO  mapred.FileInputFormat - Total input paths to process : 1
> 2009-02-11 08:35:32,082 INFO  mapred.MapTask - numReduceTasks: 1
> 2009-02-11 08:35:32,082 INFO  mapred.MapTask - io.sort.mb = 100
> 2009-02-11 08:35:32,191 INFO  mapred.MapTask - data buffer = 79691776/99614720
> 2009-02-11 08:35:32,191 INFO  mapred.MapTask - record buffer = 262144/327680
> 2009-02-11 08:35:32,222 WARN  mapred.LocalJobRunner - job_local_0001
> java.lang.RuntimeException: java.lang.NullPointerException
>        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:81)
>        at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:164)
>        at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:262)
>        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>        at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:1817)
>        at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1790)
>        at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
>        at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
>        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:186)
>        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:170)
>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
> Caused by: java.lang.NullPointerException
>        at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
>        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:73)
>        ... 13 more
> 2009-02-11 08:35:33,003 FATAL crawl.CrawlDbMerger - CrawlDb merge: java.io.IOException: Job failed!
>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217)
>        at org.apache.nutch.crawl.CrawlDbMerger.merge(CrawlDbMerger.java:119)
>        at org.apache.nutch.crawl.CrawlDbMerger.run(CrawlDbMerger.java:178)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.nutch.crawl.CrawlDbMerger.main(CrawlDbMerger.java:150)
>
> I ran the merge step in debug mode and saw that the new lines of code in CrawlDbMerger are never reached. The error occurs earlier, somewhere in the merge method.
>
> Kind regards,
> Martina
>
>
> -----Original Message-----
> From: Doğacan Güney [mailto:dogacan@gmail.com]
> Sent: Tuesday, February 10, 2009 22:54
> To: nutch-user@lucene.apache.org
> Subject: Re: "old" crawldb not readable with current trunk
>
> On Tue, Feb 10, 2009 at 4:47 PM, Koch Martina <Ko...@huberverlag.de> wrote:
>> Hi,
>>
>> I just upgraded from trunk version 28.12.2008 to trunk version 04.02.2009.
>> Now, I'm trying to read my old crawldbs, e.g. by using the command "bin/nutch readdb <crawldb> -stats", but I always get the following error:
>>
>> 2009-02-10 15:41:05,541 DEBUG mapred.MapTask - Writing local split to /tmp/CRAWLNAME.default.xyz/mapred/local/localRunner/split.dta
>> 2009-02-10 15:41:05,588 DEBUG mapred.TaskRunner - attempt_local_0001_m_000000_0 Progress/ping thread started
>> 2009-02-10 15:41:05,588 INFO  mapred.MapTask - numReduceTasks: 1
>> 2009-02-10 15:41:05,588 INFO  mapred.MapTask - io.sort.mb = 100
>> 2009-02-10 15:41:05,698 INFO  mapred.MapTask - data buffer = 79691776/99614720
>> 2009-02-10 15:41:05,698 INFO  mapred.MapTask - record buffer = 262144/327680
>> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Creating group org.apache.hadoop.mapred.Task$Counter with bundle
>> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding MAP_OUTPUT_BYTES
>> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding MAP_OUTPUT_RECORDS
>> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding COMBINE_INPUT_RECORDS
>> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding COMBINE_OUTPUT_RECORDS
>> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding MAP_INPUT_RECORDS
>> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding MAP_INPUT_BYTES
>> 2009-02-10 15:41:05,729 WARN  mapred.LocalJobRunner - job_local_0001
>> java.lang.RuntimeException: java.lang.NullPointerException
>>                at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:81)
>>                at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:164)
>>                at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:262)
>>                at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>>                at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>>                at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:1817)
>>                at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1790)
>>                at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
>>                at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
>>                at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:186)
>>                at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:170)
>>                at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>>                at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>>                at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
>> Caused by: java.lang.NullPointerException
>>                at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
>>                at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:73)
>>                ... 13 more
>>
>> With the older version of the trunk I can read the crawldb without difficulty.
>>
>> Are the old files no longer readable with the new trunk version because of the upgrade to Lucene 2.4?
>> Is there anything I can do to re-use my old data with the new version?
>>
>
> Try again in a couple of days. This is a known bug (NUTCH-683). I will commit
> the patch very soon. Meanwhile, you can apply the patch from the issue manually.
>
>> Kind regards,
>> Martina
>>
>
>
>
> --
> Doğacan Güney
>



-- 
Doğacan Güney

Re: "old" crawldb not readable with current trunk

Posted by Koch Martina <Ko...@huberverlag.de>.
Hi Doğacan,

thanks for your reply!

I applied the patch, but I still get the same error message. 
I also tried to merge the old crawldb into a new one and then run readdb on it, but even the merge step fails with the following error message:

2009-02-11 08:35:31,520 INFO  jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2009-02-11 08:35:31,707 INFO  mapred.FileInputFormat - Total input paths to process : 1
2009-02-11 08:35:32,004 INFO  mapred.JobClient - Running job: job_local_0001
2009-02-11 08:35:32,004 INFO  mapred.FileInputFormat - Total input paths to process : 1
2009-02-11 08:35:32,082 INFO  mapred.MapTask - numReduceTasks: 1
2009-02-11 08:35:32,082 INFO  mapred.MapTask - io.sort.mb = 100
2009-02-11 08:35:32,191 INFO  mapred.MapTask - data buffer = 79691776/99614720
2009-02-11 08:35:32,191 INFO  mapred.MapTask - record buffer = 262144/327680
2009-02-11 08:35:32,222 WARN  mapred.LocalJobRunner - job_local_0001
java.lang.RuntimeException: java.lang.NullPointerException
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:81)
	at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:164)
	at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:262)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
	at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:1817)
	at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1790)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:186)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:170)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
Caused by: java.lang.NullPointerException
	at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:73)
	... 13 more
2009-02-11 08:35:33,003 FATAL crawl.CrawlDbMerger - CrawlDb merge: java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217)
	at org.apache.nutch.crawl.CrawlDbMerger.merge(CrawlDbMerger.java:119)
	at org.apache.nutch.crawl.CrawlDbMerger.run(CrawlDbMerger.java:178)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.crawl.CrawlDbMerger.main(CrawlDbMerger.java:150)

I ran the merge step in debug mode and saw that the new lines of code in CrawlDbMerger are never reached. The error occurs earlier, somewhere in the merge method.

Kind regards,
Martina
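
Two things are worth noting about the trace. First, the exception is thrown while the map task's record reader is still deserializing its input (SequenceFileRecordReader.next -> CrawlDatum.readFields), i.e. before any CrawlDbMerger map or reduce code gets to run, which is why the patched lines are never reached. Second, the "Caused by" frames are consistent with MapWritable failing to resolve the class id stored with one of the metadata entries and handing a null class to ReflectionUtils.newInstance(), which then does a cache lookup in a ConcurrentHashMap. A minimal, self-contained illustration of that failure mode (plain Java, not Hadoop code; the null class id resolution is an assumption about the root cause):

import java.util.concurrent.ConcurrentHashMap;

public class NullClassLookup {
  public static void main(String[] args) {
    // A cache keyed by Class objects, similar to what ReflectionUtils keeps.
    ConcurrentHashMap<Class<?>, Object> constructorCache =
        new ConcurrentHashMap<Class<?>, Object>();
    // What an unknown or unregistered class id would resolve to.
    Class<?> resolved = null;
    // ConcurrentHashMap does not allow null keys, so this throws
    // java.lang.NullPointerException, matching the "Caused by" frames above.
    constructorCache.get(resolved);
  }
}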


-----Original Message-----
From: Doğacan Güney [mailto:dogacan@gmail.com]
Sent: Tuesday, February 10, 2009 22:54
To: nutch-user@lucene.apache.org
Subject: Re: "old" crawldb not readable with current trunk

On Tue, Feb 10, 2009 at 4:47 PM, Koch Martina <Ko...@huberverlag.de> wrote:
> Hi,
>
> I just upgraded from trunk version 28.12.2008 to trunk version 04.02.2009.
> Now, I'm trying to read my old crawldbs, e.g. by using the command "bin/nutch readdb <crawldb> -stats", but I always get the following error:
>
> 2009-02-10 15:41:05,541 DEBUG mapred.MapTask - Writing local split to /tmp/CRAWLNAME.default.xyz/mapred/local/localRunner/split.dta
> 2009-02-10 15:41:05,588 DEBUG mapred.TaskRunner - attempt_local_0001_m_000000_0 Progress/ping thread started
> 2009-02-10 15:41:05,588 INFO  mapred.MapTask - numReduceTasks: 1
> 2009-02-10 15:41:05,588 INFO  mapred.MapTask - io.sort.mb = 100
> 2009-02-10 15:41:05,698 INFO  mapred.MapTask - data buffer = 79691776/99614720
> 2009-02-10 15:41:05,698 INFO  mapred.MapTask - record buffer = 262144/327680
> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Creating group org.apache.hadoop.mapred.Task$Counter with bundle
> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding MAP_OUTPUT_BYTES
> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding MAP_OUTPUT_RECORDS
> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding COMBINE_INPUT_RECORDS
> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding COMBINE_OUTPUT_RECORDS
> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding MAP_INPUT_RECORDS
> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding MAP_INPUT_BYTES
> 2009-02-10 15:41:05,729 WARN  mapred.LocalJobRunner - job_local_0001
> java.lang.RuntimeException: java.lang.NullPointerException
>                at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:81)
>                at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:164)
>                at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:262)
>                at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>                at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>                at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:1817)
>                at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1790)
>                at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
>                at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
>                at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:186)
>                at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:170)
>                at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>                at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>                at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
> Caused by: java.lang.NullPointerException
>                at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
>                at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:73)
>                ... 13 more
>
> With the older version of the trunk I can read the crawldb without difficulty.
>
> Are the old files no longer readable with the new trunk version because of the upgrade to Lucene 2.4?
> Is there anything I can do to re-use my old data with the new version?
>

Try again in a couple of days. This is a known bug (NUTCH-683). I will commit
the patch very soon. Meanwhile, you can apply the patch from the issue manually.

> Kind regards,
> Martina
>



-- 
Doğacan Güney

Re: "old" crawldb not readable with current trunk

Posted by Doğacan Güney <do...@gmail.com>.
On Tue, Feb 10, 2009 at 4:47 PM, Koch Martina <Ko...@huberverlag.de> wrote:
> Hi,
>
> I just upgraded from trunk version 28.12.2008 to trunk version 04.02.2009.
> Now, I'm trying to read my old crawldbs, e.g. by using the command "bin/nutch readdb <crawldb> -stats", but I always get the following error:
>
> 2009-02-10 15:41:05,541 DEBUG mapred.MapTask - Writing local split to /tmp/CRAWLNAME.default.xyz/mapred/local/localRunner/split.dta
> 2009-02-10 15:41:05,588 DEBUG mapred.TaskRunner - attempt_local_0001_m_000000_0 Progress/ping thread started
> 2009-02-10 15:41:05,588 INFO  mapred.MapTask - numReduceTasks: 1
> 2009-02-10 15:41:05,588 INFO  mapred.MapTask - io.sort.mb = 100
> 2009-02-10 15:41:05,698 INFO  mapred.MapTask - data buffer = 79691776/99614720
> 2009-02-10 15:41:05,698 INFO  mapred.MapTask - record buffer = 262144/327680
> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Creating group org.apache.hadoop.mapred.Task$Counter with bundle
> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding MAP_OUTPUT_BYTES
> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding MAP_OUTPUT_RECORDS
> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding COMBINE_INPUT_RECORDS
> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding COMBINE_OUTPUT_RECORDS
> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding MAP_INPUT_RECORDS
> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding MAP_INPUT_BYTES
> 2009-02-10 15:41:05,729 WARN  mapred.LocalJobRunner - job_local_0001
> java.lang.RuntimeException: java.lang.NullPointerException
>                at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:81)
>                at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:164)
>                at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:262)
>                at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>                at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>                at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:1817)
>                at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1790)
>                at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
>                at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
>                at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:186)
>                at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:170)
>                at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>                at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>                at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
> Caused by: java.lang.NullPointerException
>                at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
>                at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:73)
>                ... 13 more
>
> With the older version of the trunk I can read the crawldb without difficulty.
>
> Are the old files no longer readable with the new trunk version because of the upgrade to Lucene 2.4?
> Is there anything I can do to re-use my old data with the new version?
>

Try again in a couple of days. This is a known bug (NUTCH-683). I will commit
the patch very soon. Meanwhile, you can apply the patch from the issue manually.
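
The patch is attached to the JIRA issue (https://issues.apache.org/jira/browse/NUTCH-683). Assuming the attachment is a plain diff against trunk, a typical way to apply it is to save it into the top-level directory of the Nutch checkout, run "patch -p0 < NUTCH-683.patch" (the file name and -p level depend on how the diff was generated), and then rebuild with "ant".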

> Kind regards,
> Martina
>



-- 
Doğacan Güney