Posted to user@nutch.apache.org by Rafael Pappert <rp...@fwpsystems.com> on 2011/11/30 23:31:53 UTC

Solr Indexing Problem

Hello List,

I am trying to index my parsed segments into Solr with the solrindex command,
but for one segment I get the following exception:

java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:180)
	at java.io.DataInputStream.readFully(DataInputStream.java:152)
	at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
	at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:63)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:197)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
	at org.apache.hadoop.mapred.Child.main(Child.java:253)

What's wrong with this segment? All of the other commands worked without any problem.

Best regards,
Rafael.
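
The trace above fails in SequenceFile$Reader.init, which most likely means the reader hit the end of a file while still reading its header. A SequenceFile begins with the three bytes "SEQ" followed by a version byte, so an empty or cut-off part file cannot be opened. The following is only a minimal sketch for checking a local copy of a part file (for example one fetched from HDFS first); the function name and the demo files are hypothetical and not part of Nutch or Hadoop.

```python
import os
import tempfile

SEQ_MAGIC = b"SEQ"  # a SequenceFile starts with these three bytes plus one version byte


def check_sequence_file_header(path):
    """Classify a local file as 'empty', 'truncated', 'not-sequencefile' or 'ok'."""
    with open(path, "rb") as handle:
        header = handle.read(4)  # 3 magic bytes + 1 version byte
    if len(header) == 0:
        return "empty"
    if len(header) < 4:
        return "truncated"
    if header[:3] != SEQ_MAGIC:
        return "not-sequencefile"
    return "ok"


if __name__ == "__main__":
    # Demonstration on throwaway files, not on real segment data.
    with tempfile.TemporaryDirectory() as tmp:
        empty_path = os.path.join(tmp, "empty-part")
        open(empty_path, "wb").close()

        good_path = os.path.join(tmp, "good-part")
        with open(good_path, "wb") as handle:
            handle.write(b"SEQ\x06" + b"rest of header")

        print(check_sequence_file_header(empty_path))
        print(check_sequence_file_header(good_path))
```

This only looks at the first four bytes. A file that passes can still be damaged further in, so it is a quick first check rather than a full validation.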




Re: Solr Indexing Problem

Posted by Rafael Pappert <rp...@fwpsystems.com>.
Hello,

Do I have to point the "segment parameter" of
the readseg command to the "timestamp directory"?

hadoop dfs -ls crawl/segments/20111130041413/
Found 7 items
-rw-r--r--   2 fwp supergroup          0 2011-11-30 15:55 /user/fwp/crawl/segments/20111130041413/_SUCCESS
drwxr-xr-x   - fwp supergroup          0 2011-11-30 15:55 /user/fwp/crawl/segments/20111130041413/content
drwxr-xr-x   - fwp supergroup          0 2011-11-30 15:55 /user/fwp/crawl/segments/20111130041413/crawl_fetch
drwxr-xr-x   - fwp supergroup          0 2011-11-30 15:55 /user/fwp/crawl/segments/20111130041413/crawl_generate
drwxr-xr-x   - fwp supergroup          0 2011-11-30 15:55 /user/fwp/crawl/segments/20111130041413/crawl_parse
drwxr-xr-x   - fwp supergroup          0 2011-11-30 15:55 /user/fwp/crawl/segments/20111130041413/parse_data
drwxr-xr-x   - fwp supergroup          0 2011-11-30 15:55 /user/fwp/crawl/segments/20111130041413/parse_text


Like this?

nutch readseg -list crawl/segments/20111130041413/

I still get the same exception for every segment.

Exception in thread "main" java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:180)
	at java.io.DataInputStream.readFully(DataInputStream.java:152)
	at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
	at org.apache.hadoop.mapred.SequenceFileOutputFormat.getReaders(SequenceFileOutputFormat.java:93)
	at org.apache.nutch.segment.SegmentReader.getStats(SegmentReader.java:455)
	at org.apache.nutch.segment.SegmentReader.list(SegmentReader.java:433)
	at org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:579)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

But the solrindex command works for all the segments except one.

I'm not sure about the usage of the segment reader.

greetz,
Rafael.

On 1 Dec 2011, at 11:02, Markus Jelsma wrote:

> Seems corrupt or empty. Use segment reader to check out the segment.
> 
> -- 
> Markus Jelsma - CTO - Openindex


Re: Solr Indexing Problem

Posted by Markus Jelsma <ma...@openindex.io>.
Seems corrupt or empty. Use segment reader to check out the segment.


-- 
Markus Jelsma - CTO - Openindex
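
One way to follow up on the "corrupt or empty" suggestion is to look for zero-length files inside the segment. The sketch below parses listing output in the format shown earlier in the thread (`hadoop dfs -ls`) and reports regular files of size 0, skipping the `_SUCCESS` marker, which is normally empty, and directories, which are always listed with size 0. It assumes a recursive listing was saved as text (`hadoop dfs -lsr` in Hadoop releases of that period, as far as I know). The function name and the `part-0000x` sample lines are hypothetical and do not come from the original posts.

```python
def find_empty_files(listing_text):
    """Return paths of zero-length regular files in `hadoop dfs -ls`-style output."""
    empty = []
    for line in listing_text.splitlines():
        # Expected fields: permissions, replication, owner, group, size, date, time, path
        fields = line.split(None, 7)
        if len(fields) != 8 or not fields[0].startswith("-"):
            continue  # skip "Found N items" lines and directory entries
        size, path = fields[4], fields[7]
        if size == "0" and not path.endswith("/_SUCCESS"):
            empty.append(path)
    return empty


# Hypothetical sample: the part-0000x lines are invented for illustration only.
SAMPLE_LISTING = """\
Found 4 items
-rw-r--r--   2 fwp supergroup          0 2011-11-30 15:55 /user/fwp/crawl/segments/20111130041413/_SUCCESS
drwxr-xr-x   - fwp supergroup          0 2011-11-30 15:55 /user/fwp/crawl/segments/20111130041413/crawl_parse
-rw-r--r--   2 fwp supergroup          0 2011-11-30 15:55 /user/fwp/crawl/segments/20111130041413/crawl_parse/part-00000
-rw-r--r--   2 fwp supergroup       1234 2011-11-30 15:55 /user/fwp/crawl/segments/20111130041413/crawl_parse/part-00001
"""

if __name__ == "__main__":
    for path in find_empty_files(SAMPLE_LISTING):
        print(path)
```

A zero-length part file found this way would be consistent with the EOFException raised while the SequenceFile reader is being constructed, but it does not prove that this is the cause in this case.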