Posted to solr-user@lucene.apache.org by Leo Subscriptions <ll...@zudiewiener.com> on 2011/07/13 05:28:08 UTC

nutch 1.2, solr 3.3, tomcat6. java.io.IOException: Job failed! problem when building solrindex

I'm running 64-bit Ubuntu 11.04, Nutch 1.2, Solr 3.3 (downloaded, not
built) and Tomcat 6, following this (and some other) guides:
http://wiki.apache.org/nutch/RunningNutchAndSolr

I have added the Nutch schema and can access/view it via the Solr
admin page. Nutch itself also works, as I can perform successful searches.

When I execute the following:

  ./bin/nutch solrindex http://localhost:8080/solr/core0 crawl/crawldb crawl/linkdb crawl/segments/*

I (eventually) get an I/O error (java.io.IOException: Job failed!).

The above command creates the following files in
/var/lib/tomcat6/solr/core0/data/index/:

-------------------------------
544 -rw-r--r-- 1 tomcat6 tomcat6 557056 2011-07-13 11:09 _1.fdt
  0 -rw-r--r-- 1 tomcat6 tomcat6      0 2011-07-13 11:00 _1.fdx
  4 -rw-r--r-- 1 tomcat6 tomcat6     32 2011-07-13 10:59 segments_2
  4 -rw-r--r-- 1 tomcat6 tomcat6     20 2011-07-13 10:59 segments.gen
  0 -rw-r--r-- 1 tomcat6 tomcat6      0 2011-07-13 11:00 write.lock
-------------------------------

but hadoop.log reports the following error:

---------------------------
2011-07-13 11:09:47,665 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-07-13 11:09:47,666 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: content dest: content
2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: site dest: site
2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: title dest: title
2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: host dest: host
2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: segment dest: segment
2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: boost dest: boost
2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: digest dest: digest
2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: url dest: id
2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: url dest: url
2011-07-13 11:09:49,272 WARN  mapred.LocalJobRunner - job_local_0001
java.lang.RuntimeException: Invalid version or the data in not in 'javabin' format
        at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
        at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:466)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
        at org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:64)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:54)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:44)
        at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:440)
        at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:159)
        at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:50)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2011-07-13 11:09:49,611 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
-----------------------------------------------------------------------------------------------------------------------------------------------

I'd appreciate any help with this.

Thanks,

Leo




Re: nutch 1.2, solr 3.3, tomcat6. java.io.IOException: Job failed! problem when building solrindex

Posted by Markus Jelsma <ma...@openindex.io>.
If you're using Solr anyway, you'd better upgrade to Nutch 1.3 with Solr 3.x 
support.


Re: nutch 1.2, solr 3.3, tomcat6. java.io.IOException: Job failed! problem when building solrindex

Posted by Leo Subscriptions <ll...@zudiewiener.com>.
Works like a charm.

Thanks,

Leo




Re: nutch 1.2, solr 3.3, tomcat6. java.io.IOException: Job failed! problem when building solrindex

Posted by Geek Gamer <ge...@gmail.com>.
You need to update the SolrJ libs to the 3.x version; the javabin format
has changed.
I made the change a few months back. You can pull the changes from
https://github.com/geek4377/nutch/tree/geek5377-1.2.1

Hope that helps,
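If you want to confirm a javabin version mismatch before swapping jars, one option is to look at the first byte of a javabin response, which we understand to carry the format version. The Java sketch below is a hypothetical diagnostic: the class `JavabinVersionCheck` and its `describe` helper are illustrative names, not part of Nutch or SolrJ. It assumes the core URL from the solrindex command in the original post, assumes Solr's `javabin` response writer is enabled, and assumes (to our understanding of the format history) that Solr 1.4 writes version 1 while Solr 3.x writes version 2.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical diagnostic, not part of Nutch or SolrJ: reads the first byte of a
// javabin-encoded Solr response, which we assume holds the javabin format version.
public class JavabinVersionCheck {

    // Maps the version byte to a readable hint. The 1 -> Solr 1.4 and
    // 2 -> Solr 3.x mapping is an assumption about the format history.
    static String describe(int versionByte) {
        switch (versionByte) {
            case 1:
                return "javabin v1 (Solr 1.4-era server)";
            case 2:
                return "javabin v2 (Solr 3.x server; needs a 3.x SolrJ client)";
            default:
                return "unexpected first byte: " + versionByte;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed core URL, taken from the solrindex command in the original post.
        String core = args.length > 0 ? args[0] : "http://localhost:8080/solr/core0";
        // rows=0 keeps the response small; wt=javabin asks for the binary writer.
        URL url = new URL(core + "/select?q=*:*&rows=0&wt=javabin");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (InputStream in = conn.getInputStream()) {
            int first = in.read(); // -1 means the body was empty
            if (first < 0) {
                throw new IOException("empty response from " + url);
            }
            System.out.println(describe(first));
        } finally {
            conn.disconnect();
        }
    }
}
```

Run it as `java JavabinVersionCheck http://localhost:8080/solr/core0`. If it reports version 2 while the Nutch job still uses older SolrJ jars, that would be consistent with the "Invalid version" error in the original post, and updating the SolrJ jars is the fix suggested in this thread.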


On Wed, Jul 13, 2011 at 8:58 AM, Leo Subscriptions
<ll...@zudiewiener.com> wrote:
> I'm running 64bit Ubuntu 11.04, nutch 1.2, solr 3.3 (downloaded, not
> built) and tomcat6 following this (and some other) links
> http://wiki.apache.org/nutch/RunningNutchAndSolr
>
> I have added the nutch schema and can access/view this schema via the
> admin page. nutch also works as I can perfrom successful searches.
>
> When I execute the following:
>
>>> ./bin/nutch solrindex http://localhost:8080/solr/core0 crawl/crawldb
> crawl/linkdb crawl/segments/*
>
> I (eventually) get an io error.
>
> Tha above command creates the following
> files /var/lib/tomcat6/solr/core0/data/index/
>
> -------------------------------
> 544 -rw-r--r-- 1 tomcat6 tomcat6 557056 2011-07-13 11:09 _1.fdt
>  0 -rw-r--r-- 1 tomcat6 tomcat6      0 2011-07-13 11:00 _1.fdx
>  4 -rw-r--r-- 1 tomcat6 tomcat6     32 2011-07-13 10:59 segments_2
>  4 -rw-r--r-- 1 tomcat6 tomcat6     20 2011-07-13 10:59 segments.gen
>  0 -rw-r--r-- 1 tomcat6 tomcat6      0 2011-07-13 11:00 write.lock
> -------------------------------
>
> but the hadoop.log reports the following error
>
> ---------------------------
> 2011-07-13 11:09:47,665 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-07-13 11:09:47,666 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: content
> dest: content
> 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: site
> dest: site
> 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: title
> dest: title
> 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: host
> dest: host
> 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: segment
> dest: segment
> 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: boost
> dest: boost
> 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: digest
> dest: digest
> 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: tstamp
> dest: tstamp
> 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: url dest:
> id
> 2011-07-13 11:09:47,690 INFO  solr.SolrMappingReader - source: url dest:
> url
> 2011-07-13 11:09:49,272 WARN  mapred.LocalJobRunner - job_local_0001
> java.lang.RuntimeException: Invalid version or the data in not in
> 'javabin' format
>        at
> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
>        at
> org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
>        at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:466)
>        at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
>        at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>        at
> org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
>        at
> org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:64)
>        at org.apache.nutch.indexer.IndexerOutputFormat
> $1.write(IndexerOutputFormat.java:54)
>        at org.apache.nutch.indexer.IndexerOutputFormat
> $1.write(IndexerOutputFormat.java:44)
>        at org.apache.hadoop.mapred.ReduceTask
> $3.collect(ReduceTask.java:440)
>        at
> org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:159)
>        at
> org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:50)
>        at
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>        at org.apache.hadoop.mapred.LocalJobRunner
> $Job.run(LocalJobRunner.java:216)
> 2011-07-13 11:09:49,611 ERROR solr.SolrIndexer - java.io.IOException:
> Job failed!
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
> I'd appreciate any help with this.
>
> Thanks,
>
> Leo
>
>
>
>