Posted to user@nutch.apache.org by Madhulika Mitruka <ma...@gmail.com> on 2016/09/02 10:47:51 UTC
Nutch 2.3.1 with Solr 4.10.3 as Gora Backend | Failing
Hi,
I am trying to set up Nutch 2.3.1 with Solr 4.10.3 as the Apache Gora backend. I
am running Nutch in local mode.
I have installed Nutch and Solr, and when I run Nutch with a list of seed URLs I
get the following exception:
--------------*** -------------------
2016-09-02 03:19:55,057 WARN mapreduce.GoraRecordWriter - Exception at GoraRecordWriter.class while writing to datastore: Not in union ["null","string"]: java.nio.HeapByteBuffer[pos=0 lim=4 cap=4]
2016-09-02 03:19:55,058 WARN mapreduce.GoraRecordWriter - Trace: org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:600)
2016-09-02 03:19:55,060 WARN mapred.LocalJobRunner - job_local2021213666_0001
java.lang.Exception: java.lang.RuntimeException: org.apache.avro.UnresolvedUnionException: Not in union ["null","string"]: java.nio.HeapByteBuffer[pos=0 lim=4 cap=4]
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: org.apache.avro.UnresolvedUnionException: Not in union ["null","string"]: java.nio.HeapByteBuffer[pos=0 lim=4 cap=4]
at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:76)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
---------*** ---------------
I assume this is happening because the value being passed is a ByteBuffer,
whereas the schema expects either "null" or "string". Has anyone seen a
similar issue?
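To illustrate the assumption above, here is a minimal sketch of how resolving a value against a union schema can fail. It is a simplified model written for this post, not Avro's actual implementation: the names resolve_union, BRANCH_CHECKS and UnresolvedUnionError are hypothetical stand-ins, and only three Avro type names are modelled.

```python
# Simplified model of resolving a value against an Avro union schema.
# This is NOT Avro's code; it only mirrors the idea of picking the first
# union branch whose type accepts the value.

class UnresolvedUnionError(Exception):
    """Hypothetical stand-in for org.apache.avro.UnresolvedUnionException."""


# Hypothetical mapping from Avro type names to Python type checks.
BRANCH_CHECKS = {
    "null": lambda v: v is None,
    "string": lambda v: isinstance(v, str),
    "bytes": lambda v: isinstance(v, (bytes, bytearray)),
}


def resolve_union(union, value):
    """Return the index of the first union branch that accepts value."""
    for index, branch in enumerate(union):
        if BRANCH_CHECKS[branch](value):
            return index
    raise UnresolvedUnionError(f"Not in union {union}: {value!r}")


union = ["null", "string"]
print(resolve_union(union, None))    # prints 0 (the "null" branch)
print(resolve_union(union, "text"))  # prints 1 (the "string" branch)
try:
    # A 4-byte buffer, standing in for the HeapByteBuffer in the log.
    resolve_union(union, b"\x00\x00\x00\x01")
except UnresolvedUnionError as err:
    print("failed:", err)
```

In this model the same 4-byte value resolves without error against a union of ["null", "bytes"], which is why a mismatch between the field type the writer produces and the type the schema declares would be consistent with the trace.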
My setup details, briefly:
NUTCH :
1) Nutch is running from /nutch/apache-nutch-2.3.1/runtime/local/
2) I have updated
/nutch/apache-nutch-2.3.1/runtime/local/conf/gora.properties :
gora.datastore.default=org.apache.gora.solr.store.SolrStore
gora.solrstore.solr.url=http://localhost:8983/solr
gora.solrstore.solr.config=solrconfig.xml
gora.solrstore.solr.schema=gora-solr-webpage-schema.xml (this is based on the
suggestion at
http://grokbase.com/t/nutch/user/164n8srqjy/solr-as-backend-in-nutch-2-3-1 )
gora.solrstore.solr.batchSize=100
gora.solrstore.solr.solrjserver=http
gora.solrstore.solr.commitWithin=1000
gora.solrstore.solr.resultsSize=100
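One thing that may be worth ruling out, assuming gora.properties is read as a standard Java properties file: that format has no inline comments, so any note typed after a value on the same line (such as the parenthetical after the schema value above, if it were literally in the file) becomes part of the value. The sketch below uses a deliberately simplified parser (only key=value lines, no line continuations or escapes) and flags values that contain whitespace. The function names and the sample text are hypothetical.

```python
def parse_properties(text):
    """Parse key=value lines; a simplified subset of the Java properties format."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and whole-line comments.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            # Everything after the first '=' is the value, notes included.
            props[key.strip()] = value.strip()
    return props


def values_with_whitespace(props):
    """Return the entries whose value contains whitespace (possible stray notes)."""
    return {k: v for k, v in props.items() if any(ch.isspace() for ch in v)}


# Hypothetical sample: the second line carries a note after the value.
SAMPLE = """\
gora.datastore.default=org.apache.gora.solr.store.SolrStore
gora.solrstore.solr.schema=gora-solr-webpage-schema.xml (a note typed after the value)
"""

for key, value in values_with_whitespace(parse_properties(SAMPLE)).items():
    print(f"check {key}: {value!r}")
```

If the note exists only in this message and not in the file itself, this check reports nothing for the real gora.properties.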
SOLR :
1) My Solr deployment has a core called 'webpage':
/solr/solr-4.10.3/example/solr/webpage
2) I have copied
/nutch/apache-nutch-2.3.1/runtime/local/conf/gora-solr-webpage-schema.xml
as 'schema.xml' to the conf folder of this core.
[I did this after getting a similar exception with the schema.xml that
comes with the Nutch distribution, and after reading this thread:
http://grokbase.com/t/nutch/user/164n8srqjy/solr-as-backend-in-nutch-2-3-1 ]
Can someone please give me pointers on the mistake I am making?
Thanks
Madhulika