Posted to user@nutch.apache.org by Madhulika Mitruka <ma...@gmail.com> on 2016/09/02 10:47:51 UTC

Nutch 2.3.1 with Solr 4.10.3 as Gora Backend | Failing

Hi,

I am trying to set up Nutch 2.3.1 with Solr 4.10.3 as the Apache Gora
backend. I am running Nutch in local mode.
I have installed Nutch and Solr, and when I run Nutch with a list of seed
URLs I get the following exception:

--------------*** -------------------
2016-09-02 03:19:55,057 WARN  mapreduce.GoraRecordWriter - Exception at GoraRecordWriter.class while writing to datastore: Not in union ["null","string"]: java.nio.HeapByteBuffer[pos=0 lim=4 cap=4]
2016-09-02 03:19:55,058 WARN  mapreduce.GoraRecordWriter - Trace: org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:600)
2016-09-02 03:19:55,060 WARN  mapred.LocalJobRunner - job_local2021213666_0001
java.lang.Exception: java.lang.RuntimeException: org.apache.avro.UnresolvedUnionException: Not in union ["null","string"]: java.nio.HeapByteBuffer[pos=0 lim=4 cap=4]
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: org.apache.avro.UnresolvedUnionException: Not in union ["null","string"]: java.nio.HeapByteBuffer[pos=0 lim=4 cap=4]
        at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:76)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)

---------*** ---------------

I assume this is happening because the value being written is a
ByteBuffer, whereas the field's schema only accepts "null" or "string".
Has anyone seen a similar issue?
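
To illustrate the assumption above, here is a minimal sketch of what
"Not in union" means. It is a simplified, hypothetical model and not Avro's
actual GenericData.resolveUnion code; the class and method names are made up
for illustration. The idea is only that a value must match one branch of the
union ["null","string"], and a ByteBuffer matches neither.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class UnionCheck {
    // Hypothetical stand-in for union resolution against ["null","string"].
    // Returns the index of the matching branch, or throws if none matches.
    static int resolveNullOrString(Object value) {
        if (value == null) {
            return 0; // "null" branch
        }
        if (value instanceof CharSequence) {
            return 1; // "string" branch
        }
        throw new IllegalArgumentException(
            "Not in union [\"null\",\"string\"]: " + value);
    }

    public static void main(String[] args) {
        System.out.println(resolveNullOrString(null));
        System.out.println(resolveNullOrString("http://example.com/"));

        // A 4-byte buffer, like the one reported in the log above.
        ByteBuffer raw = ByteBuffer.wrap("abcd".getBytes(StandardCharsets.UTF_8));
        try {
            resolveNullOrString(raw);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this model is right, the field being written is declared as a nullable
string in the schema, while the record actually carries raw bytes for it.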

My setup details, briefly:

NUTCH :

1) Nutch is running from /nutch/apache-nutch-2.3.1/runtime/local/
2) I have updated
/nutch/apache-nutch-2.3.1/runtime/local/conf/gora.properties:

gora.datastore.default=org.apache.gora.solr.store.SolrStore
gora.solrstore.solr.url=http://localhost:8983/solr
gora.solrstore.solr.config=solrconfig.xml
gora.solrstore.solr.schema=gora-solr-webpage-schema.xml  (This is based on
the suggestion at
http://grokbase.com/t/nutch/user/164n8srqjy/solr-as-backend-in-nutch-2-3-1 )
gora.solrstore.solr.batchSize=100
gora.solrstore.solr.solrjserver=http
gora.solrstore.solr.commitWithin=1000
gora.solrstore.solr.resultsSize=100
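
The parenthetical next to the schema file name is a note for this mail, not
part of the file. As a sanity check, here is a minimal sketch of how
java.util.Properties reads such lines. It assumes the file is parsed with the
standard Properties format; GoraPropsCheck and parse are hypothetical names,
and this is not Gora's own loading code. It shows that any trailing text on a
line would become part of the value.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class GoraPropsCheck {
    // Parse properties text with the standard java.util.Properties rules.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
            "gora.datastore.default=org.apache.gora.solr.store.SolrStore",
            "gora.solrstore.solr.url=http://localhost:8983/solr",
            // Illustrative only: a stray note left on the same line.
            "gora.solrstore.solr.schema=gora-solr-webpage-schema.xml  (a note)");
        Properties props = parse(text);

        System.out.println(props.getProperty("gora.datastore.default"));
        System.out.println(props.getProperty("gora.solrstore.solr.url"));
        // Brackets make trailing text in the value visible.
        System.out.println("[" + props.getProperty("gora.solrstore.solr.schema") + "]");
    }
}
```

Pointing the same check at the real gora.properties should confirm that each
key holds exactly the value listed above and nothing more.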

SOLR :

1) My Solr deployment has a core called 'webpage':
/solr/solr-4.10.3/example/solr/webpage
2) I have copied
/nutch/apache-nutch-2.3.1/runtime/local/conf/gora-solr-webpage-schema.xml
as 'schema.xml' to the conf folder of this core.
[I did this after getting a similar exception with the schema.xml that
comes with the Nutch distribution, and after reading this thread:
http://grokbase.com/t/nutch/user/164n8srqjy/solr-as-backend-in-nutch-2-3-1 ]

Can someone please point out the mistake I am making?

Thanks
Madhulika