Posted to user@nutch.apache.org by Michael Weber <mi...@geminio.de> on 2016/02/15 20:39:29 UTC

Error fetching with nutch2.3.1 & cassandra: supercolumn parameter is not optional for super CF sc

Hello List,

I'm new to Nutch and I'm trying to run Nutch with Cassandra.

So far I have set it up so that I can inject a seed list and
generate a list of URLs to crawl next.
The database tables webpage.f and webpage.sc have content.

But if I try to fetch these URLs, it ends with the following error:
2016-02-15 18:39:43,817 INFO  fetcher.FetcherJob - -activeThreads=0
2016-02-15 18:39:43,893 WARN  mapreduce.GoraRecordWriter - Exception at GoraRecordWriter.class while closing datastore: InvalidRequestException(why:supercolumn parameter is not optional for super CF sc)
2016-02-15 18:39:43,893 WARN  mapreduce.GoraRecordWriter - Trace: me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:52)
2016-02-15 18:39:43,895 WARN  mapred.LocalJobRunner - job_local421637962_0001
java.lang.Exception: java.lang.RuntimeException: me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:supercolumn parameter is not optional for super CF sc)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.lang.RuntimeException: me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:supercolumn parameter is not optional for super CF sc)
        at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:60)
        at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:550)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:629)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:supercolumn parameter is not optional for super CF sc)
        at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:52)
        at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:260)
        at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:113)
        at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
        at me.prettyprint.cassandra.model.MutatorImpl.insert(MutatorImpl.java:69)
        at org.apache.gora.cassandra.store.HectorUtils.insertColumn(HectorUtils.java:46)
        at org.apache.gora.cassandra.store.CassandraClient.addColumn(CassandraClient.java:293)
        at org.apache.gora.cassandra.store.CassandraStore.addOrUpdateField(CassandraStore.java:513)
        at org.apache.gora.cassandra.store.CassandraStore.addOrUpdateField(CassandraStore.java:599)
        at org.apache.gora.cassandra.store.CassandraStore.flush(CassandraStore.java:316)
        at org.apache.gora.cassandra.store.CassandraStore.close(CassandraStore.java:160)
        at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:56)
        ... 9 more
Caused by: InvalidRequestException(why:supercolumn parameter is not optional for super CF sc)
        at org.apache.cassandra.thrift.Cassandra$batch_mutate_result$batch_mutate_resultStandardScheme.read(Cassandra.java:28082)
        at org.apache.cassandra.thrift.Cassandra$batch_mutate_result$batch_mutate_resultStandardScheme.read(Cassandra.java:28068)
        at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:28002)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1060)
        at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1046)
        at me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:246)
        at me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:243)
        at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:104)
        at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:253)
        ... 19 more


I've found this ticket in the Gora Jira:
https://issues.apache.org/jira/browse/GORA-416
but I don't know what the resolution is. An update to gora-7-snapshot results
in a compile error when I run ant runtime.

I have tried both the latest release and a code checkout from svn.

Does anybody have advice on how I can get Nutch to run with Cassandra?

Thanks!

Micha