Posted to user@nutch.apache.org by jeroenvlek <jv...@datamantics.com> on 2015/03/03 10:40:48 UTC

Re: Nutch 2 with Cassandra as a storage is not crawling data properly

Hello,

I'm having the same issue. The URLs are added as keys, and the fetch time,
fetch interval, batch id and score are written, but nothing else is. I don't
know much about mailing list etiquette, but here's my question on Stack Overflow:

http://stackoverflow.com/questions/28813709/how-to-extract-nutch-2-3-data-from-cassandra-with-gora

And here is my chat with Alfonso Nishikawa, who says it might be a Gora issue:

http://chat.stackoverflow.com/rooms/72077/discussion-between-alfonso-nishikawa-and-jeroen-vlek

The hadoop.log file contains a warning that might provide a clue:

WARN mapreduce.GoraRecordWriter - Exception at GoraRecordWriter.class while
closing datastore.InvalidRequestException(why:supercolumn parameter is not
optional for super CF sc)
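
To check whether the same warning shows up in your own log, something like
the following Python sketch might help. The path logs/hadoop.log and the
helper name find_gora_warnings are assumptions for illustration, and the
pattern is derived only from the single line quoted above, so it may need
adjusting for other log layouts.

```python
import re
from pathlib import Path

# Pattern based on the WARN line quoted above; other layouts may differ.
WARNING_PATTERN = re.compile(r"WARN\s+mapreduce\.GoraRecordWriter\s+-\s+(.*)")


def find_gora_warnings(log_text):
    """Return the message part of every GoraRecordWriter WARN line."""
    messages = []
    for line in log_text.splitlines():
        match = WARNING_PATTERN.search(line)
        if match:
            messages.append(match.group(1).strip())
    return messages


if __name__ == "__main__":
    # Assumed location of the log, relative to the Nutch runtime directory.
    log_path = Path("logs/hadoop.log")
    if log_path.exists():
        for message in find_gora_warnings(log_path.read_text(errors="replace")):
            print(message)
```

If this prints the InvalidRequestException message, the write is probably
failing when the datastore is closed, which would fit the symptom of only a
few fields being stored.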

Hope this helps.

Cheers,
Jeroen Vlek


