Posted to dev@nutch.apache.org by "Koen Smets (JIRA)" <ji...@apache.org> on 2014/06/03 12:22:01 UTC
[jira] [Created] (NUTCH-1791) Null pointer exceptions with gora-cassandra-0.4
Koen Smets created NUTCH-1791:
---------------------------------
Summary: Null pointer exceptions with gora-cassandra-0.4
Key: NUTCH-1791
URL: https://issues.apache.org/jira/browse/NUTCH-1791
Project: Nutch
Issue Type: Bug
Components: generator, storage
Affects Versions: 2.3
Environment: dsc-cassandra-2.0.2, dsc-cassandra-2.0.7
Reporter: Koen Smets
Fix For: 2.3
The latest nutch-2.x source checkout fails to run with Cassandra 2.0.2 (and also Cassandra 2.0.7) as the storage backend, both in the normal Nutch cycle (inject, generate, fetch) and in the JUnit test {{TestGoraStorage}}:
{code}
2014-06-03 11:24:23,495 INFO connection.CassandraHostRetryService (CassandraHostRetryService.java:<init>(48)) - Downed Host Retry service started with queue size -1 and retry delay 10s
2014-06-03 11:24:23,535 INFO service.JmxMonitor (JmxMonitor.java:registerMonitor(52)) - Registering JMX me.prettyprint.cassandra.service_Test Cluster:ServiceType=hector,MonitorType=hector
Exception in thread "main" java.lang.NullPointerException
at org.apache.gora.cassandra.query.CassandraResult.updatePersistent(CassandraResult.java:121)
at org.apache.gora.cassandra.query.CassandraResult.nextInner(CassandraResult.java:57)
at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:114)
at org.apache.nutch.storage.TestGoraStorage.readWrite(TestGoraStorage.java:93)
at org.apache.nutch.storage.TestGoraStorage.main(TestGoraStorage.java:230)
{code}
After injecting, {{readdb}} still returns the record:
{code}
ksmets@precise64 ~/l/a/r/local> ./bin/nutch inject urls
InjectorJob: starting at 2014-06-03 11:55:11
InjectorJob: Injecting urlDir: urls
InjectorJob: Using class org.apache.gora.cassandra.store.CassandraStore as the Gora storage class.
InjectorJob: total number of urls rejected by filters: 0
InjectorJob: total number of urls injected after normalization and filtering: 1
Injector: finished at 2014-06-03 11:55:13, elapsed: 00:00:02
ksmets@precise64 ~/l/a/r/local> ./bin/nutch readdb -stats
WebTable statistics start
Statistics for WebTable:
min score: 1.0
retry 0: 1
jobs: {db_stats-job_local1403358409_0001={jobID=job_local1403358409_0001, jobName=db_stats, counters={File Input Format Counters ={BYTES_READ=0}, Map-Reduce Framework={MAP_OUTPUT_MATERIALIZED_BYTES=97, MAP_INPUT_RECORDS=1, REDUCE_SHUFFLE_BYTES=0, SPILLED_RECORDS=12, MAP_OUTPUT_BYTES=53, COMMITTED_HEAP_BYTES=358612992, CPU_MILLISECONDS=0, SPLIT_RAW_BYTES=769, COMBINE_INPUT_RECORDS=4, REDUCE_INPUT_RECORDS=6, REDUCE_INPUT_GROUPS=6, COMBINE_OUTPUT_RECORDS=6, PHYSICAL_MEMORY_BYTES=0, REDUCE_OUTPUT_RECORDS=6, VIRTUAL_MEMORY_BYTES=0, MAP_OUTPUT_RECORDS=4}, FileSystemCounters={FILE_BYTES_READ=974145, FILE_BYTES_WRITTEN=1144369}, File Output Format Counters ={BYTES_WRITTEN=225}}}}
max score: 1.0
TOTAL urls: 1
status 0 (null): 1
avg score: 1.0
WebTable statistics: done
ksmets@precise64 ~/l/a/r/local> ./bin/nutch readdb -url http://example.com/
key: http://example.com/
baseUrl: null
status: 0 (null)
fetchTime: 1401789311270
prevFetchTime: 0
fetchInterval: 2592000
retriesSinceFetch: 0
modifiedTime: 0
prevModifiedTime: 0
protocolStatus: (null)
parseStatus: (null)
title: null
score: 1.0
markers: org.apache.gora.persistency.impl.DirtyMapWrapper@eb173c
reprUrl: null
metadata _csh_ : ?�
{code}
After generating, {{readdb -stats}} reports 0 URLs and {{readdb -url}} fails with the same NullPointerException:
{code}
ksmets@precise64 ~/l/a/r/local> ./bin/nutch generate -topN 1
GeneratorJob: starting at 2014-06-03 11:55:38
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: normalizing: true
GeneratorJob: topN: 1
GeneratorJob: finished at 2014-06-03 11:55:40, time elapsed: 00:00:02
GeneratorJob: generated batch id: 1401789338-222512082 containing 1 URLs
ksmets@precise64 ~/l/a/r/local> ./bin/nutch readdb -stats
WebTable statistics start
Statistics for WebTable:
jobs: {db_stats-job_local73029265_0001={jobID=job_local73029265_0001, jobName=db_stats, counters={File Input Format Counters ={BYTES_READ=0}, Map-Reduce Framework={MAP_OUTPUT_MATERIALIZED_BYTES=6, MAP_INPUT_RECORDS=0, REDUCE_SHUFFLE_BYTES=0, SPILLED_RECORDS=0, MAP_OUTPUT_BYTES=0, COMMITTED_HEAP_BYTES=358612992, CPU_MILLISECONDS=0, SPLIT_RAW_BYTES=769, COMBINE_INPUT_RECORDS=0, REDUCE_INPUT_RECORDS=0, REDUCE_INPUT_GROUPS=0, COMBINE_OUTPUT_RECORDS=0, PHYSICAL_MEMORY_BYTES=0, REDUCE_OUTPUT_RECORDS=0, VIRTUAL_MEMORY_BYTES=0, MAP_OUTPUT_RECORDS=0}, FileSystemCounters={FILE_BYTES_READ=974054, FILE_BYTES_WRITTEN=1144028}, File Output Format Counters ={BYTES_WRITTEN=98}}}}
TOTAL urls: 0
WebTable statistics: done
ksmets@precise64 ~/l/a/r/local> ./bin/nutch readdb -url http://example.com/
WebTableReader: java.lang.NullPointerException
at org.apache.gora.cassandra.query.CassandraResult.updatePersistent(CassandraResult.java:121)
at org.apache.gora.cassandra.query.CassandraResult.nextInner(CassandraResult.java:57)
at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:114)
at org.apache.nutch.crawl.WebTableReader.read(WebTableReader.java:238)
at org.apache.nutch.crawl.WebTableReader.run(WebTableReader.java:494)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.WebTableReader.main(WebTableReader.java:430)
{code}