Posted to user@nutch.apache.org by Alexander Baranov <Al...@epam.com> on 2015/04/29 16:17:15 UTC

TableNotFoundException during inject job

Hello, everybody.

I am seeing rather strange behavior with Nutch 2.3: even the initial Inject job fails with the exception shown below.
The whole Hadoop infrastructure is up and running:
root@5e7ca0b0c19d:~# jps
2810 NutchServer
1071 SecondaryNameNode
99 QuorumPeerMain
1694 ResourceManager
4598 Jps
795 NameNode
2243 HMaster
2376 HRegionServer
2669 ThriftServer
1789 NodeManager
913 DataNode

Nutch itself appears to be configured correctly, because with the same configuration I was able to crawl some pages and see the data in Solr.
If I understand correctly, one of the tasks of InjectorJob is to create the 'webpage' table in HBase. However, the HBase shell shows that no tables have been created.

Do you have any ideas about what is wrong here and what should be done to fix it?

2015-04-29 13:23:58,978 INFO  crawl.InjectorJob - InjectorJob: starting at 2015-04-29 13:23:58
2015-04-29 13:23:58,979 INFO  crawl.InjectorJob - InjectorJob: Injecting urlDir: ram.txt
2015-04-29 13:24:01,434 ERROR store.HBaseStore - org.apache.hadoop.hbase.TableExistsException: webpage
2015-04-29 13:24:01,434 ERROR store.HBaseStore - [Ljava.lang.StackTraceElement;@6a19905e
2015-04-29 13:24:01,454 INFO  crawl.InjectorJob - InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
2015-04-29 13:24:01,520 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-04-29 13:24:01,607 WARN  snappy.LoadSnappy - Snappy native library not loaded
2015-04-29 13:24:02,501 ERROR store.HBaseStore - org.apache.hadoop.hbase.TableExistsException: webpage
2015-04-29 13:24:02,501 ERROR store.HBaseStore - [Ljava.lang.StackTraceElement;@523b3317
2015-04-29 13:24:02,813 INFO  regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
2015-04-29 13:24:02,986 WARN  client.HConnectionManager$HConnectionImplementation - Encountered problems when prefetch META table:
org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: webpage, row=webpage,,99999999999999
        at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:151)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1059)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1121)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1001)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:958)
        at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:251)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:155)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:129)
        at org.apache.gora.hbase.store.HBaseTableConnection$1.<init>(HBaseTableConnection.java:87)
        at org.apache.gora.hbase.store.HBaseTableConnection.getTable(HBaseTableConnection.java:87)
        at org.apache.gora.hbase.store.HBaseTableConnection.put(HBaseTableConnection.java:186)
        at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:260)
        at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:79)
        at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:65)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:638)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:188)
        at org.apache.nutch.crawl.InjectorJob$UrlMapper.map(InjectorJob.java:82)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2015-04-29 13:24:02,996 ERROR store.HBaseStore - webpage
2015-04-29 13:24:02,996 ERROR store.HBaseStore - [Ljava.lang.StackTraceElement;@f757c05
2015-04-29 13:24:03,009 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
2015-04-29 13:24:03,073 INFO  crawl.InjectorJob - InjectorJob: total number of urls rejected by filters: 0
2015-04-29 13:24:03,073 INFO  crawl.InjectorJob - InjectorJob: total number of urls injected after normalization and filtering: 1
2015-04-29 13:24:03,075 INFO  crawl.InjectorJob - Injector: finished at 2015-04-29 13:24:03, elapsed: 00:00:04
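
The log above mentions two exceptions for the same table that seem to contradict each other (TableExistsException and TableNotFoundException, both for 'webpage'). In a longer log they are easy to miss, so here is a minimal sketch for tallying the exception classes a log mentions. The function name count_exceptions and the SAMPLE excerpt are illustrative only and are not part of Nutch, Gora, or HBase; the excerpt is copied from the log above.

```python
import re
from collections import Counter

# Matches fully qualified exception class names such as
# org.apache.hadoop.hbase.TableExistsException and captures the short name.
EXCEPTION_RE = re.compile(r"\b(?:[a-z_][\w$]*\.)+([A-Z]\w*Exception)\b")


def count_exceptions(log_text):
    """Count exception class names (short form) mentioned in a log."""
    counts = Counter()
    for line in log_text.splitlines():
        # Skip stack frames ("at ..."): they name methods, not exceptions.
        if line.lstrip().startswith("at "):
            continue
        for match in EXCEPTION_RE.finditer(line):
            counts[match.group(1)] += 1
    return counts


# Hypothetical input: a few lines copied from the log in this message.
SAMPLE = """\
2015-04-29 13:24:01,434 ERROR store.HBaseStore - org.apache.hadoop.hbase.TableExistsException: webpage
2015-04-29 13:24:02,501 ERROR store.HBaseStore - org.apache.hadoop.hbase.TableExistsException: webpage
org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: webpage, row=webpage,,99999999999999
        at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:151)
"""

if __name__ == "__main__":
    for name, n in count_exceptions(SAMPLE).most_common():
        print(name, n)
```

Run against the full log file instead of SAMPLE, this only summarizes what was logged; it does not query HBase and cannot tell which of the two exceptions reflects the real table state.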

Alexander Baranov