Posted to user@nutch.apache.org by irfan romadona <tu...@gmail.com> on 2014/05/16 17:21:56 UTC

Nutch can't crawl particular website

Hi,

I'm new to Nutch. I have crawled several sites with Nutch successfully, but
a few websites fail. I've looked through hadoop.log but can't find any
suspicious errors for the sites that fail. For those sites, no "Adding N
documents" line appears on the console, unlike a successful crawl, which
prints something like this:

2014-05-15 00:46:32,669 INFO  solr.SolrWriter - Adding 5 documents

So I assume the problem is in the generating or feeding step?

Here is my log from an attempt to crawl http://www.okezone.com in DEBUG
mode:

2014-05-15 12:18:16,110 INFO  crawl.InjectorJob - InjectorJob: starting at
2014-05-15 12:18:16
2014-05-15 12:18:16,110 INFO  crawl.InjectorJob - InjectorJob: Injecting
urlDir: urls/okezone.com.txt
2014-05-15 12:18:17,464 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:17,528 INFO  crawl.InjectorJob - InjectorJob: Using class
org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
2014-05-15 12:18:17,564 WARN  util.NativeCodeLoader - Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
2014-05-15 12:18:17,628 WARN  snappy.LoadSnappy - Snappy native library not
loaded
2014-05-15 12:18:18,138 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:18,145 INFO  mapreduce.GoraRecordWriter -
gora.buffer.write.limit = 10000
2014-05-15 12:18:18,217 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml,
file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local1586545423_0001.xml,
instantiating a new object cache
2014-05-15 12:18:18,267 INFO  regex.RegexURLNormalizer - can't find rules
for scope 'inject', using default
2014-05-15 12:18:18,434 WARN  mapred.FileOutputCommitter - Output path is
null in cleanup
2014-05-15 12:18:18,916 INFO  crawl.InjectorJob - InjectorJob: total number
of urls rejected by filters: 0
2014-05-15 12:18:18,916 INFO  crawl.InjectorJob - InjectorJob: total number
of urls injected after normalization and filtering: 1
2014-05-15 12:18:18,917 INFO  crawl.InjectorJob - Injector: finished at
2014-05-15 12:18:18, elapsed: 00:00:02
2014-05-15 12:18:19,914 INFO  crawl.GeneratorJob - GeneratorJob: starting
at 2014-05-15 12:18:19
2014-05-15 12:18:19,914 INFO  crawl.GeneratorJob - GeneratorJob: Selecting
best-scoring urls due for fetch.
2014-05-15 12:18:19,914 INFO  crawl.GeneratorJob - GeneratorJob: starting
2014-05-15 12:18:19,914 INFO  crawl.GeneratorJob - GeneratorJob: filtering:
false
2014-05-15 12:18:19,915 INFO  crawl.GeneratorJob - GeneratorJob:
normalizing: false
2014-05-15 12:18:19,915 INFO  crawl.GeneratorJob - GeneratorJob: topN: 50000
2014-05-15 12:18:20,261 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new
object cache
2014-05-15 12:18:20,261 INFO  crawl.FetchScheduleFactory - Using
FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-05-15 12:18:20,262 INFO  crawl.AbstractFetchSchedule -
defaultInterval=2592000
2014-05-15 12:18:20,262 INFO  crawl.AbstractFetchSchedule -
maxInterval=7776000
2014-05-15 12:18:21,231 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:21,445 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:21,500 WARN  util.NativeCodeLoader - Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
2014-05-15 12:18:21,632 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:22,210 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:22,346 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:22,408 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:22,423 INFO  mapreduce.GoraRecordReader -
gora.buffer.read.limit = 10000
2014-05-15 12:18:22,583 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml,
file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local445951503_0001.xml,
instantiating a new object cache
2014-05-15 12:18:22,610 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml,
file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local445951503_0001.xml,
instantiating a new object cache
2014-05-15 12:18:22,610 INFO  crawl.FetchScheduleFactory - Using
FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-05-15 12:18:22,610 INFO  crawl.AbstractFetchSchedule -
defaultInterval=2592000
2014-05-15 12:18:22,610 INFO  crawl.AbstractFetchSchedule -
maxInterval=7776000
2014-05-15 12:18:22,882 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:22,892 INFO  mapreduce.GoraRecordWriter -
gora.buffer.write.limit = 10000
2014-05-15 12:18:23,015 WARN  mapred.FileOutputCommitter - Output path is
null in cleanup
2014-05-15 12:18:23,043 INFO  crawl.GeneratorJob - GeneratorJob: finished
at 2014-05-15 12:18:23, time elapsed: 00:00:03
2014-05-15 12:18:23,044 INFO  crawl.GeneratorJob - GeneratorJob: generated
batch id: 1400131099-4513
2014-05-15 12:18:24,067 INFO  fetcher.FetcherJob - FetcherJob: starting
2014-05-15 12:18:24,068 INFO  fetcher.FetcherJob - FetcherJob: batchId:
1400131099-4513
2014-05-15 12:18:24,071 INFO  fetcher.FetcherJob - FetcherJob: threads: 50
2014-05-15 12:18:24,071 INFO  fetcher.FetcherJob - FetcherJob: parsing:
false
2014-05-15 12:18:24,071 INFO  fetcher.FetcherJob - FetcherJob: resuming:
false
2014-05-15 12:18:24,071 INFO  fetcher.FetcherJob - FetcherJob : timelimit
set for : 1400141904071
2014-05-15 12:18:24,919 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new
object cache
2014-05-15 12:18:24,919 INFO  http.Http - http.proxy.host = null
2014-05-15 12:18:24,919 INFO  http.Http - http.proxy.port = 8080
2014-05-15 12:18:24,919 INFO  http.Http - http.timeout = 2147483640
2014-05-15 12:18:24,919 INFO  http.Http - http.content.limit = 999999999
2014-05-15 12:18:24,919 INFO  http.Http - http.agent = My Nutch
Spider/Nutch-2.2.1
2014-05-15 12:18:24,919 INFO  http.Http - http.accept.language =
en-us,en-gb,en;q=0.7,*;q=0.3
2014-05-15 12:18:24,919 INFO  http.Http - http.accept =
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2014-05-15 12:18:25,672 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:25,869 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:25,927 WARN  util.NativeCodeLoader - Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
2014-05-15 12:18:26,066 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:26,641 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:26,781 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:26,852 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:26,868 INFO  mapreduce.GoraRecordReader -
gora.buffer.read.limit = 10000
2014-05-15 12:18:26,945 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml,
file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local270323155_0001.xml,
instantiating a new object cache
2014-05-15 12:18:27,242 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:27,258 INFO  mapreduce.GoraRecordWriter -
gora.buffer.write.limit = 10000
2014-05-15 12:18:27,261 INFO  fetcher.FetcherJob - Using queue mode : byHost
2014-05-15 12:18:27,261 INFO  fetcher.FetcherJob - Fetcher: threads: 50
2014-05-15 12:18:27,282 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml,
file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local270323155_0001.xml,
instantiating a new object cache
2014-05-15 12:18:27,284 INFO  fetcher.FetcherJob - QueueFeeder finished:
total 1 records. Hit by time limit :0
2014-05-15 12:18:27,291 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread1, activeThreads=1
2014-05-15 12:18:27,291 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread2, activeThreads=1
2014-05-15 12:18:27,292 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread4, activeThreads=1
2014-05-15 12:18:27,292 INFO  fetcher.FetcherJob - fetching
http://www.okezone.com/ (queue crawl delay=5000ms)
2014-05-15 12:18:27,292 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread3, activeThreads=1
2014-05-15 12:18:27,293 INFO  http.Http - http.proxy.host = null
2014-05-15 12:18:27,293 INFO  http.Http - http.proxy.port = 8080
2014-05-15 12:18:27,293 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread6, activeThreads=2
2014-05-15 12:18:27,293 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread5, activeThreads=1
2014-05-15 12:18:27,293 INFO  http.Http - http.timeout = 2147483640
2014-05-15 12:18:27,293 INFO  http.Http - http.content.limit = 999999999
2014-05-15 12:18:27,293 INFO  http.Http - http.agent = My Nutch
Spider/Nutch-2.2.1
2014-05-15 12:18:27,293 INFO  http.Http - http.accept.language =
en-us,en-gb,en;q=0.7,*;q=0.3
2014-05-15 12:18:27,293 INFO  http.Http - http.accept =
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2014-05-15 12:18:27,293 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread8, activeThreads=1
2014-05-15 12:18:27,294 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread9, activeThreads=1
2014-05-15 12:18:27,294 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread10, activeThreads=1
2014-05-15 12:18:27,294 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread11, activeThreads=2
2014-05-15 12:18:27,294 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread12, activeThreads=1
2014-05-15 12:18:27,306 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread13, activeThreads=5
2014-05-15 12:18:27,306 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread14, activeThreads=3
2014-05-15 12:18:27,306 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread15, activeThreads=4
2014-05-15 12:18:27,306 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread16, activeThreads=1
2014-05-15 12:18:27,306 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread7, activeThreads=2
2014-05-15 12:18:27,306 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread17, activeThreads=1
2014-05-15 12:18:27,307 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread19, activeThreads=1
2014-05-15 12:18:27,308 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread21, activeThreads=1
2014-05-15 12:18:27,308 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread18, activeThreads=1
2014-05-15 12:18:27,308 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread20, activeThreads=1
2014-05-15 12:18:27,308 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread23, activeThreads=1
2014-05-15 12:18:27,309 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread24, activeThreads=1
2014-05-15 12:18:27,309 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread25, activeThreads=1
2014-05-15 12:18:27,309 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread26, activeThreads=1
2014-05-15 12:18:27,310 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread27, activeThreads=1
2014-05-15 12:18:27,310 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread28, activeThreads=1
2014-05-15 12:18:27,310 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread29, activeThreads=1
2014-05-15 12:18:27,311 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread30, activeThreads=1
2014-05-15 12:18:27,311 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread31, activeThreads=1
2014-05-15 12:18:27,311 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread32, activeThreads=1
2014-05-15 12:18:27,312 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread33, activeThreads=1
2014-05-15 12:18:27,312 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread22, activeThreads=1
2014-05-15 12:18:27,312 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread34, activeThreads=1
2014-05-15 12:18:27,312 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread35, activeThreads=1
2014-05-15 12:18:27,313 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread36, activeThreads=1
2014-05-15 12:18:27,317 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread38, activeThreads=1
2014-05-15 12:18:27,317 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread39, activeThreads=2
2014-05-15 12:18:27,317 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread37, activeThreads=1
2014-05-15 12:18:27,317 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread40, activeThreads=2
2014-05-15 12:18:27,318 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread42, activeThreads=1
2014-05-15 12:18:27,319 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread43, activeThreads=1
2014-05-15 12:18:27,319 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread44, activeThreads=1
2014-05-15 12:18:27,319 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread41, activeThreads=1
2014-05-15 12:18:27,319 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread45, activeThreads=1
2014-05-15 12:18:27,319 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread46, activeThreads=1
2014-05-15 12:18:27,320 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread47, activeThreads=1
2014-05-15 12:18:27,320 INFO  fetcher.FetcherJob - Fetcher: throughput
threshold: -1
2014-05-15 12:18:27,320 INFO  fetcher.FetcherJob - Fetcher: throughput
threshold sequence: 5
2014-05-15 12:18:27,321 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread48, activeThreads=1
2014-05-15 12:18:27,321 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread49, activeThreads=1
2014-05-15 12:18:27,448 INFO  fetcher.FetcherJob -
rules.isAllowed(fit.u.toString()):true
2014-05-15 12:18:28,121 INFO  fetcher.FetcherJob - -finishing thread
FetcherThread0, activeThreads=0
2014-05-15 12:18:32,321 INFO  fetcher.FetcherJob - 0/0 spinwaiting/active,
1 pages, 0 errors, 0.2 0 pages/s, 209 209 kb/s, 0 URLs in 0 queues
2014-05-15 12:18:32,321 INFO  fetcher.FetcherJob - -activeThreads=0
2014-05-15 12:18:32,330 WARN  mapred.FileOutputCommitter - Output path is
null in cleanup
2014-05-15 12:18:32,489 INFO  fetcher.FetcherJob - FetcherJob: done
2014-05-15 12:18:33,550 INFO  parse.ParserJob - ParserJob: starting
2014-05-15 12:18:33,552 INFO  parse.ParserJob - ParserJob: resuming:
false
2014-05-15 12:18:33,552 INFO  parse.ParserJob - ParserJob: forced
reparse:    false
2014-05-15 12:18:33,552 INFO  parse.ParserJob - ParserJob: batchId:
1400131099-4513
2014-05-15 12:18:33,860 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new
object cache
2014-05-15 12:18:34,897 INFO  crawl.SignatureFactory - Using Signature
impl: org.apache.nutch.crawl.MD5Signature
2014-05-15 12:18:35,602 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:35,888 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:35,959 WARN  util.NativeCodeLoader - Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
2014-05-15 12:18:36,096 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:36,632 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:36,751 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:36,820 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:36,836 INFO  mapreduce.GoraRecordReader -
gora.buffer.read.limit = 10000
2014-05-15 12:18:37,137 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:37,147 INFO  mapreduce.GoraRecordWriter -
gora.buffer.write.limit = 10000
2014-05-15 12:18:37,149 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml,
file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local295157817_0001.xml,
instantiating a new object cache
2014-05-15 12:18:37,152 INFO  crawl.SignatureFactory - Using Signature
impl: org.apache.nutch.crawl.MD5Signature
2014-05-15 12:18:37,290 INFO  parse.ParserJob - Parsing
http://www.okezone.com/
2014-05-15 12:18:37,292 DEBUG parse.ParseUtil - Parsing [
http://www.okezone.com/] with [org.apache.nutch.parse.html.HtmlParser@8d5fa]
2014-05-15 12:18:37,905 INFO  regex.RegexURLNormalizer - can't find rules
for scope 'fetcher', using default
2014-05-15 12:18:38,141 WARN  mapred.FileOutputCommitter - Output path is
null in cleanup
2014-05-15 12:18:38,475 INFO  parse.ParserJob - ParserJob: success
2014-05-15 12:18:39,534 INFO  crawl.DbUpdaterJob - DbUpdaterJob: starting
2014-05-15 12:18:39,896 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new
object cache
2014-05-15 12:18:41,090 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:41,325 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:41,390 WARN  util.NativeCodeLoader - Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
2014-05-15 12:18:41,526 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:42,115 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:42,269 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:42,343 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:42,362 INFO  mapreduce.GoraRecordReader -
gora.buffer.read.limit = 10000
2014-05-15 12:18:42,454 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml,
file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local873829540_0001.xml,
instantiating a new object cache
2014-05-15 12:18:42,678 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:42,687 INFO  mapreduce.GoraRecordWriter -
gora.buffer.write.limit = 10000
2014-05-15 12:18:42,691 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml,
file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local873829540_0001.xml,
instantiating a new object cache
2014-05-15 12:18:42,691 INFO  crawl.FetchScheduleFactory - Using
FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-05-15 12:18:42,692 INFO  crawl.AbstractFetchSchedule -
defaultInterval=2592000
2014-05-15 12:18:42,692 INFO  crawl.AbstractFetchSchedule -
maxInterval=7776000
2014-05-15 12:18:42,849 WARN  mapred.FileOutputCommitter - Output path is
null in cleanup
2014-05-15 12:18:42,930 INFO  crawl.DbUpdaterJob - DbUpdaterJob: done
2014-05-15 12:18:43,976 INFO  solr.SolrIndexerJob - SolrIndexerJob: starting
2014-05-15 12:18:44,350 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new
object cache
2014-05-15 12:18:44,454 INFO  basic.BasicIndexingFilter - Maximum title
length for indexing set to: 100
2014-05-15 12:18:44,454 INFO  indexer.IndexingFilters - Adding
org.apache.nutch.indexer.basic.BasicIndexingFilter
2014-05-15 12:18:44,457 INFO  anchor.AnchorIndexingFilter - Anchor
deduplication is: off
2014-05-15 12:18:44,457 INFO  indexer.IndexingFilters - Adding
org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2014-05-15 12:18:45,546 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:45,689 WARN  util.NativeCodeLoader - Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
2014-05-15 12:18:45,827 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:46,454 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:46,584 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:46,657 INFO  store.HBaseStore - Keyclass and nameclass
match but mismatching table names  mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:46,667 INFO  mapreduce.GoraRecordReader -
gora.buffer.read.limit = 10000
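
For anyone who wants to check the same things in their own hadoop.log, here
is a minimal sketch (plain Python, not part of Nutch) of the scan described
above: it rejoins the wrapped lines into records, counts records per log
level, and picks out WARN/ERROR/FATAL records and any solr.SolrWriter
"Adding N documents" lines. The function names parse_records and summarize
are made up for this example, and the record layout is assumed to be the one
visible in the excerpt (timestamp, level, logger, " - ", message).

```python
import re
from collections import Counter

# Start of a record as it appears in the excerpt above, e.g.
# "2014-05-15 12:18:16,110 INFO  crawl.InjectorJob - InjectorJob: starting"
# (assumed layout: timestamp, level, logger name, " - ", message).
RECORD_START = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(\w+)\s+(\S+)\s+-\s?(.*)$"
)


def parse_records(lines):
    """Join wrapped lines into (timestamp, level, logger, message) tuples."""
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = RECORD_START.match(line)
        if match:
            records.append(list(match.groups()))
        elif records and line.strip():
            # Not a new record: treat it as the wrapped tail of the previous one.
            records[-1][3] = (records[-1][3] + " " + line.strip()).strip()
    return [tuple(r) for r in records]


def summarize(records):
    """Return (counts per level, problem records, SolrWriter 'Adding' records)."""
    levels = Counter(level for _, level, _, _ in records)
    problems = [r for r in records if r[1] in ("WARN", "ERROR", "FATAL")]
    solr_adds = [
        r for r in records if r[2] == "solr.SolrWriter" and "Adding" in r[3]
    ]
    return levels, problems, solr_adds
```

To use it, open the log file, pass the file object to parse_records, and
pass the result to summarize. An empty solr_adds list together with no
ERROR records would match what I am seeing for this site.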

Does anyone have any suggestions on how to solve this problem? Thanks.

Irfan R.