Posted to user@nutch.apache.org by irfan romadona <tu...@gmail.com> on 2014/05/16 17:21:56 UTC
Nutch can't crawl particular website
Hi,
I'm new to Nutch. I have crawled several sites with Nutch and it works,
with a few websites as exceptions. I've looked through hadoop.log but can't
find any suspicious errors for the sites that fail to crawl. No documents
are added on the console, unlike a successful crawl, which logs a line like this:
2014-05-15 00:46:32,669 INFO solr.SolrWriter - Adding 5 documents
So I assume the problem is in the generating or feeding step?
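A minimal sketch of one way to scan hadoop.log for WARN/ERROR entries and for the SolrWriter "Adding N documents" line, assuming the log layout shown in this post (timestamp, level, component, " - ", message). The function name summarize and both regular expressions are illustrative and not part of Nutch:

```python
import re
from collections import Counter

# Assumed entry layout, taken from the lines quoted in this post:
#   2014-05-15 00:46:32,669 INFO solr.SolrWriter - Adding 5 documents
ENTRY = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(\w+)\s+(\S+)\s+-\s*(.*)$"
)
ADDED = re.compile(r"Adding (\d+) documents")


def summarize(lines):
    """Count entries per level, collect WARN/ERROR/FATAL entries,
    and total the documents reported by solr.SolrWriter."""
    levels = Counter()
    problems = []
    added = 0
    for raw in lines:
        match = ENTRY.match(raw.rstrip("\n"))
        if match is None:
            # Not the start of an entry (e.g. a wrapped continuation line).
            continue
        _, level, component, message = match.groups()
        levels[level] += 1
        if level in ("WARN", "ERROR", "FATAL"):
            problems.append((level, component, message))
        if component == "solr.SolrWriter":
            hit = ADDED.search(message)
            if hit:
                added += int(hit.group(1))
    return levels, problems, added
```

It could be called as summarize(open("logs/hadoop.log")), where the path is an assumption about the local setup. A document total of 0 alongside an empty ERROR list would match the symptom described here.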
Here is my log from an attempt to crawl http://www.okezone.com in
DEBUG mode:
2014-05-15 12:18:16,110 INFO crawl.InjectorJob - InjectorJob: starting at
2014-05-15 12:18:16
2014-05-15 12:18:16,110 INFO crawl.InjectorJob - InjectorJob: Injecting
urlDir: urls/okezone.com.txt
2014-05-15 12:18:17,464 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:17,528 INFO crawl.InjectorJob - InjectorJob: Using class
org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
2014-05-15 12:18:17,564 WARN util.NativeCodeLoader - Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
2014-05-15 12:18:17,628 WARN snappy.LoadSnappy - Snappy native library not
loaded
2014-05-15 12:18:18,138 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:18,145 INFO mapreduce.GoraRecordWriter -
gora.buffer.write.limit = 10000
2014-05-15 12:18:18,217 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml,
file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local1586545423_0001.xml,
instantiating a new object cache
2014-05-15 12:18:18,267 INFO regex.RegexURLNormalizer - can't find rules
for scope 'inject', using default
2014-05-15 12:18:18,434 WARN mapred.FileOutputCommitter - Output path is
null in cleanup
2014-05-15 12:18:18,916 INFO crawl.InjectorJob - InjectorJob: total number
of urls rejected by filters: 0
2014-05-15 12:18:18,916 INFO crawl.InjectorJob - InjectorJob: total number
of urls injected after normalization and filtering: 1
2014-05-15 12:18:18,917 INFO crawl.InjectorJob - Injector: finished at
2014-05-15 12:18:18, elapsed: 00:00:02
2014-05-15 12:18:19,914 INFO crawl.GeneratorJob - GeneratorJob: starting
at 2014-05-15 12:18:19
2014-05-15 12:18:19,914 INFO crawl.GeneratorJob - GeneratorJob: Selecting
best-scoring urls due for fetch.
2014-05-15 12:18:19,914 INFO crawl.GeneratorJob - GeneratorJob: starting
2014-05-15 12:18:19,914 INFO crawl.GeneratorJob - GeneratorJob: filtering:
false
2014-05-15 12:18:19,915 INFO crawl.GeneratorJob - GeneratorJob:
normalizing: false
2014-05-15 12:18:19,915 INFO crawl.GeneratorJob - GeneratorJob: topN: 50000
2014-05-15 12:18:20,261 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new
object cache
2014-05-15 12:18:20,261 INFO crawl.FetchScheduleFactory - Using
FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-05-15 12:18:20,262 INFO crawl.AbstractFetchSchedule -
defaultInterval=2592000
2014-05-15 12:18:20,262 INFO crawl.AbstractFetchSchedule -
maxInterval=7776000
2014-05-15 12:18:21,231 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:21,445 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:21,500 WARN util.NativeCodeLoader - Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
2014-05-15 12:18:21,632 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:22,210 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:22,346 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:22,408 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:22,423 INFO mapreduce.GoraRecordReader -
gora.buffer.read.limit = 10000
2014-05-15 12:18:22,583 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml,
file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local445951503_0001.xml,
instantiating a new object cache
2014-05-15 12:18:22,610 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml,
file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local445951503_0001.xml,
instantiating a new object cache
2014-05-15 12:18:22,610 INFO crawl.FetchScheduleFactory - Using
FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-05-15 12:18:22,610 INFO crawl.AbstractFetchSchedule -
defaultInterval=2592000
2014-05-15 12:18:22,610 INFO crawl.AbstractFetchSchedule -
maxInterval=7776000
2014-05-15 12:18:22,882 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:22,892 INFO mapreduce.GoraRecordWriter -
gora.buffer.write.limit = 10000
2014-05-15 12:18:23,015 WARN mapred.FileOutputCommitter - Output path is
null in cleanup
2014-05-15 12:18:23,043 INFO crawl.GeneratorJob - GeneratorJob: finished
at 2014-05-15 12:18:23, time elapsed: 00:00:03
2014-05-15 12:18:23,044 INFO crawl.GeneratorJob - GeneratorJob: generated
batch id: 1400131099-4513
2014-05-15 12:18:24,067 INFO fetcher.FetcherJob - FetcherJob: starting
2014-05-15 12:18:24,068 INFO fetcher.FetcherJob - FetcherJob: batchId:
1400131099-4513
2014-05-15 12:18:24,071 INFO fetcher.FetcherJob - FetcherJob: threads: 50
2014-05-15 12:18:24,071 INFO fetcher.FetcherJob - FetcherJob: parsing:
false
2014-05-15 12:18:24,071 INFO fetcher.FetcherJob - FetcherJob: resuming:
false
2014-05-15 12:18:24,071 INFO fetcher.FetcherJob - FetcherJob : timelimit
set for : 1400141904071
2014-05-15 12:18:24,919 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new
object cache
2014-05-15 12:18:24,919 INFO http.Http - http.proxy.host = null
2014-05-15 12:18:24,919 INFO http.Http - http.proxy.port = 8080
2014-05-15 12:18:24,919 INFO http.Http - http.timeout = 2147483640
2014-05-15 12:18:24,919 INFO http.Http - http.content.limit = 999999999
2014-05-15 12:18:24,919 INFO http.Http - http.agent = My Nutch
Spider/Nutch-2.2.1
2014-05-15 12:18:24,919 INFO http.Http - http.accept.language =
en-us,en-gb,en;q=0.7,*;q=0.3
2014-05-15 12:18:24,919 INFO http.Http - http.accept =
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2014-05-15 12:18:25,672 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:25,869 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:25,927 WARN util.NativeCodeLoader - Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
2014-05-15 12:18:26,066 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:26,641 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:26,781 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:26,852 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:26,868 INFO mapreduce.GoraRecordReader -
gora.buffer.read.limit = 10000
2014-05-15 12:18:26,945 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml,
file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local270323155_0001.xml,
instantiating a new object cache
2014-05-15 12:18:27,242 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:27,258 INFO mapreduce.GoraRecordWriter -
gora.buffer.write.limit = 10000
2014-05-15 12:18:27,261 INFO fetcher.FetcherJob - Using queue mode : byHost
2014-05-15 12:18:27,261 INFO fetcher.FetcherJob - Fetcher: threads: 50
2014-05-15 12:18:27,282 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml,
file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local270323155_0001.xml,
instantiating a new object cache
2014-05-15 12:18:27,284 INFO fetcher.FetcherJob - QueueFeeder finished:
total 1 records. Hit by time limit :0
2014-05-15 12:18:27,291 INFO fetcher.FetcherJob - -finishing thread
FetcherThread1, activeThreads=1
2014-05-15 12:18:27,291 INFO fetcher.FetcherJob - -finishing thread
FetcherThread2, activeThreads=1
2014-05-15 12:18:27,292 INFO fetcher.FetcherJob - -finishing thread
FetcherThread4, activeThreads=1
2014-05-15 12:18:27,292 INFO fetcher.FetcherJob - fetching
http://www.okezone.com/ (queue crawl delay=5000ms)
2014-05-15 12:18:27,292 INFO fetcher.FetcherJob - -finishing thread
FetcherThread3, activeThreads=1
2014-05-15 12:18:27,293 INFO http.Http - http.proxy.host = null
2014-05-15 12:18:27,293 INFO http.Http - http.proxy.port = 8080
2014-05-15 12:18:27,293 INFO fetcher.FetcherJob - -finishing thread
FetcherThread6, activeThreads=2
2014-05-15 12:18:27,293 INFO fetcher.FetcherJob - -finishing thread
FetcherThread5, activeThreads=1
2014-05-15 12:18:27,293 INFO http.Http - http.timeout = 2147483640
2014-05-15 12:18:27,293 INFO http.Http - http.content.limit = 999999999
2014-05-15 12:18:27,293 INFO http.Http - http.agent = My Nutch
Spider/Nutch-2.2.1
2014-05-15 12:18:27,293 INFO http.Http - http.accept.language =
en-us,en-gb,en;q=0.7,*;q=0.3
2014-05-15 12:18:27,293 INFO http.Http - http.accept =
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2014-05-15 12:18:27,293 INFO fetcher.FetcherJob - -finishing thread
FetcherThread8, activeThreads=1
2014-05-15 12:18:27,294 INFO fetcher.FetcherJob - -finishing thread
FetcherThread9, activeThreads=1
2014-05-15 12:18:27,294 INFO fetcher.FetcherJob - -finishing thread
FetcherThread10, activeThreads=1
2014-05-15 12:18:27,294 INFO fetcher.FetcherJob - -finishing thread
FetcherThread11, activeThreads=2
2014-05-15 12:18:27,294 INFO fetcher.FetcherJob - -finishing thread
FetcherThread12, activeThreads=1
2014-05-15 12:18:27,306 INFO fetcher.FetcherJob - -finishing thread
FetcherThread13, activeThreads=5
2014-05-15 12:18:27,306 INFO fetcher.FetcherJob - -finishing thread
FetcherThread14, activeThreads=3
2014-05-15 12:18:27,306 INFO fetcher.FetcherJob - -finishing thread
FetcherThread15, activeThreads=4
2014-05-15 12:18:27,306 INFO fetcher.FetcherJob - -finishing thread
FetcherThread16, activeThreads=1
2014-05-15 12:18:27,306 INFO fetcher.FetcherJob - -finishing thread
FetcherThread7, activeThreads=2
2014-05-15 12:18:27,306 INFO fetcher.FetcherJob - -finishing thread
FetcherThread17, activeThreads=1
2014-05-15 12:18:27,307 INFO fetcher.FetcherJob - -finishing thread
FetcherThread19, activeThreads=1
2014-05-15 12:18:27,308 INFO fetcher.FetcherJob - -finishing thread
FetcherThread21, activeThreads=1
2014-05-15 12:18:27,308 INFO fetcher.FetcherJob - -finishing thread
FetcherThread18, activeThreads=1
2014-05-15 12:18:27,308 INFO fetcher.FetcherJob - -finishing thread
FetcherThread20, activeThreads=1
2014-05-15 12:18:27,308 INFO fetcher.FetcherJob - -finishing thread
FetcherThread23, activeThreads=1
2014-05-15 12:18:27,309 INFO fetcher.FetcherJob - -finishing thread
FetcherThread24, activeThreads=1
2014-05-15 12:18:27,309 INFO fetcher.FetcherJob - -finishing thread
FetcherThread25, activeThreads=1
2014-05-15 12:18:27,309 INFO fetcher.FetcherJob - -finishing thread
FetcherThread26, activeThreads=1
2014-05-15 12:18:27,310 INFO fetcher.FetcherJob - -finishing thread
FetcherThread27, activeThreads=1
2014-05-15 12:18:27,310 INFO fetcher.FetcherJob - -finishing thread
FetcherThread28, activeThreads=1
2014-05-15 12:18:27,310 INFO fetcher.FetcherJob - -finishing thread
FetcherThread29, activeThreads=1
2014-05-15 12:18:27,311 INFO fetcher.FetcherJob - -finishing thread
FetcherThread30, activeThreads=1
2014-05-15 12:18:27,311 INFO fetcher.FetcherJob - -finishing thread
FetcherThread31, activeThreads=1
2014-05-15 12:18:27,311 INFO fetcher.FetcherJob - -finishing thread
FetcherThread32, activeThreads=1
2014-05-15 12:18:27,312 INFO fetcher.FetcherJob - -finishing thread
FetcherThread33, activeThreads=1
2014-05-15 12:18:27,312 INFO fetcher.FetcherJob - -finishing thread
FetcherThread22, activeThreads=1
2014-05-15 12:18:27,312 INFO fetcher.FetcherJob - -finishing thread
FetcherThread34, activeThreads=1
2014-05-15 12:18:27,312 INFO fetcher.FetcherJob - -finishing thread
FetcherThread35, activeThreads=1
2014-05-15 12:18:27,313 INFO fetcher.FetcherJob - -finishing thread
FetcherThread36, activeThreads=1
2014-05-15 12:18:27,317 INFO fetcher.FetcherJob - -finishing thread
FetcherThread38, activeThreads=1
2014-05-15 12:18:27,317 INFO fetcher.FetcherJob - -finishing thread
FetcherThread39, activeThreads=2
2014-05-15 12:18:27,317 INFO fetcher.FetcherJob - -finishing thread
FetcherThread37, activeThreads=1
2014-05-15 12:18:27,317 INFO fetcher.FetcherJob - -finishing thread
FetcherThread40, activeThreads=2
2014-05-15 12:18:27,318 INFO fetcher.FetcherJob - -finishing thread
FetcherThread42, activeThreads=1
2014-05-15 12:18:27,319 INFO fetcher.FetcherJob - -finishing thread
FetcherThread43, activeThreads=1
2014-05-15 12:18:27,319 INFO fetcher.FetcherJob - -finishing thread
FetcherThread44, activeThreads=1
2014-05-15 12:18:27,319 INFO fetcher.FetcherJob - -finishing thread
FetcherThread41, activeThreads=1
2014-05-15 12:18:27,319 INFO fetcher.FetcherJob - -finishing thread
FetcherThread45, activeThreads=1
2014-05-15 12:18:27,319 INFO fetcher.FetcherJob - -finishing thread
FetcherThread46, activeThreads=1
2014-05-15 12:18:27,320 INFO fetcher.FetcherJob - -finishing thread
FetcherThread47, activeThreads=1
2014-05-15 12:18:27,320 INFO fetcher.FetcherJob - Fetcher: throughput
threshold: -1
2014-05-15 12:18:27,320 INFO fetcher.FetcherJob - Fetcher: throughput
threshold sequence: 5
2014-05-15 12:18:27,321 INFO fetcher.FetcherJob - -finishing thread
FetcherThread48, activeThreads=1
2014-05-15 12:18:27,321 INFO fetcher.FetcherJob - -finishing thread
FetcherThread49, activeThreads=1
2014-05-15 12:18:27,448 INFO fetcher.FetcherJob -
rules.isAllowed(fit.u.toString()):true
2014-05-15 12:18:28,121 INFO fetcher.FetcherJob - -finishing thread
FetcherThread0, activeThreads=0
2014-05-15 12:18:32,321 INFO fetcher.FetcherJob - 0/0 spinwaiting/active,
1 pages, 0 errors, 0.2 0 pages/s, 209 209 kb/s, 0 URLs in 0 queues
2014-05-15 12:18:32,321 INFO fetcher.FetcherJob - -activeThreads=0
2014-05-15 12:18:32,330 WARN mapred.FileOutputCommitter - Output path is
null in cleanup
2014-05-15 12:18:32,489 INFO fetcher.FetcherJob - FetcherJob: done
2014-05-15 12:18:33,550 INFO parse.ParserJob - ParserJob: starting
2014-05-15 12:18:33,552 INFO parse.ParserJob - ParserJob: resuming:
false
2014-05-15 12:18:33,552 INFO parse.ParserJob - ParserJob: forced
reparse: false
2014-05-15 12:18:33,552 INFO parse.ParserJob - ParserJob: batchId:
1400131099-4513
2014-05-15 12:18:33,860 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new
object cache
2014-05-15 12:18:34,897 INFO crawl.SignatureFactory - Using Signature
impl: org.apache.nutch.crawl.MD5Signature
2014-05-15 12:18:35,602 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:35,888 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:35,959 WARN util.NativeCodeLoader - Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
2014-05-15 12:18:36,096 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:36,632 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:36,751 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:36,820 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:36,836 INFO mapreduce.GoraRecordReader -
gora.buffer.read.limit = 10000
2014-05-15 12:18:37,137 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:37,147 INFO mapreduce.GoraRecordWriter -
gora.buffer.write.limit = 10000
2014-05-15 12:18:37,149 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml,
file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local295157817_0001.xml,
instantiating a new object cache
2014-05-15 12:18:37,152 INFO crawl.SignatureFactory - Using Signature
impl: org.apache.nutch.crawl.MD5Signature
2014-05-15 12:18:37,290 INFO parse.ParserJob - Parsing
http://www.okezone.com/
2014-05-15 12:18:37,292 DEBUG parse.ParseUtil - Parsing [
http://www.okezone.com/] with [org.apache.nutch.parse.html.HtmlParser@8d5fa]
2014-05-15 12:18:37,905 INFO regex.RegexURLNormalizer - can't find rules
for scope 'fetcher', using default
2014-05-15 12:18:38,141 WARN mapred.FileOutputCommitter - Output path is
null in cleanup
2014-05-15 12:18:38,475 INFO parse.ParserJob - ParserJob: success
2014-05-15 12:18:39,534 INFO crawl.DbUpdaterJob - DbUpdaterJob: starting
2014-05-15 12:18:39,896 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new
object cache
2014-05-15 12:18:41,090 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:41,325 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:41,390 WARN util.NativeCodeLoader - Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
2014-05-15 12:18:41,526 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:42,115 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:42,269 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:42,343 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:42,362 INFO mapreduce.GoraRecordReader -
gora.buffer.read.limit = 10000
2014-05-15 12:18:42,454 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml,
file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local873829540_0001.xml,
instantiating a new object cache
2014-05-15 12:18:42,678 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:42,687 INFO mapreduce.GoraRecordWriter -
gora.buffer.write.limit = 10000
2014-05-15 12:18:42,691 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml,
file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local873829540_0001.xml,
instantiating a new object cache
2014-05-15 12:18:42,691 INFO crawl.FetchScheduleFactory - Using
FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-05-15 12:18:42,692 INFO crawl.AbstractFetchSchedule -
defaultInterval=2592000
2014-05-15 12:18:42,692 INFO crawl.AbstractFetchSchedule -
maxInterval=7776000
2014-05-15 12:18:42,849 WARN mapred.FileOutputCommitter - Output path is
null in cleanup
2014-05-15 12:18:42,930 INFO crawl.DbUpdaterJob - DbUpdaterJob: done
2014-05-15 12:18:43,976 INFO solr.SolrIndexerJob - SolrIndexerJob: starting
2014-05-15 12:18:44,350 DEBUG util.ObjectCache - No object cache found for
conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new
object cache
2014-05-15 12:18:44,454 INFO basic.BasicIndexingFilter - Maximum title
length for indexing set to: 100
2014-05-15 12:18:44,454 INFO indexer.IndexingFilters - Adding
org.apache.nutch.indexer.basic.BasicIndexingFilter
2014-05-15 12:18:44,457 INFO anchor.AnchorIndexingFilter - Anchor
deduplication is: off
2014-05-15 12:18:44,457 INFO indexer.IndexingFilters - Adding
org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2014-05-15 12:18:45,546 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:45,689 WARN util.NativeCodeLoader - Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
2014-05-15 12:18:45,827 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:46,454 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:46,584 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:46,657 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:46,667 INFO mapreduce.GoraRecordReader -
gora.buffer.read.limit = 10000
Does anyone have any suggestions on how to solve this problem? Thanks.
Irfan R.