Posted to dev@nutch.apache.org by "Shubham Gupta (JIRA)" <ji...@apache.org> on 2016/09/15 11:43:20 UTC

[jira] [Created] (NUTCH-2315) UpdateDb job fails every time (Nutch 2.3.1)

Shubham Gupta created NUTCH-2315:
------------------------------------

             Summary: UpdateDb job fails every time (Nutch 2.3.1)
                 Key: NUTCH-2315
                 URL: https://issues.apache.org/jira/browse/NUTCH-2315
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 2.3.1
         Environment: Hadoop 2.7.1 + MongoDB + YARN + Gora 0.6.1
            Reporter: Shubham Gupta
             Fix For: 2.3.1


Hey,

Whenever I run the update job, the following error occurs:

INFO mapreduce.Job: Task Id : attempt_1473832356852_0107_m_000000_2, Status : FAILED
Error: java.net.MalformedURLException: no protocol: http%3A%2F%2Fwww.smh.com.au%2Fact-news%2Fcanberra-weather-warm-april-expected-after-record-breaking-march-temperatures-20160401-gnw2pg.html&title=Canberra+weather%3A+warm+April+expected+after+record+breaking+March+temperatures&source=The+Sydney+Morning+Herald&summary=Canberra+can+expect+warmer+than+average+temperatures+to+continue+for+April+after+enjoying+its+equal+second+warmest+March+on+record
	at java.net.URL.<init>(URL.java:586)
	at java.net.URL.<init>(URL.java:483)
	at java.net.URL.<init>(URL.java:432)
	at org.apache.nutch.util.TableUtil.reverseUrl(TableUtil.java:43)
	at org.apache.nutch.crawl.DbUpdateMapper.map(DbUpdateMapper.java:96)
	at org.apache.nutch.crawl.DbUpdateMapper.map(DbUpdateMapper.java:38)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
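
The trace above suggests that the string reaching TableUtil.reverseUrl is still percent-encoded ("http%3A%2F%2F..."), so java.net.URL finds no "scheme:" prefix and rejects it. The following is a minimal sketch that reproduces this outside Nutch, using a shortened version of the URL from the error. The class EncodedUrlCheck and the helper parseError are hypothetical names introduced for illustration and are not part of Nutch; decoding the string first is only one possible way to make it parseable, not necessarily the right fix for the crawl data.

```java
import java.io.UnsupportedEncodingException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLDecoder;

public class EncodedUrlCheck {

    // Hypothetical helper (not a Nutch API): returns null when the string
    // parses as a URL, otherwise the MalformedURLException message.
    public static String parseError(String candidate) {
        try {
            new URL(candidate);
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Shortened, still percent-encoded form of the URL in the error above.
        String encoded = "http%3A%2F%2Fwww.smh.com.au%2Fact-news%2F";

        // The encoded form contains no literal ':' so no protocol is found.
        System.out.println("encoded -> " + parseError(encoded));

        // Decoding restores "http://", after which the URL parses.
        String decoded = URLDecoder.decode(encoded, "UTF-8");
        System.out.println("decoded -> " + decoded + " -> " + parseError(decoded));
    }
}
```

If the same check fails for rows already stored in the web table, the encoded strings were most likely written there before the update job ran (for example as outlinks), so they would need to be filtered or normalized at that earlier stage.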

16/09/15 12:44:35 INFO mapreduce.Job:  map 100% reduce 100%
16/09/15 12:44:36 INFO mapreduce.Job: Job job_1473832356852_0107 failed with state FAILED due to: Task failed task_1473832356852_0107_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

16/09/15 12:44:36 INFO mapreduce.Job: Counters: 8
	Job Counters 
		Failed map tasks=4
		Launched map tasks=4
		Other local map tasks=4
		Total time spent by all maps in occupied slots (ms)=388304
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=55472
		Total vcore-seconds taken by all map tasks=55472
		Total megabyte-seconds taken by all map tasks=198145984
Exception in thread "main" java.lang.RuntimeException: job failed: name=[rss]update-table, jobid=job_1473832356852_0107
	at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:119)
	at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:111)
	at org.apache.nutch.crawl.DbUpdaterJob.updateTable(DbUpdaterJob.java:140)
	at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:174)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.nutch.crawl.DbUpdaterJob.main(DbUpdaterJob.java:178)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
