Posted to dev@nutch.apache.org by "Michael Joyce (JIRA)" <ji...@apache.org> on 2015/08/12 23:31:45 UTC

[jira] [Commented] (NUTCH-2049) Upgrade Trunk to Hadoop > 2.4 stable

    [ https://issues.apache.org/jira/browse/NUTCH-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694210#comment-14694210 ] 

Michael Joyce commented on NUTCH-2049:
--------------------------------------

Hey [~lewismc],

I tried your patch here. It seems I have to add the following to the ivy.xml file to get this to work at all:

{code}
<dependency org="org.apache.hadoop" name="hadoop-mapreduce-client-jobclient" rev="2.4.0" conf="*->default"/>
{code}

Otherwise, I end up getting the following when I try to run a test crawl:

{code}
Injector: starting at 2015-08-12 15:04:42
Injector: crawlDb: crawl/crawldb
Injector: urlDir: ../../urls_test
Injector: Converting injected urls to crawl db entries.
Injector: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
    at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
    at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470)
    at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:449)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:832)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:323)
    at org.apache.nutch.crawl.Injector.run(Injector.java:379)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.crawl.Injector.main(Injector.java:369)
{code}
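For context on why the extra ivy dependency helps: Cluster.initialize() discovers ClientProtocolProvider implementations via ServiceLoader, so the error above usually means neither the local nor the YARN provider is on the classpath; hadoop-mapreduce-client-jobclient (plus its transitive dependencies) supplies them. In a real deployment the framework is also pinned explicitly in mapred-site.xml; an illustrative fragment (the "local" value is just an assumption for single-process test crawls):

{code}
<!-- mapred-site.xml: pin the MapReduce framework so Cluster.initialize()
     knows which ClientProtocolProvider it should resolve -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>local</value> <!-- or "yarn" on a cluster -->
  </property>
</configuration>
{code}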

However, after addressing that concern I end up running into the following on the test crawl:

{code}
java.lang.Exception: java.lang.ClassCastException: org.apache.hadoop.io.SequenceFile$Writer$KeyClassOption cannot be cast to org.apache.hadoop.io.MapFile$Writer$Option
	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.SequenceFile$Writer$KeyClassOption cannot be cast to org.apache.hadoop.io.MapFile$Writer$Option
	at org.apache.nutch.fetcher.FetcherOutputFormat.getRecordWriter(FetcherOutputFormat.java:70)
	at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:484)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:414)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2015-08-12 14:24:39,906 ERROR fetcher.Fetcher - Fetcher: java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
	at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:496)
	at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:532)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:505)
{code}
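For what it's worth, the cast failure points at how the writer options are built in FetcherOutputFormat: MapFile.Writer requires its own keyClass option type, so passing (or force-casting) the SequenceFile.Writer variant compiles but blows up at runtime. A rough sketch of the kind of change that typically resolves this, using the Hadoop 2.4 org.apache.hadoop.io API (the exact Nutch lines here are my guess, not the actual patch):

{code}
// Fails at runtime with the ClassCastException above -- the cast only
// satisfies the compiler, the object is still a SequenceFile option:
//   new MapFile.Writer(job, fetch,
//       (MapFile.Writer.Option) SequenceFile.Writer.keyClass(Text.class),
//       SequenceFile.Writer.valueClass(CrawlDatum.class));

// Works: use the MapFile-specific factory for the key class; the value
// class can stay a SequenceFile.Writer.Option, which MapFile.Writer accepts.
new MapFile.Writer(job, fetch,
    MapFile.Writer.keyClass(Text.class),
    SequenceFile.Writer.valueClass(CrawlDatum.class));
{code}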

> Upgrade Trunk to Hadoop > 2.4 stable
> ------------------------------------
>
>                 Key: NUTCH-2049
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2049
>             Project: Nutch
>          Issue Type: Improvement
>          Components: build
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>             Fix For: 1.11
>
>         Attachments: NUTCH-2049.patch
>
>
> Convo here - http://www.mail-archive.com/dev%40nutch.apache.org/msg18225.html
> I am +1 for taking trunk (or a branch of trunk) to explicit dependency on > Hadoop 2.6.
> We can run our tests, we can validate, we can fix.
> I will be doing validation on 2.X in parallel as this is what I use on my own projects. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)