Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/07/25 07:33:04 UTC

[jira] [Commented] (NUTCH-2049) Upgrade Trunk to Hadoop > 2.4 stable

    [ https://issues.apache.org/jira/browse/NUTCH-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641402#comment-14641402 ] 

Lewis John McGibbney commented on NUTCH-2049:
---------------------------------------------

BTW, this is only for 2.4.0, for the same reason explained in the previous issue.
This is an upgrade of dependencies and API usage... NOT a mapred --> mapreduce API migration for each NutchJob.
[~markus.jelsma@openindex.io] had a great crack at upgrading some of them... I would join his ranks and make best efforts to move all jobs to the 2.X mapreduce API where it makes sense. It would be nice to have a Nutch roadmap TBH.
Team, how do we feel here?
Tests are broken as follows:
{code}
Testsuite: org.apache.nutch.crawl.TestCrawlDbFilter
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.986 sec
------------- Standard Output ---------------
2015-07-25 01:29:50,852 WARN  util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-07-25 01:29:51,215 INFO  compress.CodecPool (CodecPool.java:getCompressor(151)) - Got brand-new compressor [.deflate]
2015-07-25 01:29:51,231 INFO  compress.CodecPool (CodecPool.java:getCompressor(151)) - Got brand-new compressor [.deflate]
2015-07-25 01:29:51,231 INFO  crawl.CrawlDBTestUtil (CrawlDBTestUtil.java:createCrawlDb(67)) - adding:http://www.example.com
2015-07-25 01:29:51,232 INFO  crawl.CrawlDBTestUtil (CrawlDBTestUtil.java:createCrawlDb(67)) - adding:http://www.example1.com
2015-07-25 01:29:51,235 INFO  crawl.CrawlDBTestUtil (CrawlDBTestUtil.java:createCrawlDb(67)) - adding:http://www.example2.com
------------- ---------------- ---------------
------------- Standard Error -----------------
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/trunk_clean/build/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/trunk_clean/build/test/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
------------- ---------------- ---------------

Testcase: testUrl404Purging took 0.969 sec
        Caused an ERROR
Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
        at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
        at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
        at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
        at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470)
        at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:449)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:832)
        at org.apache.nutch.crawl.TestCrawlDbFilter.testUrl404Purging(TestCrawlDbFilter.java:107)
{code} 
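The stack trace shows the new Hadoop 2.x client-side behaviour: even the old {{mapred.JobClient}} now delegates to {{mapreduce.Cluster}}, which fails unless {{mapreduce.framework.name}} resolves to a known framework. A minimal sketch of one possible fix for the local unit tests, assuming they are meant to use the local job runner and that no {{mapred-site.xml}} is currently on the test classpath (file location is hypothetical; adjust to the build layout):

```xml
<!-- mapred-site.xml placed on the test classpath (hypothetical location) -->
<configuration>
  <property>
    <!-- "local" selects the in-process LocalJobRunner;
         "yarn" would instead require a running ResourceManager -->
    <name>mapreduce.framework.name</name>
    <value>local</value>
  </property>
</configuration>
```

The same effect could be had programmatically with {{conf.set("mapreduce.framework.name", "local")}} in test setup code before the job is submitted, which may be easier to keep scoped to the test harness.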

> Upgrade Trunk to Hadoop > 2.4 stable
> ------------------------------------
>
>                 Key: NUTCH-2049
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2049
>             Project: Nutch
>          Issue Type: Improvement
>          Components: build
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>             Fix For: 1.11
>
>         Attachments: NUTCH-2049.patch
>
>
> Convo here - http://www.mail-archive.com/dev%40nutch.apache.org/msg18225.html
> I am +1 for taking trunk (or a branch of trunk) to explicit dependency on > Hadoop 2.6.
> We can run our tests, we can validate, we can fix.
> I will be doing validation on 2.X in parallel as this is what I use on my own projects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)