Posted to commits@nutch.apache.org by sn...@apache.org on 2018/04/26 11:13:56 UTC

[nutch] 01/01: Merge pull request #323 from sebastian-nagel/NUTCH-2570-dedup-job-merge-crawldb

This is an automated email from the ASF dual-hosted git repository.

snagel pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git

commit e41e55afcbfcccd6f2fa8ddae3c2137a9fdd122b
Merge: 50c6c23 4475878
Author: Sebastian Nagel <sn...@apache.org>
AuthorDate: Thu Apr 26 13:13:53 2018 +0200

    Merge pull request #323 from sebastian-nagel/NUTCH-2570-dedup-job-merge-crawldb
    
    NUTCH-2570 Deduplication job fails to install deduplicated CrawlDb

 .../org/apache/nutch/crawl/DeduplicationJob.java   | 26 +++++++++++++---------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --cc src/java/org/apache/nutch/crawl/DeduplicationJob.java
index eaeb835,12ebd3c..8887b4f
--- a/src/java/org/apache/nutch/crawl/DeduplicationJob.java
+++ b/src/java/org/apache/nutch/crawl/DeduplicationJob.java
@@@ -293,12 -292,11 +292,12 @@@ public class DeduplicationJob extends N
  
      Job job = NutchJob.getInstance(getConf());
      Configuration conf = job.getConfiguration();
-     job.setJobName("Deduplication on " + crawldb);
+     job.setJobName("Deduplication on " + crawlDb);
      conf.set(DEDUPLICATION_GROUP_MODE, group);
      conf.set(DEDUPLICATION_COMPARE_ORDER, compareOrder);
 +    job.setJarByClass(DeduplicationJob.class);
  
-     FileInputFormat.addInputPath(job, new Path(crawldb, CrawlDb.CURRENT_NAME));
+     FileInputFormat.addInputPath(job, new Path(crawlDb, CrawlDb.CURRENT_NAME));
      job.setInputFormatClass(SequenceFileInputFormat.class);
  
      FileOutputFormat.setOutputPath(job, tempDir);
