Posted to commits@nutch.apache.org by sn...@apache.org on 2018/04/26 11:13:55 UTC

[nutch] branch master updated (50c6c23 -> e41e55a)

This is an automated email from the ASF dual-hosted git repository.

snagel pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git.


    from 50c6c23  Merge pull request #322 from sebastian-nagel/nutch-2569-setjarbyclass
     add 4475878  NUTCH-2570 Deduplication job fails to install deduplicated CrawlDb - run merge job to update status of duplicates to CrawlDb - lock CrawlDb while running merge job - cleanup if merge job fails
     new e41e55a  Merge pull request #323 from sebastian-nagel/NUTCH-2570-dedup-job-merge-crawldb

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revision
listed as "add" was already present in the repository and has only
been added to this reference.


Summary of changes:
 .../org/apache/nutch/crawl/DeduplicationJob.java   | 26 +++++++++++++---------
 1 file changed, 15 insertions(+), 11 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
snagel@apache.org.

[nutch] 01/01: Merge pull request #323 from sebastian-nagel/NUTCH-2570-dedup-job-merge-crawldb

Posted by sn...@apache.org.

snagel pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git

commit e41e55afcbfcccd6f2fa8ddae3c2137a9fdd122b
Merge: 50c6c23 4475878
Author: Sebastian Nagel <sn...@apache.org>
AuthorDate: Thu Apr 26 13:13:53 2018 +0200

    Merge pull request #323 from sebastian-nagel/NUTCH-2570-dedup-job-merge-crawldb
    
    NUTCH-2570 Deduplication job fails to install deduplicated CrawlDb

 .../org/apache/nutch/crawl/DeduplicationJob.java   | 26 +++++++++++++---------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --cc src/java/org/apache/nutch/crawl/DeduplicationJob.java
index eaeb835,12ebd3c..8887b4f
--- a/src/java/org/apache/nutch/crawl/DeduplicationJob.java
+++ b/src/java/org/apache/nutch/crawl/DeduplicationJob.java
@@@ -293,12 -292,11 +292,12 @@@ public class DeduplicationJob extends N
  
      Job job = NutchJob.getInstance(getConf());
      Configuration conf = job.getConfiguration();
-     job.setJobName("Deduplication on " + crawldb);
+     job.setJobName("Deduplication on " + crawlDb);
      conf.set(DEDUPLICATION_GROUP_MODE, group);
      conf.set(DEDUPLICATION_COMPARE_ORDER, compareOrder);
 +    job.setJarByClass(DeduplicationJob.class);
  
-     FileInputFormat.addInputPath(job, new Path(crawldb, CrawlDb.CURRENT_NAME));
+     FileInputFormat.addInputPath(job, new Path(crawlDb, CrawlDb.CURRENT_NAME));
      job.setInputFormatClass(SequenceFileInputFormat.class);
  
      FileOutputFormat.setOutputPath(job, tempDir);
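
The NUTCH-2570 summary lists three steps: run a merge job, lock the CrawlDb while it runs, and clean up if the merge fails. The following is a minimal Java sketch of that lock/run/cleanup pattern only; it is not the code from DeduplicationJob.java. It assumes a local directory and java.nio.file in place of Hadoop's FileSystem, and the names LockedUpdateSketch, runLocked, and LOCK_NAME are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Callable;

public class LockedUpdateSketch {

  /** Hypothetical lock file name, chosen for this sketch only. */
  static final String LOCK_NAME = ".locked";

  /**
   * Runs a job while holding a lock file inside dbDir.
   * Returns true if the job reported success. If the job fails or throws,
   * the temporary output file is deleted. The lock is always released.
   */
  static boolean runLocked(Path dbDir, Path tempOut, Callable<Boolean> job)
      throws IOException {
    Path lock = dbDir.resolve(LOCK_NAME);
    // createFile throws FileAlreadyExistsException if the lock is already held
    Files.createFile(lock);
    try {
      boolean ok;
      try {
        ok = job.call();
      } catch (Exception e) {
        ok = false; // treat any exception as a failed job
      }
      if (!ok) {
        // cleanup: tempOut is assumed to be a single file in this sketch
        Files.deleteIfExists(tempOut);
      }
      return ok;
    } finally {
      // release the lock whether or not the job succeeded
      Files.deleteIfExists(lock);
    }
  }

  public static void main(String[] args) throws IOException {
    Path db = Files.createTempDirectory("crawldb");
    Path tmp = db.resolve("merge-output.tmp");
    boolean ok = runLocked(db, tmp, () -> {
      Files.createFile(tmp); // the job writes some temporary output
      return false;          // simulate a failed merge
    });
    System.out.println("succeeded=" + ok + ", tempExists=" + Files.exists(tmp));
  }
}
```

Releasing the lock in a finally block means a failed merge cannot leave the directory locked, and deleting the temporary output on failure keeps partial results from being picked up later.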
