
Posted to dev@nutch.apache.org by "taknev ivrok (JIRA)" <ji...@apache.org> on 2008/04/30 17:21:56 UTC

[jira] Created: (NUTCH-630) Error caused by index-more plugin in the latest svn revision - 652259

Error caused by index-more plugin  in the latest svn revision - 652259 
-----------------------------------------------------------------------

                 Key: NUTCH-630
                 URL: https://issues.apache.org/jira/browse/NUTCH-630
             Project: Nutch
          Issue Type: Bug
            Reporter: taknev ivrok


This problem was reported on the user mailing list: http://www.nabble.com/index-more-problem--td16757538.html
Running bin/nutch crawl urls -dir crawl against the latest SVN revision produces the following error:

Note: the error goes away once I remove the index-more plugin from plugin.includes in conf/nutch-site.xml.
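A minimal sketch of that workaround, assuming the plugin list is overridden in conf/nutch-site.xml. The plugin.includes value shown is illustrative only (modelled on a typical Nutch default of that era, not taken from the reporter's configuration); start from your own value and delete only the index-more entry.

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- Illustrative value: index-basic is kept, index-more has been removed.
         Replace with your own list minus index-more. -->
    <value>protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>
```

Removing index-more also drops the extra fields it adds to the index (for example, content type and last-modified date), so this is a stopgap rather than a fix.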

Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawlfs/crawldb
CrawlDb update: segments: [crawlfs/segments/20080430051112]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawlfs/segments/20080430051126
Generator: filtering: true
Generator: topN: 100000
Generator: jobtracker is 'local', generating exactly one partition.
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=2 - no more URLs to fetch.
LinkDb: starting
LinkDb: linkdb: crawlfs/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: file:/home/admin/nutch-2008-04-30_04-01-32/crawlfs/segments/20080430051112
LinkDb: adding segment: file:/home/admin/nutch-2008-04-30_04-01-32/crawlfs/segments/20080430051053
LinkDb: done
Indexer: starting
Indexer: linkdb: crawlfs/linkdb
Indexer: adding segment: file:/home/admin/nutch-2008-04-30_04-01-32/crawlfs/segments/20080430051112
Indexer: adding segment: file:/home/admin/nutch-2008-04-30_04-01-32/crawlfs/segments/20080430051053
IFD [Thread-102]: setInfoStream
deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@1cfd3b2
IW 0 [Thread-102]: setInfoStream:
dir=org.apache.lucene.store.FSDirectory@/tmp/hadoop-admin/mapred/local/index/_1406110510
autoCommit=true
mergePolicy=org.apache.lucene.index.LogByteSizeMergePolicy@1536eec
mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler@9770a3
ramBufferSizeMB=16.0 maxBuffereDocs=50 maxBuffereDeleteTerms=-1
maxFieldLength=10000 index=
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:894)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:311)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:144) 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Closed: (NUTCH-630) Error caused by index-more plugin in the latest svn revision - 652259

Posted by "Doğacan Güney (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/NUTCH-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doğacan Güney closed NUTCH-630.
-------------------------------

       Resolution: Duplicate
    Fix Version/s: 1.0.0
         Assignee: Doğacan Güney

Duplicate of NUTCH-631
