Posted to dev@nutch.apache.org by "taknev ivrok (JIRA)" <ji...@apache.org> on 2008/04/30 17:21:56 UTC
[jira] Created: (NUTCH-630) Error caused by index-more plugin in the latest svn revision - 652259
Error caused by index-more plugin in the latest svn revision - 652259
-----------------------------------------------------------------------
Key: NUTCH-630
URL: https://issues.apache.org/jira/browse/NUTCH-630
Project: Nutch
Issue Type: Bug
Reporter: taknev ivrok
This problem was reported on the user mailing list: http://www.nabble.com/index-more-problem--td16757538.html
Upon running bin/nutch crawl urls -dir crawl with the latest svn version, the following error occurs.
Note: This error does not happen after I remove the index-more plugin from plugin.includes in the conf/nutch-site.xml file.
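The workaround in the note above amounts to overriding plugin.includes in conf/nutch-site.xml with the index-more entry deleted. A minimal sketch of the property is below; the plugin list shown is illustrative (copy the actual value from your own nutch-default.xml and remove only index-more):

```xml
<!-- conf/nutch-site.xml: override plugin.includes without index-more.
     The value below is an illustrative example of the default plugin
     list from nutch-default.xml of that era, with the index-more
     entry removed; use your own default value as the starting point. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

Values set in nutch-site.xml take precedence over nutch-default.xml, so this disables the plugin without editing the default configuration.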
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawlfs/crawldb
CrawlDb update: segments: [crawlfs/segments/20080430051112]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawlfs/segments/20080430051126
Generator: filtering: true
Generator: topN: 100000
Generator: jobtracker is 'local', generating exactly one partition.
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=2 - no more URLs to fetch.
LinkDb: starting
LinkDb: linkdb: crawlfs/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: file:/home/admin/nutch-2008-04-30_04-01-32/crawlfs/segments/20080430051112
LinkDb: adding segment: file:/home/admin/nutch-2008-04-30_04-01-32/crawlfs/segments/20080430051053
LinkDb: done
Indexer: starting
Indexer: linkdb: crawlfs/linkdb
Indexer: adding segment: file:/home/admin/nutch-2008-04-30_04-01-32/crawlfs/segments/20080430051112
Indexer: adding segment: file:/home/admin/nutch-2008-04-30_04-01-32/crawlfs/segments/20080430051053
IFD [Thread-102]: setInfoStream deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@1cfd3b2
IW 0 [Thread-102]: setInfoStream: dir=org.apache.lucene.store.FSDirectory@/tmp/hadoop-admin/mapred/local/index/_1406110510
autoCommit=true
mergePolicy=org.apache.lucene.index.LogByteSizeMergePolicy@1536eec
mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler@9770a3
ramBufferSizeMB=16.0 maxBuffereDocs=50 maxBuffereDeleteTerms=-1
maxFieldLength=10000 index=
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:894)
at org.apache.nutch.indexer.Indexer.index(Indexer.java:311)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:144)
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (NUTCH-630) Error caused by index-more plugin in the latest svn revision - 652259
Posted by "Doğacan Güney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney closed NUTCH-630.
-------------------------------
Resolution: Duplicate
Fix Version/s: 1.0.0
Assignee: Doğacan Güney
Duplicate of NUTCH-631