Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2016/06/05 20:59:59 UTC

[jira] [Resolved] (NUTCH-2271) Solr indexer Failed

     [ https://issues.apache.org/jira/browse/NUTCH-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney resolved NUTCH-2271.
-----------------------------------------
    Resolution: Not A Bug

Nutch 1.12 supports Solr 5.4.1, not Solr 6. Nutch 1.12 is also built against Hadoop 2.4.0, so please keep that in mind when choosing versions for the rest of the technology stack.
Unless this issue is aimed at making Nutch work with Solr 6.x, there is nothing to resolve here, and it is therefore not a bug either. I am closing this one for the time being. If someone wishes to upgrade Solr, please log a separate ticket for that.
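
For anyone who lands here with the same "Bad return type" error from the quoted log: it is a JVM bytecode verification failure, and it usually points to mismatched Apache HttpClient jars on the task classpath (SolrJ expects SystemDefaultHttpClient to be a CloseableHttpClient, which older HttpClient releases do not provide). Which jar actually wins in this particular setup is an assumption, not something confirmed in this issue. A minimal diagnostic sketch, using a hypothetical helper class named WhichJar, prints where each class is loaded from:

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of Nutch, Solr, or Hadoop).
public class WhichJar {

    /** Returns the jar or directory a class was loaded from. */
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return src.getLocation().toString();
    }

    /** Loads a class by name without initializing it and reports its origin. */
    static String describe(String className) {
        try {
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            return className + " <- " + locationOf(cls);
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " <- not loadable (" + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        // Defaults are the classes named in the verifier error; pass others as arguments.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.http.impl.client.CloseableHttpClient",
            "org.apache.http.impl.client.SystemDefaultHttpClient",
            "org.apache.solr.client.solrj.impl.HttpClientUtil"
        };
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

Run it with the same classpath the failing reduce task sees. If the two HttpClient classes resolve to different jars, or to a jar older than the one SolrJ was built against, that is consistent with the version mismatch described above.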

> Solr indexer Failed 
> --------------------
>
>                 Key: NUTCH-2271
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2271
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.12
>         Environment: Hadoop 2.7.2 , Solr 6.0.0 , Nutch 1.12 on Single node 
>            Reporter: narendra
>            Assignee: Furkan KAMACI
>
> When I run this command:
>   bin/nutch solrindex http://localhost:8983/solr/#/devel1 crawl_Test1/crawldb -linkdb crawl_Test1/linkdb  crawl_Test1/segments/*
> 16/05/31 22:21:47 WARN segment.SegmentChecker: The input path at * is not a segment... skipping
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: starting at 2016-05-31 22:21:47
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: deleting gone documents: false
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: URL filtering: false
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: URL normalizing: false
> 16/05/31 22:21:47 INFO plugin.PluginRepository: Plugins: looking in: /tmp/hadoop-unjar8621976524622577403/classes/plugins
> 16/05/31 22:21:47 INFO plugin.PluginRepository: Plugin Auto-activation mode: [true]
> 16/05/31 22:21:47 INFO plugin.PluginRepository: Registered Plugins:
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	Regex URL Filter (urlfilter-regex)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	Html Parse Plug-in (parse-html)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	HTTP Framework (lib-http)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	the nutch core extension points (nutch-extensionpoints)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	Basic Indexing Filter (index-basic)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	Anchor Indexing Filter (index-anchor)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	Tika Parser Plug-in (parse-tika)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	Basic URL Normalizer (urlnormalizer-basic)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	Regex URL Filter Framework (lib-regex-filter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	Regex URL Normalizer (urlnormalizer-regex)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	CyberNeko HTML Parser (lib-nekohtml)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	OPIC Scoring Plug-in (scoring-opic)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	Pass-through URL Normalizer (urlnormalizer-pass)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	Http Protocol Plug-in (protocol-http)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	SolrIndexWriter (indexer-solr)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: Registered Extension-Points:
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	Nutch Content Parser (org.apache.nutch.parse.Parser)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	Nutch URL Filter (org.apache.nutch.net.URLFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	Nutch Protocol (org.apache.nutch.protocol.Protocol)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	Nutch URL Ignore Exemption Filter (org.apache.nutch.net.URLExemptionFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	Nutch Index Writer (org.apache.nutch.indexer.IndexWriter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	Nutch Segment Merge Filter (org.apache.nutch.segment.SegmentMergeFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: 	Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
> 16/05/31 22:21:47 INFO indexer.IndexWriters: Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Active IndexWriters :
> SOLRIndexWriter
> 	solr.server.url : URL of the SOLR instance
> 	solr.zookeeper.hosts : URL of the Zookeeper quorum
> 	solr.commit.size : buffer size when sending to SOLR (default 1000)
> 	solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
> 	solr.auth : use authentication (default false)
> 	solr.auth.username : username for authentication
> 	solr.auth.password : password for authentication
> 16/05/31 22:21:47 INFO indexer.IndexerMapReduce: IndexerMapReduce: crawldb: crawl_Test1/crawldb
> 16/05/31 22:21:47 INFO indexer.IndexerMapReduce: IndexerMapReduce: linkdb: crawl_Test1/linkdb
> 16/05/31 22:21:48 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
> 16/05/31 22:21:48 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
> 16/05/31 22:21:54 INFO mapred.FileInputFormat: Total input paths to process : 2
> 16/05/31 22:21:54 INFO mapreduce.JobSubmitter: number of splits:3
> 16/05/31 22:21:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1464692893405_0045
> 16/05/31 22:21:55 INFO impl.YarnClientImpl: Submitted application application_1464692893405_0045
> 16/05/31 22:21:55 INFO mapreduce.Job: The url to track the job: http://localhost:9046/proxy/application_1464692893405_0045/
> 16/05/31 22:21:55 INFO mapreduce.Job: Running job: job_1464692893405_0045
> 16/05/31 22:22:16 INFO mapreduce.Job: Job job_1464692893405_0045 running in uber mode : false
> 16/05/31 22:22:16 INFO mapreduce.Job:  map 0% reduce 0%
> 16/05/31 22:22:28 INFO mapreduce.Job:  map 100% reduce 0%
> 16/05/31 22:22:33 INFO mapreduce.Job: Task Id : attempt_1464692893405_0045_r_000000_0, Status : FAILED
> Error: Bad return type
> Exception Details:
>   Location:
>     org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;)Lorg/apache/http/impl/client/CloseableHttpClient; @57: areturn
>   Reason:
>     Type 'org/apache/http/impl/client/SystemDefaultHttpClient' (current frame, stack[0]) is not assignable to 'org/apache/http/impl/client/CloseableHttpClient' (from method signature)
>   Current Frame:
>     bci: @57
>     flags: { }
>     locals: { 'org/apache/solr/common/params/SolrParams', 'org/apache/solr/common/params/ModifiableSolrParams', 'org/apache/http/impl/client/SystemDefaultHttpClient' }
>     stack: { 'org/apache/http/impl/client/SystemDefaultHttpClient' }
>   Bytecode:
>     0x0000000: bb00 0359 2ab7 0004 4cb2 0005 b900 0601
>     0x0000010: 0099 001e b200 05bb 0007 59b7 0008 1209
>     0x0000020: b600 0a2b b600 0bb6 000c b900 0d02 00b8
>     0x0000030: 000e 4d2c 2bb8 000f 2cb0               
>   Stackmap Table:
>     append_frame(@47,Object[#143])
> 16/05/31 22:22:40 INFO mapreduce.Job: Task Id : attempt_1464692893405_0045_r_000000_1, Status : FAILED
> Error: Bad return type
> Exception Details:
>   Location:
>     org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;)Lorg/apache/http/impl/client/CloseableHttpClient; @57: areturn
>   Reason:
>     Type 'org/apache/http/impl/client/SystemDefaultHttpClient' (current frame, stack[0]) is not assignable to 'org/apache/http/impl/client/CloseableHttpClient' (from method signature)
>   Current Frame:
>     bci: @57
>     flags: { }
>     locals: { 'org/apache/solr/common/params/SolrParams', 'org/apache/solr/common/params/ModifiableSolrParams', 'org/apache/http/impl/client/SystemDefaultHttpClient' }
>     stack: { 'org/apache/http/impl/client/SystemDefaultHttpClient' }
>   Bytecode:
>     0x0000000: bb00 0359 2ab7 0004 4cb2 0005 b900 0601
>     0x0000010: 0099 001e b200 05bb 0007 59b7 0008 1209
>     0x0000020: b600 0a2b b600 0bb6 000c b900 0d02 00b8
>     0x0000030: 000e 4d2c 2bb8 000f 2cb0               
>   Stackmap Table:
>     append_frame(@47,Object[#143])
> 16/05/31 22:22:46 INFO mapreduce.Job: Task Id : attempt_1464692893405_0045_r_000000_2, Status : FAILED
> Error: Bad return type
> Exception Details:
>   Location:
>     org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;Lorg/apache/http/conn/ClientConnectionManager;)Lorg/apache/http/impl/client/CloseableHttpClient; @58: areturn
>   Reason:
>     Type 'org/apache/http/impl/client/DefaultHttpClient' (current frame, stack[0]) is not assignable to 'org/apache/http/impl/client/CloseableHttpClient' (from method signature)
>   Current Frame:
>     bci: @58
>     flags: { }
>     locals: { 'org/apache/solr/common/params/SolrParams', 'org/apache/http/conn/ClientConnectionManager', 'org/apache/solr/common/params/ModifiableSolrParams', 'org/apache/http/impl/client/DefaultHttpClient' }
>     stack: { 'org/apache/http/impl/client/DefaultHttpClient' }
>   Bytecode:
>     0x0000000: bb00 0359 2ab7 0004 4db2 0005 b900 0601
>     0x0000010: 0099 001e b200 05bb 0007 59b7 0008 1209
>     0x0000020: b600 0a2c b600 0bb6 000c b900 0d02 002b
>     0x0000030: b800 104e 2d2c b800 0f2d b0            
>   Stackmap Table:
>     append_frame(@47,Object[#143])
> 16/05/31 22:22:53 INFO mapreduce.Job:  map 100% reduce 100%
> 16/05/31 22:22:53 INFO mapreduce.Job: Job job_1464692893405_0045 failed with state FAILED due to: Task failed task_1464692893405_0045_r_000000
> Job failed as tasks failed. failedMaps:0 failedReduces:1
> 16/05/31 22:22:54 INFO mapreduce.Job: Counters: 37
> 	File System Counters
> 		FILE: Number of bytes read=0
> 		FILE: Number of bytes written=458051
> 		FILE: Number of read operations=0
> 		FILE: Number of large read operations=0
> 		FILE: Number of write operations=0
> 		HDFS: Number of bytes read=17460
> 		HDFS: Number of bytes written=0
> 		HDFS: Number of read operations=12
> 		HDFS: Number of large read operations=0
> 		HDFS: Number of write operations=0
> 	Job Counters 
> 		Failed reduce tasks=4
> 		Launched map tasks=3
> 		Launched reduce tasks=4
> 		Data-local map tasks=3
> 		Total time spent by all maps in occupied slots (ms)=56496
> 		Total time spent by all reduces in occupied slots (ms)=30056
> 		Total time spent by all map tasks (ms)=28248
> 		Total time spent by all reduce tasks (ms)=15028
> 		Total vcore-milliseconds taken by all map tasks=28248
> 		Total vcore-milliseconds taken by all reduce tasks=15028
> 		Total megabyte-milliseconds taken by all map tasks=28925952
> 		Total megabyte-milliseconds taken by all reduce tasks=15388672
> 	Map-Reduce Framework
> 		Map input records=184
> 		Map output records=184
> 		Map output bytes=15037
> 		Map output materialized bytes=15428
> 		Input split bytes=392
> 		Combine input records=0
> 		Spilled Records=184
> 		Failed Shuffles=0
> 		Merged Map outputs=0
> 		GC time elapsed (ms)=758
> 		CPU time spent (ms)=6200
> 		Physical memory (bytes) snapshot=841703424
> 		Virtual memory (bytes) snapshot=5765849088
> 		Total committed heap usage (bytes)=611319808
> 	File Input Format Counters 
> 		Bytes Read=17068
> 16/05/31 22:22:54 ERROR indexer.IndexingJob: Indexer: java.io.IOException: Job failed!
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865)
> 	at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
> 	at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 	at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)


