Posted to user@nutch.apache.org by "h00kpublic@gmail.com" <h0...@googlemail.com> on 2010/09/11 22:37:00 UTC
Problem integrating Apache Nutch (release 1.2) with Apache Solr (trunk) - got Solr exception
hi...
I have configured solrindex-mapping.xml (Nutch) as well as my Solr schema.xml and
solrconfig.xml. Both work fine when run individually, but when I run bin/nutch solrindex ...
I get an exception:
org.apache.solr.common.SolrException: Document [null] missing required field: id
I have configured id in all the config files: solrindex-mapping.xml maps url to id,
and Solr's schema.xml defines id as well. I don't know what's wrong. I added some
logging to org.apache.nutch.indexer.solr.SolrWriter.java: one log statement at the line
where the fields that were read are added to the SolrInputDocument. After building
and running, the result is:
2010-09-11 21:31:06,326 INFO solr.SolrWriter - write()
2010-09-11 21:31:06,327 INFO solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,327 INFO solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,327 INFO solr.SolrWriter - Key: digest, value: bc315927b7c01c7a2905d5b6872bc35b
2010-09-11 21:31:06,327 INFO solr.SolrWriter - close()
As you can see, only 3 fields are read O_o. Does anyone know whether something is wrong
in my configuration? *I really need Nutch running soon, because I am currently writing my
bachelor thesis :/* (on information integration of heterogeneous data sources in a local network)
Best regards
marcel =)
The rest of the log:
2010-09-11 21:31:06,079 INFO solr.SolrWriter - open()
2010-09-11 21:31:06,280 INFO solr.SolrMappingReader - source: content dest: content
2010-09-11 21:31:06,280 INFO solr.SolrMappingReader - source: site dest: site
2010-09-11 21:31:06,280 INFO solr.SolrMappingReader - source: title dest: metadata_title
2010-09-11 21:31:06,281 INFO solr.SolrMappingReader - source: host dest: host
2010-09-11 21:31:06,281 INFO solr.SolrMappingReader - source: segment dest: segment
2010-09-11 21:31:06,281 INFO solr.SolrMappingReader - source: boost dest: boost
2010-09-11 21:31:06,281 INFO solr.SolrMappingReader - source: digest dest: digest
2010-09-11 21:31:06,281 INFO solr.SolrMappingReader - source: tstamp dest: metadata_last_modified
2010-09-11 21:31:06,281 INFO solr.SolrMappingReader - source: lastModified dest: metadata_last_modified
2010-09-11 21:31:06,281 INFO solr.SolrMappingReader - source: url dest: url
2010-09-11 21:31:06,281 INFO solr.SolrMappingReader - source: url dest: id
2010-09-11 21:31:06,281 INFO solr.SolrMappingReader - source: url dest: id
2010-09-11 21:31:06,281 INFO solr.SolrMappingReader - source: url dest: url
2010-09-11 21:31:06,281 INFO solr.SolrMappingReader - uniqueKey = id
2010-09-11 21:31:06,291 INFO solr.SolrWriter - write()
2010-09-11 21:31:06,294 INFO solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,294 INFO solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,294 INFO solr.SolrWriter - Key: digest, value: 18abadd34a2bd71a8336fa5e8c6dbedb
2010-09-11 21:31:06,306 INFO solr.SolrWriter - write()
2010-09-11 21:31:06,306 INFO solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,306 INFO solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,306 INFO solr.SolrWriter - Key: digest, value: 3267fd5ea03852cdc83383635d133fad
2010-09-11 21:31:06,310 INFO solr.SolrWriter - write()
2010-09-11 21:31:06,310 INFO solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,310 INFO solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,311 INFO solr.SolrWriter - Key: digest, value: b61607602ab99eda5684adc9966349d6
2010-09-11 21:31:06,314 INFO solr.SolrWriter - write()
2010-09-11 21:31:06,314 INFO solr.SolrWriter - Key: segment, value: 20100911212851
2010-09-11 21:31:06,314 INFO solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,314 INFO solr.SolrWriter - Key: digest, value: 9bdb8df3d1addf254203542dd22096d3
2010-09-11 21:31:06,316 INFO solr.SolrWriter - write()
2010-09-11 21:31:06,316 INFO solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,316 INFO solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,317 INFO solr.SolrWriter - Key: digest, value: 66eb3639ae15655bf91dc53208f95167
2010-09-11 21:31:06,319 INFO solr.SolrWriter - write()
2010-09-11 21:31:06,319 INFO solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,319 INFO solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,319 INFO solr.SolrWriter - Key: digest, value: 6e0501b52e204c2a68d9caa70dd0dfa9
2010-09-11 21:31:06,326 INFO solr.SolrWriter - write()
2010-09-11 21:31:06,327 INFO solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,327 INFO solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,327 INFO solr.SolrWriter - Key: digest, value: bc315927b7c01c7a2905d5b6872bc35b
2010-09-11 21:31:06,327 INFO solr.SolrWriter - close()
2010-09-11 21:31:06,687 WARN mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Document [null] missing required field: id
Document [null] missing required field: id
request: http://127.0.0.1:8983/solr/update?wt=javabin&version=1
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:98)
at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2010-09-11 21:31:07,556 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
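A minimal sketch of why this error can appear even though both the mapping file and schema.xml declare id: if id is only ever filled by copying url, a document that arrives without a url field never gets an id. This is a simplified, hypothetical model (the class and method names are invented, and it is not how Nutch's SolrWriter or SolrMappingReader are actually written); the sample document uses the three keys that the SolrWriter log above shows for every document.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not Nutch code: copy each source field of an incoming
 * document to the Solr field(s) it is mapped to, then check the uniqueKey.
 */
public class MappingSketch {

    /** Applies a source -> destination(s) field mapping to one document. */
    public static Map<String, String> applyMapping(Map<String, String> nutchDoc,
                                                   Map<String, List<String>> mapping) {
        Map<String, String> solrDoc = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : nutchDoc.entrySet()) {
            List<String> dests = mapping.get(field.getKey());
            if (dests == null) {
                // In this sketch, unmapped fields keep their own name.
                solrDoc.put(field.getKey(), field.getValue());
            } else {
                for (String dest : dests) {
                    solrDoc.put(dest, field.getValue());
                }
            }
        }
        return solrDoc;
    }

    /** Returns null if the uniqueKey is present, otherwise an error message. */
    public static String checkUniqueKey(Map<String, String> solrDoc, String uniqueKey) {
        return solrDoc.containsKey(uniqueKey) ? null : "missing required field: " + uniqueKey;
    }

    public static void main(String[] args) {
        // url is copied to both url and id, as the SolrMappingReader log suggests.
        Map<String, List<String>> mapping = new LinkedHashMap<>();
        mapping.put("url", List.of("url", "id"));
        mapping.put("title", List.of("metadata_title"));

        // The only keys the SolrWriter log shows for each document.
        Map<String, String> observed = new LinkedHashMap<>();
        observed.put("segment", "20100911212934");
        observed.put("boost", "1.0");
        observed.put("digest", "bc315927b7c01c7a2905d5b6872bc35b");

        Map<String, String> solrDoc = applyMapping(observed, mapping);
        System.out.println(solrDoc.keySet());              // [segment, boost, digest]
        System.out.println(checkUniqueKey(solrDoc, "id")); // missing required field: id
    }
}
```

Adding a url entry to the input document makes the check pass, so under this model the log suggests the problem lies upstream, in the documents that reach SolrWriter, rather than in the mapping file itself.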
Re: Problem integrating Apache Nutch (release 1.2) with Apache Solr (trunk) - got Solr exception
Posted by Andrzej Bialecki <ab...@getopt.org>.
On 2010-09-11 22:37, h00kpublic@gmail.com wrote:
> hi...
> I have configured the solrindex-mapping.xml (nutch) and configured my
> solr schema.xml and solrconfig.xml too. Both working well on single
> run, but if I use the bin/nutch solrindex ... I get an exception:
Nutch 1.2 does NOT work with Solr trunk. You need to use the latest
Solr release, 1.4.1.
--
Best regards,
Andrzej Bialecki <><
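Andrzej's answer is about client/server compatibility, and the "Invalid version or the data in not in 'javabin' format" exception that appears later in this thread is one way such a mismatch can show up. The sketch below is a hypothetical model of a version-tagged binary response, not SolrJ's JavaBinCodec: it assumes, as I understand the javabin format, that the first byte of the payload carries a format version and that the client rejects any version other than its own. The version numbers 1 and 2 are assumed values for illustration.

```java
/**
 * Hypothetical model of a version-tagged binary response (not SolrJ's
 * JavaBinCodec): the first byte is assumed to carry the format version.
 */
public class VersionCheckSketch {

    // Assumed value: the format version this (older) client understands.
    public static final byte CLIENT_VERSION = 1;

    /** Throws if the payload was written in a format version the client doesn't speak. */
    public static void checkHeader(byte[] payload) {
        if (payload.length == 0) {
            throw new RuntimeException("empty response");
        }
        byte version = payload[0];
        if (version != CLIENT_VERSION) {
            throw new RuntimeException("Invalid version (expected " + CLIENT_VERSION
                    + ", got " + version + ") or the data is not in 'javabin' format");
        }
    }

    public static void main(String[] args) {
        checkHeader(new byte[] {1, 0}); // same version as the client: accepted
        try {
            checkHeader(new byte[] {2, 0}); // assumed newer server format: rejected
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this is what is happening, the fix is on the deployment side, as Andrzej says: run a Solr release whose response format matches the SolrJ version Nutch 1.2 was built against, rather than patching the client.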
Re: Problem integrating Apache Nutch (release 1.2) with Apache Solr (trunk) - got Solr exception
Posted by "h00kpublic@gmail.com" <h0...@googlemail.com>.
hm.. if I use Nutch's own indexing, I get this exception:
2010-09-12 08:58:46,135 WARN mapred.LocalJobRunner - job_local_0001
org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:719)
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:724)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2263)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2249)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2219)
at org.apache.nutch.indexer.lucene.LuceneWriter.close(LuceneWriter.java:237)
at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:482)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2010-09-12 08:58:46,264 ERROR indexer.Indexer - Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
at org.apache.nutch.indexer.Indexer.index(Indexer.java:76)
at org.apache.nutch.indexer.Indexer.run(Indexer.java:97)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.Indexer.main(Indexer.java:106)
Is this a configuration mistake on my part or a Nutch bug? Should I use
another version (maybe release-1.1)?
best regards marcel :)
On 09/11/2010 10:57 PM, Nemani, Raj wrote:
> Try to see if you can index into Nutch's native index. If you can, then
> inspect the Nutch index using Luke (latest version). Sorry I could not
> provide a direct answer but wanted to see if the generated crawl data
> has issues...
Re: Problem integrating Apache Nutch (release 1.2) with Apache Solr (trunk) - got Solr exception
Posted by "h00kpublic@gmail.com" <h0...@googlemail.com>.
hm... I found the fault in nutch-site.xml: I forgot one "|" separator
between the values of plugin.includes :( ... but why aren't property values
validated? It took me a day to find my own mistake :/ ... Now that Nutch
indexing works, I ran Solr indexing and got another exception:
java.lang.RuntimeException: Invalid version or the data in not in 'javabin' format
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:466)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:98)
at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2010-09-12 11:44:55,101 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
I'm currently looking into possible solutions and trying to find out why this
exception is thrown... any ideas?
best regards marcel
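The plugin.includes mistake described above would also explain the three-field documents. Assuming, as far as I know, that Nutch compiles plugin.includes into a single regular expression and requires each plugin id to match it in full, a forgotten "|" glues two alternatives together, so neither plugin is loaded and no error is raised. My understanding (not confirmed in this thread) is that segment, boost and digest are added by the indexer itself, while url, title and content come from the index-basic plugin. The sketch uses a shortened, made-up plugin list and an invented helper, unmatched(), of the kind that could answer the "why isn't it validated?" question by reporting expected plugins the pattern does not match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: treat plugin.includes as one regular expression that
 * must match a plugin id in full, and report which expected plugins it misses.
 */
public class PluginIncludesSketch {

    /** Returns the plugin ids that the includes pattern does NOT fully match. */
    public static List<String> unmatched(String includes, List<String> pluginIds) {
        Pattern pattern = Pattern.compile(includes);
        List<String> missing = new ArrayList<>();
        for (String id : pluginIds) {
            if (!pattern.matcher(id).matches()) { // whole-string match, not find()
                missing.add(id);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> plugins = List.of("protocol-http", "parse-html", "index-basic", "index-anchor");
        String good = "protocol-http|parse-(text|html)|index-(basic|anchor)";
        // Same value with the "|" between the parse and index groups forgotten.
        String broken = "protocol-http|parse-(text|html)index-(basic|anchor)";

        System.out.println(unmatched(good, plugins));   // []
        System.out.println(unmatched(broken, plugins)); // [parse-html, index-basic, index-anchor]
    }
}
```

With the separator in place every listed plugin matches; without it, only protocol-http still does. A check like this at startup would turn the silent loss of the index-basic fields into a visible warning.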
Re: Problem integrating Apache Nutch (release 1.2) with Apache Solr (trunk) - got Solr exception
Posted by "h00kpublic@gmail.com" <h0...@googlemail.com>.
Indexing into the Nutch index works fine... but if I open the created
index with Luke, again only 3 fields (segment, boost and digest) are
indexed ... Next I'm going to try another version of Nutch... I hope it will
work today :/
best regards marcel
RE: Problem integrating Apache Nutch (release 1.2) with Apache Solr (trunk) - got Solr exception
Posted by "Nemani, Raj" <Ra...@turner.com>.
Try to see if you can index into Nutch's native index. If you can, then
inspect the Nutch index using Luke (latest version). Sorry I could not
provide a direct answer, but I wanted to see whether the generated crawl data
has issues...
-----Original Message-----
From: h00kpublic@gmail.com [mailto:h00kpublic@googlemail.com]
Sent: Saturday, September 11, 2010 4:37 PM
To: user@nutch.apache.org
Subject: problem by integration of apache nutch (release 1.2) in apach
solr (trunk) - got solr exception
hi...
I have configured |solrindex-mapping.xml| (Nutch) as well as my Solr
|schema.xml| and |solrconfig.xml|. Each works well when run on its own,
but when I run |bin/nutch solrindex ...| I get an exception:
|org.apache.solr.common.SolrException: Document [null] missing required field: id|
I have configured |id| in all the config files. In
|solrindex-mapping.xml| it maps |url| to |id|, and in Solr's |schema.xml|
I configured |id| too. I don't know what's wrong. I added some logging
output to |org.apache.nutch.indexer.solr.SolrWriter.java|: one log
statement at the line where the fields that were read are added to the
SolrInputDocument. The result after building and running is:
|2010-09-11 21:31:06,326 INFO solr.SolrWriter - write()
2010-09-11 21:31:06,327 INFO solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,327 INFO solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,327 INFO solr.SolrWriter - Key: digest, value: bc315927b7c01c7a2905d5b6872bc35b
2010-09-11 21:31:06,327 INFO solr.SolrWriter - close()
|
As you can see, only 3 fields are read O_o. Does anyone know if there is
something wrong in my configuration? *I need Nutch running really
fast, because I am currently writing my bachelor thesis :/* (on
information integration of heterogeneous data sources in the local
network)
Best regards
marcel =)
The rest of the log:
|2010-09-11 21:31:06,079 INFO solr.SolrWriter - open()
2010-09-11 21:31:06,280 INFO solr.SolrMappingReader - source: content dest: content
2010-09-11 21:31:06,280 INFO solr.SolrMappingReader - source: site dest: site
2010-09-11 21:31:06,280 INFO solr.SolrMappingReader - source: title dest: metadata_title
2010-09-11 21:31:06,281 INFO solr.SolrMappingReader - source: host dest: host
2010-09-11 21:31:06,281 INFO solr.SolrMappingReader - source: segment dest: segment
2010-09-11 21:31:06,281 INFO solr.SolrMappingReader - source: boost dest: boost
2010-09-11 21:31:06,281 INFO solr.SolrMappingReader - source: digest dest: digest
2010-09-11 21:31:06,281 INFO solr.SolrMappingReader - source: tstamp dest: metadata_last_modified
2010-09-11 21:31:06,281 INFO solr.SolrMappingReader - source: lastModified dest: metadata_last_modified
2010-09-11 21:31:06,281 INFO solr.SolrMappingReader - source: url dest: url
2010-09-11 21:31:06,281 INFO solr.SolrMappingReader - source: url dest: id
2010-09-11 21:31:06,281 INFO solr.SolrMappingReader - source: url dest: id
2010-09-11 21:31:06,281 INFO solr.SolrMappingReader - source: url dest: url
2010-09-11 21:31:06,281 INFO solr.SolrMappingReader - uniqueKey = id
2010-09-11 21:31:06,291 INFO solr.SolrWriter - write()
2010-09-11 21:31:06,294 INFO solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,294 INFO solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,294 INFO solr.SolrWriter - Key: digest, value: 18abadd34a2bd71a8336fa5e8c6dbedb
2010-09-11 21:31:06,306 INFO solr.SolrWriter - write()
2010-09-11 21:31:06,306 INFO solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,306 INFO solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,306 INFO solr.SolrWriter - Key: digest, value: 3267fd5ea03852cdc83383635d133fad
2010-09-11 21:31:06,310 INFO solr.SolrWriter - write()
2010-09-11 21:31:06,310 INFO solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,310 INFO solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,311 INFO solr.SolrWriter - Key: digest, value: b61607602ab99eda5684adc9966349d6
2010-09-11 21:31:06,314 INFO solr.SolrWriter - write()
2010-09-11 21:31:06,314 INFO solr.SolrWriter - Key: segment, value: 20100911212851
2010-09-11 21:31:06,314 INFO solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,314 INFO solr.SolrWriter - Key: digest, value: 9bdb8df3d1addf254203542dd22096d3
2010-09-11 21:31:06,316 INFO solr.SolrWriter - write()
2010-09-11 21:31:06,316 INFO solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,316 INFO solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,317 INFO solr.SolrWriter - Key: digest, value: 66eb3639ae15655bf91dc53208f95167
2010-09-11 21:31:06,319 INFO solr.SolrWriter - write()
2010-09-11 21:31:06,319 INFO solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,319 INFO solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,319 INFO solr.SolrWriter - Key: digest, value: 6e0501b52e204c2a68d9caa70dd0dfa9
2010-09-11 21:31:06,326 INFO solr.SolrWriter - write()
2010-09-11 21:31:06,327 INFO solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,327 INFO solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,327 INFO solr.SolrWriter - Key: digest, value: bc315927b7c01c7a2905d5b6872bc35b
2010-09-11 21:31:06,327 INFO solr.SolrWriter - close()
2010-09-11 21:31:06,687 WARN mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Document [null] missing required field: id
Document [null] missing required field: id
request: http://127.0.0.1:8983/solr/update?wt=javabin&version=1
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:98)
at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2010-09-11 21:31:07,556 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
|
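The mapping lines in this log show `url` mapped to both `url` and `id` with `uniqueKey = id`, yet every `write()` lists only `segment`, `boost` and `digest`. A minimal sketch, assuming the mapping acts as a plain source-to-destination copy (which is how the SolrMappingReader lines read; this is an illustration, not Nutch's SolrWriter code), shows why Solr then reports a missing `id`: a mapping whose source field is absent produces nothing. `MappingCheck`, `applyMappings` and `checkUniqueKey` are hypothetical names, and the fail-fast check is a diagnostic addition, not an existing Nutch option.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration (not Nutch code): apply source -> dest field
 * mappings like the ones SolrMappingReader logged, then check that the
 * unique key is present before a document would be sent to Solr.
 */
public class MappingCheck {

    /** Copies each mapped source field that is present into its destination name. */
    public static Map<String, String> applyMappings(
            Map<String, String> fields, List<Map.Entry<String, String>> mappings) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> m : mappings) {
            String value = fields.get(m.getKey()); // m.getKey() = source field
            if (value != null) {                   // absent source -> nothing produced
                out.put(m.getValue(), value);      // m.getValue() = destination field
            }
        }
        return out;
    }

    /** Returns null if the unique key is present, otherwise a Solr-like message. */
    public static String checkUniqueKey(Map<String, String> doc, String uniqueKey) {
        return doc.containsKey(uniqueKey) ? null : "missing required field: " + uniqueKey;
    }

    public static void main(String[] args) {
        // A subset of the mappings from the SolrMappingReader lines above.
        List<Map.Entry<String, String>> mappings = List.of(
            Map.entry("segment", "segment"),
            Map.entry("boost", "boost"),
            Map.entry("digest", "digest"),
            Map.entry("url", "url"),
            Map.entry("url", "id"));

        // The only fields the poster's logging shows for each write() call.
        Map<String, String> logged = new LinkedHashMap<>();
        logged.put("segment", "20100911212934");
        logged.put("boost", "1.0");
        logged.put("digest", "bc315927b7c01c7a2905d5b6872bc35b");

        Map<String, String> doc = applyMappings(logged, mappings);
        System.out.println(doc.keySet());             // [segment, boost, digest]
        System.out.println(checkUniqueKey(doc, "id")); // missing required field: id

        // Hypothetical example URL: once a url field exists, id is produced.
        logged.put("url", "http://example.com/");
        System.out.println(checkUniqueKey(applyMappings(logged, mappings), "id")); // null
    }
}
```

If this reading is right, the problem sits upstream of the mapping and of schema.xml: the documents never carry a `url` field. In Nutch 1.x that field (along with `content`, `site`, `title`, `host` and `tstamp`, all mapped above but absent from the log) is normally added by the index-basic indexing filter, so the `plugin.includes` setting is worth checking; treat this as a hint rather than a confirmed diagnosis.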