Posted to user@nutch.apache.org by "h00kpublic@gmail.com" <h0...@googlemail.com> on 2010/09/11 22:37:00 UTC

problem by integration of apache nutch (release 1.2) in apach solr (trunk) - got solr exception


Hi,
I have configured |solrindex-mapping.xml| (Nutch) as well as my Solr
|schema.xml| and |solrconfig.xml|. Both work well when run on their own,
but when I use |bin/nutch solrindex ...| I get an exception:

|org.apache.solr.common.SolrException: Document [null] missing required
field: id|

I have configured |id| in all config files. In |solrindex-mapping.xml|
it is mapped from |url| to |id|, and I have defined |id| in Solr's
|schema.xml| too. I don't know what's wrong. I added some logging to
|org.apache.nutch.indexer.solr.SolrWriter.java|: one log statement at
the line where the fields that were read are added to the
SolrInputDocument. The result after building and running is:

|2010-09-11 21:31:06,326 INFO  solr.SolrWriter - write()
2010-09-11 21:31:06,327 INFO  solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,327 INFO  solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,327 INFO  solr.SolrWriter - Key: digest, value: bc315927b7c01c7a2905d5b6872bc35b
2010-09-11 21:31:06,327 INFO  solr.SolrWriter - close()

|

As you can see, only 3 fields are read O_o. Does anyone know whether
something is wrong in my configuration? *I need to get Nutch running
really soon, because I am currently writing my bachelor thesis :/* (on
information integration of heterogeneous data sources in the local network)

Best regards
marcel =)

The rest of the log:

|2010-09-11 21:31:06,079 INFO  solr.SolrWriter - open()
2010-09-11 21:31:06,280 INFO  solr.SolrMappingReader - source: content dest: content
2010-09-11 21:31:06,280 INFO  solr.SolrMappingReader - source: site dest: site
2010-09-11 21:31:06,280 INFO  solr.SolrMappingReader - source: title dest: metadata_title
2010-09-11 21:31:06,281 INFO  solr.SolrMappingReader - source: host dest: host
2010-09-11 21:31:06,281 INFO  solr.SolrMappingReader - source: segment dest: segment
2010-09-11 21:31:06,281 INFO  solr.SolrMappingReader - source: boost dest: boost
2010-09-11 21:31:06,281 INFO  solr.SolrMappingReader - source: digest dest: digest
2010-09-11 21:31:06,281 INFO  solr.SolrMappingReader - source: tstamp dest: metadata_last_modified
2010-09-11 21:31:06,281 INFO  solr.SolrMappingReader - source: lastModified dest: metadata_last_modified
2010-09-11 21:31:06,281 INFO  solr.SolrMappingReader - source: url dest: url
2010-09-11 21:31:06,281 INFO  solr.SolrMappingReader - source: url dest: id
2010-09-11 21:31:06,281 INFO  solr.SolrMappingReader - source: url dest: id
2010-09-11 21:31:06,281 INFO  solr.SolrMappingReader - source: url dest: url
2010-09-11 21:31:06,281 INFO  solr.SolrMappingReader - uniqueKey = id
2010-09-11 21:31:06,291 INFO  solr.SolrWriter - write()
2010-09-11 21:31:06,294 INFO  solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,294 INFO  solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,294 INFO  solr.SolrWriter - Key: digest, value: 18abadd34a2bd71a8336fa5e8c6dbedb
2010-09-11 21:31:06,306 INFO  solr.SolrWriter - write()
2010-09-11 21:31:06,306 INFO  solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,306 INFO  solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,306 INFO  solr.SolrWriter - Key: digest, value: 3267fd5ea03852cdc83383635d133fad
2010-09-11 21:31:06,310 INFO  solr.SolrWriter - write()
2010-09-11 21:31:06,310 INFO  solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,310 INFO  solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,311 INFO  solr.SolrWriter - Key: digest, value: b61607602ab99eda5684adc9966349d6
2010-09-11 21:31:06,314 INFO  solr.SolrWriter - write()
2010-09-11 21:31:06,314 INFO  solr.SolrWriter - Key: segment, value: 20100911212851
2010-09-11 21:31:06,314 INFO  solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,314 INFO  solr.SolrWriter - Key: digest, value: 9bdb8df3d1addf254203542dd22096d3
2010-09-11 21:31:06,316 INFO  solr.SolrWriter - write()
2010-09-11 21:31:06,316 INFO  solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,316 INFO  solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,317 INFO  solr.SolrWriter - Key: digest, value: 66eb3639ae15655bf91dc53208f95167
2010-09-11 21:31:06,319 INFO  solr.SolrWriter - write()
2010-09-11 21:31:06,319 INFO  solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,319 INFO  solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,319 INFO  solr.SolrWriter - Key: digest, value: 6e0501b52e204c2a68d9caa70dd0dfa9
2010-09-11 21:31:06,326 INFO  solr.SolrWriter - write()
2010-09-11 21:31:06,327 INFO  solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,327 INFO  solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,327 INFO  solr.SolrWriter - Key: digest, value: bc315927b7c01c7a2905d5b6872bc35b
2010-09-11 21:31:06,327 INFO  solr.SolrWriter - close()
2010-09-11 21:31:06,687 WARN  mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Document [null] missing required field: id
Document [null] missing required field: id
request: http://127.0.0.1:8983/solr/update?wt=javabin&version=1
         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
         at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
         at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:98)
         at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2010-09-11 21:31:07,556 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
|
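
The log above shows a url -> id mapping being read, yet every document written carries only segment, boost and digest. A mapping can only rename fields that are actually present, so a document without a url field ends up without an id. The following is a minimal sketch of that reasoning in Python; it is a simplified model, not Nutch's SolrWriter or SolrMappingReader code. The pairs are copied from the SolrMappingReader log lines, while the function names and the example URL are hypothetical.

```python
# Simplified model of a source -> dest field mapping (assumption: not the
# real Nutch implementation). Pairs are taken from the log in this message.
MAPPING = [
    ("content", "content"),
    ("site", "site"),
    ("title", "metadata_title"),
    ("host", "host"),
    ("segment", "segment"),
    ("boost", "boost"),
    ("digest", "digest"),
    ("tstamp", "metadata_last_modified"),
    ("lastModified", "metadata_last_modified"),
    ("url", "url"),
    ("url", "id"),
]
UNIQUE_KEY = "id"


def map_document(fields, mapping=MAPPING):
    """Copy every source field that is present to its destination name(s)."""
    mapped = {}
    for source, dest in mapping:
        if source in fields:
            mapped[dest] = fields[source]
    return mapped


def missing_unique_key(fields, unique_key=UNIQUE_KEY):
    """True if the mapped document would lack the unique key field."""
    return unique_key not in map_document(fields)


# The three fields that appear in the SolrWriter log output.
logged = {
    "segment": "20100911212934",
    "boost": "1.0",
    "digest": "bc315927b7c01c7a2905d5b6872bc35b",
}
# Hypothetical document that also carries a url field.
with_url = dict(logged, url="http://example.org/")
```

Under this model, `missing_unique_key(logged)` is true and `missing_unique_key(with_url)` is false, which suggests looking at why the url field never reaches the writer rather than at the mapping file itself.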


Re: problem by integration of apache nutch (release 1.2) in apach solr (trunk) - got solr exception

Posted by Andrzej Bialecki <ab...@getopt.org>.
On 2010-09-11 22:37, h00kpublic@gmail.com wrote:
>
>
> hi...
> I have configured the |solrindex-mapping.xml| (nutch) and configured my
> solr |schema.xml| and |solrconfig.xml| too. Both working well on single
> run, but if I use the |bin/nutch solrindex ...| I get an exception:|

Nutch 1.2 does NOT work with Solr trunk. You need to use the latest 
Solr release, 1.4.1.
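
One way such a client/server mismatch can surface is the 'javabin' error reported elsewhere in this thread. The sketch below is only an illustration in Python, assuming that a binary response starts with a one-byte format version which the client compares against the version it was built for. The function name and the version values 1 and 2 are placeholders, not taken from the Solr sources.

```python
def read_format_version(payload: bytes, expected_version: int) -> int:
    """Return the leading version byte, or raise if it is not the expected one.

    Assumption: the first byte of the payload is a format version marker.
    """
    if not payload:
        raise ValueError("empty payload: no version byte to read")
    version = payload[0]
    if version != expected_version:
        raise RuntimeError(
            f"version byte {version} does not match expected {expected_version}"
        )
    return version
```

A client built for one format version would accept `b"\x01..."` here and reject `b"\x02..."`, regardless of how well the rest of the configuration is set up.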


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: problem by integration of apache nutch (release 1.2) in apach solr (trunk) - got solr exception

Posted by "h00kpublic@gmail.com" <h0...@googlemail.com>.
Hm... if I use the Nutch indexing, I get this exception:

2010-09-12 08:58:46,135 WARN  mapred.LocalJobRunner - job_local_0001
org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
         at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:719)
         at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:724)
         at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2263)
         at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2249)
         at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2219)
         at org.apache.nutch.indexer.lucene.LuceneWriter.close(LuceneWriter.java:237)
         at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:482)
         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2010-09-12 08:58:46,264 ERROR indexer.Indexer - Indexer: java.io.IOException: Job failed!
         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
         at org.apache.nutch.indexer.Indexer.index(Indexer.java:76)
         at org.apache.nutch.indexer.Indexer.run(Indexer.java:97)
         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
         at org.apache.nutch.indexer.Indexer.main(Indexer.java:106)

Is this a configuration fault on my side or a Nutch bug? Should I use 
another version (maybe release 1.1)?

best regards marcel :)

On 09/11/2010 10:57 PM, Nemani, Raj wrote:
> Try to see if you can index into Nutch's native index.  If you can, then
> inspect the Nutch index using Luke (latest version).   Sorry I could not
> provide a direct answer, but I wanted to see if the generated crawl data
> has issues...


Re: problem by integration of apache nutch (release 1.2) in apach solr (trunk) - got solr exception

Posted by "h00kpublic@gmail.com" <h0...@googlemail.com>.
Hm... I have found the fault in nutch-site.xml: I forgot one separator 
character "|" between the values of plugin.includes :( ... But why are 
the property values not validated? It took me a whole day to find my own 
mistake :/ ... Now that the Nutch indexing works, I ran a Solr indexing 
and got another exception:

java.lang.RuntimeException: Invalid version or the data in not in 'javabin' format
         at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
         at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:466)
         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
         at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
         at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:98)
         at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2010-09-12 11:44:55,101 ERROR solr.SolrIndexer - java.io.IOException: Job failed!

Currently I am checking possible solutions and trying to find out why 
this exception is thrown... any idea?

best regards marcel
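
On the plugin.includes fault described at the top of this message: the value is a regular expression, so a missing "|" is still a syntactically valid pattern and simply stops matching the plugins on either side of the gap. Here is a rough sketch of a check that would have flagged it, written in Python as a stand-in for Java's regex engine. It assumes the expression has to match a plugin id in full; the plugin ids, the two example expressions and all function names are hypothetical, and the splitter ignores escaped characters.

```python
import re

# Hypothetical plugin ids for illustration; a real installation has its own.
PLUGIN_IDS = [
    "protocol-http", "urlfilter-regex", "parse-text", "parse-html",
    "index-basic", "index-anchor", "scoring-opic",
]


def included_plugins(expr, plugin_ids=PLUGIN_IDS):
    """Ids that the whole expression matches (full match, not a search)."""
    pattern = re.compile(expr)
    return [pid for pid in plugin_ids if pattern.fullmatch(pid)]


def split_top_level(expr):
    """Split on '|' characters that are not inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "|" and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def suspicious_alternatives(expr, plugin_ids=PLUGIN_IDS):
    """Top-level alternatives that match no known id, e.g. two fused entries."""
    return [
        alt for alt in split_top_level(expr)
        if not any(re.fullmatch(alt, pid) for pid in plugin_ids)
    ]


GOOD = "protocol-http|urlfilter-regex|parse-(text|html)|index-(basic|anchor)|scoring-opic"
# Same value with the '|' between the parse and index entries left out.
BROKEN = "protocol-http|urlfilter-regex|parse-(text|html)index-(basic|anchor)|scoring-opic"
```

With these example values, `included_plugins(BROKEN)` no longer contains any parse or index plugin, and `suspicious_alternatives(BROKEN)` returns the single fused entry. If the indexing plugins drop out this way, that would plausibly explain documents that carry only segment, boost and digest.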

On 09/11/2010 10:57 PM, Nemani, Raj wrote:
> Try to see if you can index into Nutch's native index.  If you can, then
> inspect the Nutch index using Luke (latest version).   Sorry I could not
> provide a direct answer, but I wanted to see if the generated crawl data
> has issues...


Re: problem by integration of apache nutch (release 1.2) in apach solr (trunk) - got solr exception

Posted by "h00kpublic@gmail.com" <h0...@googlemail.com>.
The indexing into the Nutch index works fine... but if I open the created 
index with Luke, there are also only 3 fields (segment, boost and digest) 
indexed... Next, I am going to try another version of Nutch... I hope it 
will work today :/

best regards marcel

On 09/11/2010 10:57 PM, Nemani, Raj wrote:
> Try to see if you can index into Nutch's native index.  If you can, then
> inspect the Nutch index using Luke (latest version).   Sorry I could not
> provide a direct answer, but I wanted to see if the generated crawl data
> has issues...


RE: problem by integration of apache nutch (release 1.2) in apach solr (trunk) - got solr exception

Posted by "Nemani, Raj" <Ra...@turner.com>.
Try to see if you can index into Nutch's native index.  If you can, then
inspect the Nutch index using Luke (latest version).   Sorry I could not
provide a direct answer, but I wanted to see whether the generated crawl
data has issues...
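
As a quick cross-check before opening the index in Luke, here is a minimal
sketch that scans the custom debug output from this thread and reports which
write() calls never logged a given key (for example url or id). This is only
an assumption about how one might check it: the class MissingFieldCheck and
its method names are hypothetical and not part of Nutch or Solr, and the log
path passed on the command line is whatever file your logging writes to.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Nutch or Solr.
class MissingFieldCheck {
    // Matches the custom debug lines shown in this thread, e.g.
    // "... INFO  solr.SolrWriter - Key: boost, value: 1.0".
    // Only the key is captured, so mail-wrapped values do not matter.
    private static final Pattern KEY_LINE =
            Pattern.compile("SolrWriter - Key: (\\S+), value:");

    /** Returns the 1-based positions of write() calls that never logged requiredKey. */
    static List<Integer> documentsMissing(List<String> logLines, String requiredKey) {
        List<Integer> missing = new ArrayList<>();
        Set<String> keys = null; // keys logged for the document currently being read
        int docNumber = 0;
        for (String line : logLines) {
            if (line.contains("SolrWriter - write()")) {
                // A new document starts: close out the previous one first.
                if (keys != null && !keys.contains(requiredKey)) {
                    missing.add(docNumber);
                }
                keys = new HashSet<>();
                docNumber++;
            } else if (keys != null) {
                Matcher m = KEY_LINE.matcher(line);
                if (m.find()) {
                    keys.add(m.group(1));
                }
            }
        }
        // Close out the last document in the log.
        if (keys != null && !keys.contains(requiredKey)) {
            missing.add(docNumber);
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the log file, args[1]: field name to look for
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("Documents missing '" + args[1] + "': "
                + documentsMissing(lines, args[1]));
    }
}
```

If every document is reported as missing url, the field is already absent
before SolrWriter applies the mapping, which would point at the indexing
filters or the crawl data rather than at solrindex-mapping.xml.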

-----Original Message-----
From: h00kpublic@gmail.com [mailto:h00kpublic@googlemail.com] 
Sent: Saturday, September 11, 2010 4:37 PM
To: user@nutch.apache.org
Subject: problem by integration of apache nutch (release 1.2) in apach
solr (trunk) - got solr exception



hi...
I have configured |solrindex-mapping.xml| (Nutch) as well as my Solr
|schema.xml| and |solrconfig.xml|. Each works fine when run on its own,
but when I run |bin/nutch solrindex ...| I get an exception:

|org.apache.solr.common.SolrException: Document [null] missing required field: id|

I have configured the |id| field in all config files. In
|solrindex-mapping.xml| it is mapped from |url| to |id|, and I configured
|id| in Solr's |schema.xml| too. I don't know what's wrong. I added some
logging to |org.apache.nutch.indexer.solr.SolrWriter.java|: one log
statement at the line where the fields that were read are added to the
SolrInputDocument. The result after building and running is:

|2010-09-11 21:31:06,326 INFO  solr.SolrWriter - write()
2010-09-11 21:31:06,327 INFO  solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,327 INFO  solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,327 INFO  solr.SolrWriter - Key: digest, value: bc315927b7c01c7a2905d5b6872bc35b
2010-09-11 21:31:06,327 INFO  solr.SolrWriter - close()

|

As you can see, only 3 fields are read O_o. Does anyone know if there is
something wrong in my configuration? *I need to get Nutch running really
fast, because I am currently writing my bachelor thesis :/* (on
information integration of heterogeneous data sources on the local
network)

Best regards
marcel =)

The rest of the log:

|2010-09-11 21:31:06,079 INFO  solr.SolrWriter - open()
2010-09-11 21:31:06,280 INFO  solr.SolrMappingReader - source: content dest: content
2010-09-11 21:31:06,280 INFO  solr.SolrMappingReader - source: site dest: site
2010-09-11 21:31:06,280 INFO  solr.SolrMappingReader - source: title dest: metadata_title
2010-09-11 21:31:06,281 INFO  solr.SolrMappingReader - source: host dest: host
2010-09-11 21:31:06,281 INFO  solr.SolrMappingReader - source: segment dest: segment
2010-09-11 21:31:06,281 INFO  solr.SolrMappingReader - source: boost dest: boost
2010-09-11 21:31:06,281 INFO  solr.SolrMappingReader - source: digest dest: digest
2010-09-11 21:31:06,281 INFO  solr.SolrMappingReader - source: tstamp dest: metadata_last_modified
2010-09-11 21:31:06,281 INFO  solr.SolrMappingReader - source: lastModified dest: metadata_last_modified
2010-09-11 21:31:06,281 INFO  solr.SolrMappingReader - source: url dest: url
2010-09-11 21:31:06,281 INFO  solr.SolrMappingReader - source: url dest: id
2010-09-11 21:31:06,281 INFO  solr.SolrMappingReader - source: url dest: id
2010-09-11 21:31:06,281 INFO  solr.SolrMappingReader - source: url dest: url
2010-09-11 21:31:06,281 INFO  solr.SolrMappingReader - uniqueKey = id
2010-09-11 21:31:06,291 INFO  solr.SolrWriter - write()
2010-09-11 21:31:06,294 INFO  solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,294 INFO  solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,294 INFO  solr.SolrWriter - Key: digest, value: 18abadd34a2bd71a8336fa5e8c6dbedb
2010-09-11 21:31:06,306 INFO  solr.SolrWriter - write()
2010-09-11 21:31:06,306 INFO  solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,306 INFO  solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,306 INFO  solr.SolrWriter - Key: digest, value: 3267fd5ea03852cdc83383635d133fad
2010-09-11 21:31:06,310 INFO  solr.SolrWriter - write()
2010-09-11 21:31:06,310 INFO  solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,310 INFO  solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,311 INFO  solr.SolrWriter - Key: digest, value: b61607602ab99eda5684adc9966349d6
2010-09-11 21:31:06,314 INFO  solr.SolrWriter - write()
2010-09-11 21:31:06,314 INFO  solr.SolrWriter - Key: segment, value: 20100911212851
2010-09-11 21:31:06,314 INFO  solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,314 INFO  solr.SolrWriter - Key: digest, value: 9bdb8df3d1addf254203542dd22096d3
2010-09-11 21:31:06,316 INFO  solr.SolrWriter - write()
2010-09-11 21:31:06,316 INFO  solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,316 INFO  solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,317 INFO  solr.SolrWriter - Key: digest, value: 66eb3639ae15655bf91dc53208f95167
2010-09-11 21:31:06,319 INFO  solr.SolrWriter - write()
2010-09-11 21:31:06,319 INFO  solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,319 INFO  solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,319 INFO  solr.SolrWriter - Key: digest, value: 6e0501b52e204c2a68d9caa70dd0dfa9
2010-09-11 21:31:06,326 INFO  solr.SolrWriter - write()
2010-09-11 21:31:06,327 INFO  solr.SolrWriter - Key: segment, value: 20100911212934
2010-09-11 21:31:06,327 INFO  solr.SolrWriter - Key: boost, value: 1.0
2010-09-11 21:31:06,327 INFO  solr.SolrWriter - Key: digest, value: bc315927b7c01c7a2905d5b6872bc35b
2010-09-11 21:31:06,327 INFO  solr.SolrWriter - close()
2010-09-11 21:31:06,687 WARN  mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Document [null] missing required field: id
Document [null] missing required field: id
request: http://127.0.0.1:8983/solr/update?wt=javabin&version=1
         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
         at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
         at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:98)
         at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2010-09-11 21:31:07,556 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
|