Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/13 19:46:58 UTC

[jira] [Comment Edited] (NUTCH-1568) port pluggable indexing architecture to 2.x

    [ https://issues.apache.org/jira/browse/NUTCH-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869818#comment-13869818 ] 

Lewis John McGibbney edited comment on NUTCH-1568 at 1/13/14 6:46 PM:
----------------------------------------------------------------------

Patch for 2.x HEAD. [~talat], changes from the last patch include:

 * Reverts changes to the org.apache.nutch.indexer.elastic package. We should deal with this exclusively within its own issue. Simply deleting the classes for the time being is not a good idea IMHO.
 * Upgrades all Solr dependencies to 4.6.0 (most recent), including schema.xml, which now acts as the ONLY schema. Consequently we remove schema-solr4.xml.
 * Begins to sort out possible classloading issues which may arise from duplicate dependencies (with different revisions) included via the new indexer-solr plugin.xml file. If you look in the generated runtime/local/plugin/indexer-solr/ directory, you will see duplicate versions of several libraries... this is both a waste of resources and potentially dangerous if we load the wrong classes from the wrong jar files.
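
To make the duplicate-library point easier to check, here is a minimal sketch (not part of the patch) that lists artifacts present in more than one version in a plugin directory. The helper name find_duplicate_jars and the "<artifact>-<version>.jar" naming pattern are assumptions made for illustration; jars that do not follow that pattern are simply skipped.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming convention: "<artifact>-<version>.jar", where the version
# starts with a digit (e.g. "solr-solrj-4.6.0.jar"). Other names are ignored.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_duplicate_jars(plugin_dir):
    """Return {artifact: versions} for artifacts found in more than one version.

    Hypothetical helper for illustration; versions are sorted as plain strings.
    """
    versions = defaultdict(set)
    for jar in Path(plugin_dir).glob("*.jar"):
        match = JAR_NAME.match(jar.name)
        if match:
            versions[match["artifact"]].add(match["version"])
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


if __name__ == "__main__":
    # Directory named in the comment above; adjust to your own build layout.
    duplicates = find_duplicate_jars("runtime/local/plugin/indexer-solr")
    for artifact, found in sorted(duplicates.items()):
        print(f"{artifact}: {', '.join(found)}")
```

Run from the Nutch source root after a build, this prints one line per artifact that ships in several revisions. It only inspects file names, so it cannot tell which copy the plugin classloader would actually pick.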

I am having a major problem even running this patch with Solr 4.4.0, namely that when I invoke

./bin/nutch index -all -D solr.server.url=http://localhost:8983/solr/ 

I get the default logging message, e.g.

java.lang.Exception: java.lang.RuntimeException: Missing SOLR URL. Should be set via -D solr.server.url
SOLRIndexWriter
	solr.server.url : URL of the SOLR instance (mandatory)
	solr.commit.size : buffer size when sending to SOLR (default 1000)
	solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
	solr.auth : use authentication (default false)
	solr.auth.username : use authentication (default false)
	solr.auth : username for authentication
	solr.auth.password : password for authentication 


was (Author: lewismc):
Patch for 2.x HEAD. [~talat], changes from last patch include

 * reverts changes to org.apache.nutch.indexer.elastic package. We should deal with this exclusively within its own issue. Simply deleting the classes for the time being is not a good idea IMHO.
 * Upgrades all Solr dependencies to 4.6.0 (most recent) including the schema.xml, which now acts as the ONLY schema. Consequently we remove schema-solr4.xml
 * Begins to sort out possibly classloading issues regarding the dependencies included in the new indexer-solr plugin.xml file. If you take a look in to the generated runtime/local/plugin/indexer-solr/ directory, you will see duplicate versions of several libraries... this is not good and a waste of resources.

I am having one problem even running this patch with Solr 4.4.0, namely that when I invoke

./bin/nutch index -all -D solr.server.url=http://localhost:8983/solr/ I get the default logging message e.g.

java.lang.Exception: java.lang.RuntimeException: Missing SOLR URL. Should be set via -D solr.server.url
SOLRIndexWriter
	solr.server.url : URL of the SOLR instance (mandatory)
	solr.commit.size : buffer size when sending to SOLR (default 1000)
	solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
	solr.auth : use authentication (default false)
	solr.auth.username : use authentication (default false)
	solr.auth : username for authentication
	solr.auth.password : password for authentication 

> port pluggable indexing architecture to 2.x
> -------------------------------------------
>
>                 Key: NUTCH-1568
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1568
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: 2.2
>            Reporter: Lewis John McGibbney
>             Fix For: 2.3
>
>         Attachments: NUTCH-1568-v2.patch, NUTCH-1568-v3.path, NUTCH-1568-v4.patch, NUTCH-1568.patch
>
>
> I would like to port the work done by Julien on NUTCH-1047 to 2.x. This issue should track that. It would be nice to do the upgrade in NUTCH-1486 first so that people can start using Solr 4.x ASAP.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)