Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/08/04 00:26:05 UTC

[jira] [Comment Edited] (NUTCH-1486) Upgrade to Solr 4.10.2

    [ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652610#comment-14652610 ] 

Lewis John McGibbney edited comment on NUTCH-1486 at 8/3/15 10:25 PM:
----------------------------------------------------------------------

Patch for trunk. It touches several places:
* Corrects the class names within log4j.properties to point at the indexwriter package for SolrWriter.
* Removes schema-solr4.xml and moves all required fields over to schema.xml.
* Removes the stray additional dependencies from ivy/ivy.xml (cf. NUTCH-2056, NUTCH-2058) and adds them to parsefilter-naivebayes. Also upgrades the Mahout and Lucene APIs, along with the accompanying dependencies, to play nicely with Lucene and Solr 4.10.2. Finally, implements the correct plugin.xml runtime dependencies for this plugin as well.
* Removes the transitive dependency on org.apache.httpcomponents httpcore and httpclient within index-geoip. These dependencies were leading to hellish classpath issues because newer implementations are used elsewhere. Also upgrades the index-geoip dependency to 2.3.1 and implements the correct plugin.xml runtime dependencies.
* Introduces some new properties within nutch-default.xml which enable us to choose between HttpSolrServer, CloudSolrServer, ConcurrentUpdateSolrServer or LBHttpSolrServer. These have been documented within nutch-site.xml and also within the describe() function of SolrWriter.
* Upgrades httpclient and httpcore across the board to >= 4.3.1, which avoids classpath issues when indexing and when building custom plugins on top of Nutch that implement newer interfaces for these dependencies.
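
To illustrate the server-selection bullet above, here is a minimal sketch of how a configured string could be mapped onto one of the four SolrJ 4.x client classes. It is not the code from the patch: the property name solr.server.type, the accepted values (http, cloud, concurrent, lb) and the fallback to HTTP are assumptions made for illustration. Only the SolrJ class names are taken from SolrJ 4.x itself, and they are carried as plain strings so the sketch compiles without SolrJ on the classpath.

```java
import java.util.Locale;

public class SolrServerTypeSketch {

    /** The four SolrJ 4.x client flavours, keyed by a short (assumed) config value. */
    public enum ServerType {
        HTTP("org.apache.solr.client.solrj.impl.HttpSolrServer"),
        CLOUD("org.apache.solr.client.solrj.impl.CloudSolrServer"),
        CONCURRENT("org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer"),
        LB("org.apache.solr.client.solrj.impl.LBHttpSolrServer");

        public final String className;

        ServerType(String className) {
            this.className = className;
        }
    }

    /**
     * Maps a configured value onto a ServerType, ignoring case and surrounding
     * whitespace. Falling back to HTTP when nothing is configured is an
     * assumption of this sketch, not documented Nutch behaviour.
     */
    public static ServerType resolve(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return ServerType.HTTP;
        }
        try {
            return ServerType.valueOf(configured.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Unknown Solr server type '" + configured
                + "'; expected one of http, cloud, concurrent, lb", e);
        }
    }

    public static void main(String[] args) {
        // "solr.server.type" is a hypothetical property name used only for this sketch.
        ServerType type = resolve(System.getProperty("solr.server.type"));
        System.out.println(type + " -> " + type.className);
    }
}
```

Failing fast on an unknown value, rather than silently falling back, keeps a typo in the configuration from sending documents to the wrong kind of client.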

[~asitang] can you please test out this patch along with the parsefilter-naivebayes? I want to confirm that it behaves the same as, or similarly to, what you expect from your trained models.

@everyone else: I've tested this patch by indexing into Elasticsearch 1.5.0 and Apache Solr 4.10.2, and all is good. It would be very much appreciated if people could test it before this patch diverges too much from trunk.



> Upgrade to Solr 4.10.2
> ----------------------
>
>                 Key: NUTCH-1486
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1486
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.6, 2.1
>         Environment: Solr 4.0, Nutch trunk 1.6-SNAPSHOT & probably 2.2-SNAPSHOT
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>              Labels: memex
>             Fix For: 1.11
>
>         Attachments: NUTCH-1486-1.8.patch, NUTCH-1486-1.9-trunk.patch, NUTCH-1486-2.x-v3.patch, NUTCH-1486-2.x.patch, NUTCH-1486-2.x.v2.patch, NUTCH-1486-nutchgora.patch, NUTCH-1486-trunk.patch, NUTCH-1486-trunk.v2.patch, NUTCH-1486-trunk.v3.patch, NUTCH-1486-trunkv4.patch
>
>
> When attempting to configure a four-core Solr 4.0 multicore instance with the Nutch schema-solr4.xml file, I get the following exceptions.
> This has been discussed previously. As I see it, we have two options:
> 1. Keep maintaining both schema options
> 2. Ditch the more complex schema-solr4.xml in favour of vanilla schema.xml
> Thoughts?
> {code}
> SEVERE: Unable to create core: collection4
> org.apache.solr.common.SolrException: Unable to use updateLog: _version_field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)
> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:721)
> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:566)
> 	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:850)
> 	at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
> 	at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
> 	at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
> 	at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
> 	at org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:114)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:754)
> 	at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:258)
> 	at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1221)
> 	at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:699)
> 	at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:454)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:36)
> 	at org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:183)
> 	at org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:491)
> 	at org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:138)
> 	at org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:142)
> 	at org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:53)
> 	at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:604)
> 	at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:535)
> 	at org.eclipse.jetty.util.Scanner.scan(Scanner.java:398)
> 	at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:332)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:118)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:552)
> 	at org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:227)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.util.component.AggregateLifeCycle.doStart(AggregateLifeCycle.java:63)
> 	at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:53)
> 	at org.eclipse.jetty.server.handler.HandlerWrapper.doStart(HandlerWrapper.java:91)
> 	at org.eclipse.jetty.server.Server.doStart(Server.java:263)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1215)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1138)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.eclipse.jetty.start.Main.invokeMain(Main.java:457)
> 	at org.eclipse.jetty.start.Main.start(Main.java:602)
> 	at org.eclipse.jetty.start.Main.main(Main.java:82)
> Caused by: org.apache.solr.common.SolrException: Unable to use updateLog: _version_field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)
> 	at org.apache.solr.update.UpdateLog.init(UpdateLog.java:236)
> 	at org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:94)
> 	at org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:123)
> 	at org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:97)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> 	at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:476)
> 	at org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:544)
> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:705)
> 	... 45 more
> Caused by: org.apache.solr.common.SolrException: _version_field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)
> 	at org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
> 	at org.apache.solr.update.VersionInfo.<init>(VersionInfo.java:83)
> 	at org.apache.solr.update.UpdateLog.init(UpdateLog.java:233)
> 	... 55 more
> 01-Nov-2012 16:26:15 org.apache.solr.common.SolrException log
> SEVERE: null:org.apache.solr.common.SolrException: Unable to use updateLog: _version_field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)
> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:721)
> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:566)
> 	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:850)
> 	at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
> 	at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
> 	at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
> 	at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
> 	at org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:114)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:754)
> 	at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:258)
> 	at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1221)
> 	at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:699)
> 	at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:454)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:36)
> 	at org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:183)
> 	at org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:491)
> 	at org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:138)
> 	at org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:142)
> 	at org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:53)
> 	at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:604)
> 	at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:535)
> 	at org.eclipse.jetty.util.Scanner.scan(Scanner.java:398)
> 	at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:332)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:118)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:552)
> 	at org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:227)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.util.component.AggregateLifeCycle.doStart(AggregateLifeCycle.java:63)
> 	at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:53)
> 	at org.eclipse.jetty.server.handler.HandlerWrapper.doStart(HandlerWrapper.java:91)
> 	at org.eclipse.jetty.server.Server.doStart(Server.java:263)
> 	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
> 	at org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1215)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1138)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.eclipse.jetty.start.Main.invokeMain(Main.java:457)
> 	at org.eclipse.jetty.start.Main.start(Main.java:602)
> 	at org.eclipse.jetty.start.Main.main(Main.java:82)
> Caused by: org.apache.solr.common.SolrException: Unable to use updateLog: _version_field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)
> 	at org.apache.solr.update.UpdateLog.init(UpdateLog.java:236)
> 	at org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:94)
> 	at org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:123)
> 	at org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:97)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> 	at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:476)
> 	at org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:544)
> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:705)
> 	... 45 more
> Caused by: org.apache.solr.common.SolrException: _version_field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)
> 	at org.apache.solr.update.VersionInfo.getAndCheckVersionField(VersionInfo.java:57)
> 	at org.apache.solr.update.VersionInfo.<init>(VersionInfo.java:83)
> 	at org.apache.solr.update.UpdateLog.init(UpdateLog.java:233)
> 	... 55 more
> 01-Nov-2012 16:26:15 org.apache.solr.servlet.SolrDispatchFilter init
> INFO: user.dir=/home/lewis/ASF/solr/example
> 01-Nov-2012 16:26:15 org.apache.solr.servlet.SolrDispatchFilter init
> INFO: SolrDispatchFilter.init() done
> 2012-11-01 16:26:15.228:INFO:oejs.AbstractConnector:Started SocketConnector@0.0.0.0:8983
> {code}
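
The exception above says the update log needs a _version_ field with indexed="true", stored="true" and multiValued="false". As a rough illustration, the following sketch checks a schema document for such a field using only the JDK XML parser. The class name, method name and sample schema string are hypothetical, and the check is a simplification: it only looks at attributes written explicitly on the field element, whereas Solr can also inherit them from the field type.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class VersionFieldCheck {

    /**
     * Returns true if the schema declares a field named _version_ that is
     * explicitly indexed and stored, and is not marked multiValued.
     */
    public static boolean hasUsableVersionField(String schemaXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Schema is not well-formed XML", e);
        }
        NodeList fields = doc.getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            if (!"_version_".equals(field.getAttribute("name"))) {
                continue;
            }
            // getAttribute returns "" for a missing attribute, so an absent
            // multiValued attribute is treated as "not multi-valued".
            return "true".equals(field.getAttribute("indexed"))
                && "true".equals(field.getAttribute("stored"))
                && !"true".equals(field.getAttribute("multiValued"));
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical, heavily trimmed schema used only to exercise the check.
        String schema = "<schema name=\"nutch\" version=\"1.5\"><fields>"
            + "<field name=\"id\" type=\"string\" indexed=\"true\" stored=\"true\"/>"
            + "<field name=\"_version_\" type=\"long\" indexed=\"true\" stored=\"true\"/>"
            + "</fields></schema>";
        System.out.println(hasUsableVersionField(schema));
    }
}
```

Running a check like this against a schema before creating a core would surface the missing field earlier than the startup failure shown in the log.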


