Posted to dev@lucene.apache.org by "Gregory Chanan (JIRA)" <ji...@apache.org> on 2015/07/15 01:39:06 UTC

[jira] [Commented] (SOLR-7734) MapReduce Indexer can error when using collection

    [ https://issues.apache.org/jira/browse/SOLR-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627281#comment-14627281 ] 

Gregory Chanan commented on SOLR-7734:
--------------------------------------

{code}+import com.google.common.base.Charsets;{code}
Is this import necessary?

{code}+ "may be downloaded from this ZooKeeper ensemble."));{code}
It's "may" because you might have specified --use-zk-solrconfig.xml?  And you want to leave it vague because the help on --use-zk-solrconfig.xml is suppressed?  This seems more confusing to me than just specifying everything in the help.

{code}
+        if (!options.useZkSolrConfig) {
+          // replace downloaded solrconfig.xml with embedded one
+          InputStream source = MapReduceIndexerTool.class.getResourceAsStream("/solrconfig.indexer.xml");
+          FileOutputStream destination = new FileOutputStream(getSolrConfig(tmpSolrHomeDir));
+          ByteStreams.copy(source, destination);
+	  destination.close();
+	  source.close();
+        }
{code}
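The indentation looks off here (the two close() lines are indented differently from the rest).  It would also be better to close both streams in a finally, so they are released even if the copy throws.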
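
Something along these lines is what I have in mind; it is only a sketch, not a drop-in replacement for the patch. It uses try-with-resources (equivalent to closing in a finally) and a plain JDK copy loop in place of ByteStreams.copy so that it compiles on its own. The class name CopyConfigSketch and the method copyAndClose are made up for illustration and do not exist in the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not part of the patch.
public class CopyConfigSketch {

  /**
   * Copies all bytes from source to destination and returns the byte count.
   * Both streams are closed when the method exits, whether the copy
   * succeeds or throws.
   */
  static long copyAndClose(InputStream source, OutputStream destination) throws IOException {
    // try-with-resources closes "out" and then "in", like a finally block would.
    try (InputStream in = source; OutputStream out = destination) {
      if (in == null) {
        // getResourceAsStream returns null when the resource is missing.
        throw new FileNotFoundException("source resource not found");
      }
      byte[] buffer = new byte[8192];
      long total = 0;
      int n;
      while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
        total += n;
      }
      return total;
    }
  }

  public static void main(String[] args) throws IOException {
    // In-memory streams stand in for the embedded resource and the temp file.
    byte[] config = "<config/>".getBytes(StandardCharsets.UTF_8);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    long copied = copyAndClose(new ByteArrayInputStream(config), out);
    System.out.println(copied + " bytes copied");
  }
}
```

In the patch, the two arguments would be the getResourceAsStream result and the FileOutputStream for the temporary solrconfig.xml. The null check also covers the case where /solrconfig.indexer.xml is missing from the jar.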

{code}
+      <solr-jarify-filesets>
+        <fileset dir="src/resources" />
+      </solr-jarify-filesets>
{code}
When I try to run "ant jar" on the map-reduce contrib I get "solr/contrib/map-reduce/src/resources does not exist" -- did you mean for solrconfig.indexer.xml to be there?

{code}
+  <luceneMatchVersion>4.10.3</luceneMatchVersion>
{code}
Why the old version?  Should this be 6.0.0 for trunk, 5.something for branch_5x?  (I assume you want it in both; tell me if that's incorrect.)

{code}
To enable dynamic schema REST APIs, use the following for <schemaFactory>:
+
+       <schemaFactory class="ManagedIndexSchemaFactory">
+         <bool name="mutable">true</bool>
+         <str name="managedSchemaResourceName">managed-schema</str>
+       </schemaFactory>
{code}
Does this work with managed schemas?  What if the resource name isn't the default?

{code}
+  <!-- JMX
+
+       This example enables JMX if and only if an existing MBeanServer
+       is found, use this if you want to configure JMX through JVM
+       parameters. Remove this to disable exposing Solr configuration
+       and statistics to JMX.
+
+       For more details see http://wiki.apache.org/solr/SolrJmx
+    -->
+  <jmx />
{code}
Do we want JMX?  Is it even possible to use in an MR job?

{code}+  <requestDispatcher handleSelect="false" >
+    <!-- Request Parsing{code}
Do we need this whole section?

About testing: I assume the existing tests now use the new (non-overwrite) behavior.  What about adding a test for the new option (--use-zk-solrconfig.xml)?  Maybe something simple, like having your own update chain that adds a field/value that you expect to see.  And possibly the converse, where you add an update.chain and check that the new behavior is actually working, i.e. that it doesn't use the solrconfig in zk.

> MapReduce Indexer can error when using collection
> -------------------------------------------------
>
>                 Key: SOLR-7734
>                 URL: https://issues.apache.org/jira/browse/SOLR-7734
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - MapReduce
>    Affects Versions: 5.2.1
>            Reporter: Mike Drob
>            Assignee: Gregory Chanan
>             Fix For: 5.3, Trunk
>
>         Attachments: SOLR-7734.patch, SOLR-7734.patch, SOLR-7734.patch
>
>
> When running the MapReduceIndexerTool, it will usually pull a {{solrconfig.xml}} from ZK for the collection that it is running against. This can be problematic for several reasons:
> * Performance: The configuration in ZK will likely have several query handlers, and lots of other components that don't make sense in an indexing-only use of EmbeddedSolrServer (ESS).
> * Classpath Resources: If the Solr services are using some kind of additional service (such as Sentry for auth) then the indexer will not have access to the necessary configurations without the user jumping through several hoops.
> * Distinct Configuration Needs: Enabling Soft Commits on the ESS doesn't make sense. There's other configurations that 
> * Update Chain Behaviours: I'm under the impression that UpdateChains may behave differently in ESS than a SolrCloud cluster. Is it safe to depend on consistent behaviour here?


