Posted to dev@lucene.apache.org by "Mark Miller (JIRA)" <ji...@apache.org> on 2014/02/27 17:05:19 UTC
[jira] [Resolved] (SOLR-5786) MapReduceIndexerTool --help output is missing large parts of the help text
[ https://issues.apache.org/jira/browse/SOLR-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Miller resolved SOLR-5786.
-------------------------------
Resolution: Duplicate
> MapReduceIndexerTool --help output is missing large parts of the help text
> --------------------------------------------------------------------------
>
> Key: SOLR-5786
> URL: https://issues.apache.org/jira/browse/SOLR-5786
> Project: Solr
> Issue Type: Bug
> Components: contrib - MapReduce
> Affects Versions: 4.7
> Reporter: wolfgang hoschek
> Assignee: Mark Miller
> Fix For: 4.8
>
>
> As already mentioned repeatedly and at length, this is a regression introduced by the fix in https://issues.apache.org/jira/browse/SOLR-5605
> Here is the diff of --help output before SOLR-5605 vs after SOLR-5605:
> {code}
> 130,235c130
> < lucene segments left in this index. Merging
> < segments involves reading and rewriting all data
> < in all these segment files, potentially multiple
> < times, which is very I/O intensive and time
> < consuming. However, an index with fewer segments
> < can later be merged faster, and it can later be
> < queried faster once deployed to a live Solr
> < serving shard. Set maxSegments to 1 to optimize
> < the index for low query latency. In a nutshell, a
> < small maxSegments value trades indexing latency
> < for subsequently improved query latency. This can
> < be a reasonable trade-off for batch indexing
> < systems. (default: 1)
> < --fair-scheduler-pool STRING
> < Optional tuning knob that indicates the name of
> < the fair scheduler pool to submit jobs to. The
> < Fair Scheduler is a pluggable MapReduce scheduler
> < that provides a way to share large clusters. Fair
> < scheduling is a method of assigning resources to
> < jobs such that all jobs get, on average, an equal
> < share of resources over time. When there is a
> < single job running, that job uses the entire
> < cluster. When other jobs are submitted, tasks
> < slots that free up are assigned to the new jobs,
> < so that each job gets roughly the same amount of
> < CPU time. Unlike the default Hadoop scheduler,
> < which forms a queue of jobs, this lets short jobs
> < finish in reasonable time while not starving long
> < jobs. It is also an easy way to share a cluster
> < between multiple of users. Fair sharing can also
> < work with job priorities - the priorities are
> < used as weights to determine the fraction of
> < total compute time that each job gets.
> < --dry-run Run in local mode and print documents to stdout
> < instead of loading them into Solr. This executes
> < the morphline in the client process (without
> < submitting a job to MR) for quicker turnaround
> < during early trial & debug sessions. (default:
> < false)
> < --log4j FILE Relative or absolute path to a log4j.properties
> < config file on the local file system. This file
> < will be uploaded to each MR task. Example:
> < /path/to/log4j.properties
> < --verbose, -v Turn on verbose output. (default: false)
> < --show-non-solr-cloud Also show options for Non-SolrCloud mode as part
> < of --help. (default: false)
> <
> < Required arguments:
> < --output-dir HDFS_URI HDFS directory to write Solr indexes to. Inside
> < there one output directory per shard will be
> < generated. Example: hdfs://c2202.mycompany.
> < com/user/$USER/test
> < --morphline-file FILE Relative or absolute path to a local config file
> < that contains one or more morphlines. The file
> < must be UTF-8 encoded. Example:
> < /path/to/morphline.conf
> <
> < Cluster arguments:
> < Arguments that provide information about your Solr cluster.
> <
> < --zk-host STRING The address of a ZooKeeper ensemble being used by
> < a SolrCloud cluster. This ZooKeeper ensemble will
> < be examined to determine the number of output
> < shards to create as well as the Solr URLs to
> < merge the output shards into when using the --go-
> < live option. Requires that you also pass the --
> < collection to merge the shards into.
> <
> < The --zk-host option implements the same
> < partitioning semantics as the standard SolrCloud
> < Near-Real-Time (NRT) API. This enables to mix
> < batch updates from MapReduce ingestion with
> < updates from standard Solr NRT ingestion on the
> < same SolrCloud cluster, using identical unique
> < document keys.
> <
> < Format is: a list of comma separated host:port
> < pairs, each corresponding to a zk server.
> < Example: '127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:
> < 2183' If the optional chroot suffix is used the
> < example would look like: '127.0.0.1:2181/solr,
> < 127.0.0.1:2182/solr,127.0.0.1:2183/solr' where
> < the client would be rooted at '/solr' and all
> < paths would be relative to this root - i.e.
> < getting/setting/etc... '/foo/bar' would result in
> < operations being run on '/solr/foo/bar' (from the
> < server perspective).
> <
> <
> < Go live arguments:
> < Arguments for merging the shards that are built into a live Solr
> < cluster. Also see the Cluster arguments.
> <
> < --go-live Allows you to optionally merge the final index
> < shards into a live Solr cluster after they are
> < built. You can pass the ZooKeeper address with --
> < zk-host and the relevant cluster information will
> < be auto detected. (default: false)
> < --collection STRING The SolrCloud collection to merge shards into
> < when using --go-live and --zk-host. Example:
> < collection1
> < --go-live-threads INTEGER
> < Tuning knob that indicates the maximum number of
> < live merges to run in parallel at one time.
> < (default: 1000)
> <
> ---
> >
> {code}
> The root cause is a change in buffer flushing behavior in argparse4j >= 0.4.2: help text written to an unflushed PrintWriter never reaches stdout.
> The fix is to apply CDH-16434 to MapReduceIndexerTool.java as follows:
> {code}
> - parser.printHelp(new PrintWriter(System.out));
> + parser.printHelp();
> {code}
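> The failure mode can be illustrated in isolation (a generic Java sketch, not the actual argparse4j code): `new PrintWriter(OutputStream)` wraps the stream in an internal buffer, so anything written but never flushed simply never reaches the underlying stream. A `ByteArrayOutputStream` stands in for stdout here to make the effect observable:
> {code}
> import java.io.ByteArrayOutputStream;
> import java.io.PrintWriter;
>
> public class FlushDemo {
>     public static void main(String[] args) {
>         ByteArrayOutputStream sink = new ByteArrayOutputStream();
>         PrintWriter writer = new PrintWriter(sink); // autoflush is off by default
>
>         writer.println("usage: MapReduceIndexerTool ...");
>
>         // Nothing has reached the underlying stream yet; the text is
>         // still sitting in the PrintWriter's internal buffer.
>         System.out.println("before flush: " + sink.size() + " bytes");
>
>         writer.flush();
>
>         // Only after flushing does the buffered text appear in the sink.
>         System.out.println("after flush: " + sink.size() + " bytes");
>     }
> }
> {code}
> Calling parser.printHelp() with no argument lets argparse4j manage (and flush) its own writer, which is why the one-line fix restores the missing output.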
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org