Posted to solr-user@lucene.apache.org by Douglas Rapp <do...@gmail.com> on 2016/01/12 01:18:52 UTC

Problems using MapReduceIndexerTool with multiple reducers

Hello,

I am using Solr 4.10.4 in SolrCloud mode, but so far with only a single
instance (so just a single shard - not very cloud-like yet).

I have been experimenting with the MapReduceIndexerTool (MRIT) to handle
batch indexing of CSV files stored in HDFS. I got it working on a weaker
single-node Hadoop test system, so I have been trying to do some
performance testing on a 4-node Hadoop cluster (1 NameNode, 3 DataNodes)
with better hardware. The issue I have come across is that the job only
finishes successfully if I specify a single reducer (using the
"--reducers 1" option when invoking the tool).

If the tool is invoked without specifying the number of mappers or
reducers, it appears to use the maximum number available; in my case,
it tries to use 16 mappers and 6 reducers. I have tried many different
combinations, and what I have found is that I can set the number of
mappers to just about anything, but the reducers must stay at "1" or
else the job fails. That also explains why I never saw this pop up on
the first system - looking closer, it defaults to only 1 reducer there,
and if I try to increase it I get the same failure. When the job fails,
I get the following stack trace:

6602 [main] WARN  org.apache.hadoop.mapred.YarnChild  - Exception running child : org.kitesdk.morphline.api.MorphlineRuntimeException: java.lang.IllegalStateException: No matching slice found! The slice seems unavailable. docRouterClass: org.apache.solr.common.cloud.ImplicitDocRouter
        at org.kitesdk.morphline.base.FaultTolerance.handleException(FaultTolerance.java:73)
        at org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:213)
        at org.apache.solr.hadoop.morphline.MorphlineMapper.map(MorphlineMapper.java:86)
        at org.apache.solr.hadoop.morphline.MorphlineMapper.map(MorphlineMapper.java:54)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.IllegalStateException: No matching slice found! The slice seems unavailable. docRouterClass: org.apache.solr.common.cloud.ImplicitDocRouter
        at org.apache.solr.hadoop.SolrCloudPartitioner.getPartition(SolrCloudPartitioner.java:120)
        at org.apache.solr.hadoop.SolrCloudPartitioner.getPartition(SolrCloudPartitioner.java:49)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:712)
        at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
        at org.apache.solr.hadoop.morphline.MorphlineMapper$MyDocumentLoader.load(MorphlineMapper.java:138)
        at org.apache.solr.morphlines.solr.LoadSolrBuilder$LoadSolr.doProcess(LoadSolrBuilder.java:129)
        at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
        at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
        at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
        at org.apache.solr.morphlines.solr.SanitizeUnknownSolrFieldsBuilder$SanitizeUnknownSolrFields.doProcess(SanitizeUnknownSolrFieldsBuilder.java:94)
        at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
        at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
        at org.kitesdk.morphline.stdio.ReadCSVBuilder$ReadCSV.doProcess(ReadCSVBuilder.java:124)
        at org.kitesdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:93)
        at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
        at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
        at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
        at org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:201)
        ... 10 more

When I search online for "No matching slice found", the only results I
get back are the source code itself. I can't find anything to lead me
in the right direction.

Looking at the MapReduceIndexerTool documentation more closely, it says
that when using more than one reducer per output shard (so in my case,
anything >1), it will use the "mtree" merge algorithm to merge the
results held in several mini-shards. I'm guessing this might have
something to do with it, but I can't find any other information on how
this might be further tweaked or debugged.
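
For reference, this is roughly how I invoke the tool when I let it use
multiple reducers (paths and hosts are from my setup, but the --fanout
option is something I only know from the tool's help output, so treat
that part as an untested guess on my end):

# Sketch: explicit MRIT tuning knobs. With --reducers > 1 the tool runs
# the mtree merge; per the help text, --fanout bounds how many
# mini-shards are merged per mtree pass.
hadoop jar ../../lib/*.jar org.apache.solr.hadoop.MapReduceIndexerTool \
  hdfs://mhats-hadoop-master:54310/data \
  --morphline-file my-morphline-file.conf \
  --output-dir hdfs://mhats-hadoop-master:54310/solr/staging \
  --zk-host my-zk-host --collection my-collection --go-live \
  --mappers 16 --reducers 6 --fanout 8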

I can provide any additional information (environment settings, config
files, etc) on request. Any help would be appreciated.

Thanks,
Doug

Re: Problems using MapReduceIndexerTool with multiple reducers

Posted by Douglas Rapp <do...@gmail.com>.
Great to know. Thank you very much for your assistance!


Re: Problems using MapReduceIndexerTool with multiple reducers

Posted by Erick Erickson <er...@gmail.com>.
bq: Do you know, is using the API the
recommended way of handling collections? As opposed to putting collection
folders containing "core.properties" file and "conf" folders (containing
"schema.xml" and "solrconfig.xml", etc) all in the Solr home location?

Absolutely and certainly DO use the collections API to create
collections. DO NOT just try to create individual cores at various
places on your disk and hope that Solr does the right thing. Solr tries,
but as you've already discovered there are edge cases.

Ditto for the Admin API. You _can_ use it, but unless you get everything
exactly correct you'll have problems.

Unless you're at the end of all possibilities, use the Collections API
every time.
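
For example, creating a collection through the Collections API is a
single HTTP call - a sketch (name, config name, shard/replica counts,
and host are placeholders; with numShards given, the compositeId
router is the default):

# Collections API CREATE; adjust names/counts/host to your setup.
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=my-collection&numShards=1&replicationFactor=1&collection.configName=my-config'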

Best,
Erick


Re: Problems using MapReduceIndexerTool with multiple reducers

Posted by Douglas Rapp <do...@gmail.com>.
As an update, I went ahead and used the Collections API: I deleted the
existing collection and then recreated it (specifying the compositeId
router), and when I tried out MRIT I didn't have any problems
whatsoever with the number of reducers (and was able to cut the
indexing time by over half!!). I'm guessing that the issue was not with
the router itself, but rather with how the collection was getting
created. Do you know, is using the API the recommended way of handling
collections? As opposed to putting collection folders containing
"core.properties" file and "conf" folders (containing "schema.xml" and
"solrconfig.xml", etc) all in the Solr home location?

Thanks,
Doug


Re: Problems using MapReduceIndexerTool with multiple reducers

Posted by Douglas Rapp <do...@gmail.com>.
I'm actually not specifying any router, and assumed the "implicit" one
was the default. The only documented way I can find to set the document
router is when creating a new collection via the Collections API, which
I am not using. What I do is define several options in "solrconfig.xml",
then sync the conf directory with ZooKeeper, specifying the collection
name. Then, when I start up Solr, it grabs the config from ZooKeeper,
creates the HDFS directories (if not already present), and sets up the
collection automatically. At that point, I can use MRIT to generate the
indexes. Is that improper? Is there a way to specify the document router
in solrconfig.xml?
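
For reference, the "sync the conf directory with ZooKeeper" step above
is roughly the following, using the zkcli.sh script that ships with
Solr (paths and names are placeholders):

# Upload the config set to ZooKeeper under the collection's name.
zkcli.sh -zkhost my-zk-host -cmd upconfig \
  -confdir ./conf -confname my-collection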

Your other questions:
1) Yes, the indexes are hosted directly in HDFS. As are the input data
files.
2) Yes, I am using the --go-live option

Here is the syntax I am using:

hadoop jar ../../lib/*.jar org.apache.solr.hadoop.MapReduceIndexerTool \
  hdfs://mhats-hadoop-master:54310/data \
  --morphline-file my-morphline-file.conf \
  --output-dir hdfs://mhats-hadoop-master:54310/solr/staging \
  --log4j ../log4j.properties \
  --zk-host my-zk-host --collection my-collection --go-live
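
In case it's relevant, the morphline does the readCSV ->
sanitizeUnknownSolrFields -> loadSolr chain you can see in the stack
trace from my original message; a minimal sketch of that shape (column
names and locator values are placeholders based on the Kite examples,
not my actual file):

# Solr locator; as I understand it, MRIT overrides this with its own
# --zk-host/--collection settings at run time.
SOLR_LOCATOR : {
  collection : my-collection
  zkHost : "my-zk-host"
}

morphlines : [
  {
    id : csvToSolr
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      # Parse each CSV line into named fields.
      { readCSV { separator : ",", columns : [id, name, value], charset : UTF-8 } }
      # Drop any field that is not in the Solr schema.
      { sanitizeUnknownSolrFields { solrLocator : ${SOLR_LOCATOR} } }
      # Emit the record as a Solr document.
      { loadSolr { solrLocator : ${SOLR_LOCATOR} } }
    ]
  }
]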

Thanks,
Doug


Re: Problems using MapReduceIndexerTool with multiple reducers

Posted by Erick Erickson <er...@gmail.com>.
Hmm, it looks like you created your collection with the "implicit"
router. Does the same thing happen when you use the default
compositeId router?

Note: this should be OK with either; this is just to gather more info.
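
If it helps, one way to see which router a collection ended up with is
to look at clusterstate.json in ZooKeeper - a sketch from memory, so
double-check the syntax:

# Dump the cluster state and look for "router":{"name":"..."} under
# the collection in question.
zkcli.sh -zkhost my-zk-host -cmd get /clusterstate.json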

Other questions:
1> Are you running MRIT over Solr indexes that are actually hosted on HDFS?
2> Are you using the --go-live option?

Actually, can you show us the entire command you use to invoke MRIT?

Best,
Erick
