Posted to solr-user@lucene.apache.org by gabriel shen <xs...@gmail.com> on 2012/03/08 11:09:10 UTC

indexing cpu utilization

Hi,

I noticed that sequential indexing on 1 Solr core uses only 40% of our
8-virtual-core CPU power. Why isn't it using 100%? Is there a
way to increase the CPU utilization rate?

best regards,
shen

Re: indexing cpu utilization

Posted by Tanguy Moal <ta...@gmail.com>.
How are you sending documents to Solr?

If you push Solr input documents via HTTP (which is what SolrJ does), 
you can increase CPU consumption (and therefore reduce indexing time) 
by sending your update requests asynchronously, using multiple updating 
threads against your single Solr core.

Somebody more familiar than me with the update chain could probably tell 
you more, but I think each update request is handled in a single 
thread on the server side.
If that's correct, then you can increase CPU utilization on your 
indexing host by adding more updating threads to the client pushing 
documents to your Solr core.
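A minimal sketch of that pattern using plain java.util.concurrent; the class name and the sendBatch placeholder are illustrative only — in a real client, sendBatch would call something like HttpSolrServer.add(...) with a batch of SolrInputDocuments:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelIndexer {
    static final AtomicInteger indexed = new AtomicInteger();

    // Placeholder: in a real client this would call
    // HttpSolrServer.add(batchOfSolrInputDocuments) against the single core.
    static void sendBatch(List<Integer> batch) {
        indexed.addAndGet(batch.size());
    }

    // Partition the document stream into batches and submit each batch
    // to a fixed pool, so several update requests are in flight at once.
    public static int indexAll(int totalDocs, int batchSize, int nThreads)
            throws InterruptedException {
        indexed.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        List<Integer> batch = new ArrayList<>();
        for (int id = 0; id < totalDocs; id++) {
            batch.add(id);
            if (batch.size() == batchSize) {
                final List<Integer> b = batch;
                pool.submit(() -> sendBatch(b));
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) {
            final List<Integer> b = batch;
            pool.submit(() -> sendBatch(b));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return indexed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(indexAll(10_000, 500, 8));
    }
}
```

With several batches in flight, the server can analyze documents on several cores at once instead of serializing everything behind one request.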

Also make sure you don't ask Solr to commit your pending changes to the 
index too frequently (e.g. on each add), but only when you want changes to 
become visible on the searching side.

I personally like to let Solr do autoCommits, using a combination of 
max-added-documents and elapsed-time conditions for the auto-commit policy.
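Such a policy goes in the <updateHandler> section of solrconfig.xml; the thresholds below are illustrative only and should be tuned to your load:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>50000</maxDocs>    <!-- commit after this many added docs... -->
    <maxTime>300000</maxTime>   <!-- ...or after 5 minutes, whichever comes first -->
  </autoCommit>
</updateHandler>
```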

Considering indexing bottlenecks more generally, my experience in that 
field is that indexing speed is usually bound, in order of frequency, by:
- source enumeration speed (especially if Solr input documents are built 
from complex joins on a remote DB)
- network I/O, if indexing remotely and the network link isn't sized for 
the amount of data running through it
- disk I/O, if you commit very often and rely on commodity SATA HDDs, or 
if another process is stressing the poor little device (keep the ~150 
IOPS limit in mind for SATA devices)
- CPU, once you've gotten rid of the previous bottlenecks

- Memory doesn't play the same role in indexing speed as the other 
factors: from my point of view it only becomes a limit if you 
perform complex analysis on many, many fields. If that becomes a 
problem, it is easy to spot with JMX and JConsole, because your 
JVM will be performing many GCs and the process's resident RAM 
usage will be close to whatever was set with -Xmx.

I don't know if I was really clear; all I can say is that increasing the 
number of clients pushing updates to Solr in parallel was the easiest way 
for me to reduce the indexing time for large update batches.

Hope this helps,

--
Tanguy

On 08/03/2012 11:48, gabriel shen wrote:
> Our indexing process adds a bundle of Solr documents (for example
> 5000) to Solr each time, and we observed that before committing (which might
> be I/O-bound) it constantly uses less than half the CPU capacity; it seems
> strange to us that it doesn't use full CPU power. As for RAM, I don't
> know how much it affects CPU utilization; we have assigned 14 GB to the
> Solr Tomcat server on a 32 GB Linux machine.
>
> best regards,
> shen
> On Thu, Mar 8, 2012 at 11:27 AM, Gora Mohanty<go...@mimirtech.com>  wrote:
>
>> On 8 March 2012 15:39, gabriel shen<xs...@gmail.com>  wrote:
>>> Hi,
>>>
>>> I noticed that sequential indexing on 1 Solr core uses only 40% of our
>>> 8-virtual-core CPU power. Why isn't it using 100%? Is there a
>>> way to increase the CPU utilization rate?
>> [...]
>>
>> This is an open-ended question; the behaviour could be due to a
>> variety of things, and also depends on how you are indexing.
>> Your indexing process might be I/O-bound (quite possible),
>> or memory-bound, rather than CPU-bound.
>>
>> Regards,
>> Gora
>>


Re: indexing cpu utilization

Posted by Gora Mohanty <go...@mimirtech.com>.
On 8 March 2012 16:18, gabriel shen <xs...@gmail.com> wrote:
> Our indexing process adds a bundle of Solr documents (for example
> 5000) to Solr each time, and we observed that before committing (which might
> be I/O-bound) it constantly uses less than half the CPU capacity; it seems
> strange to us that it doesn't use full CPU power. As for RAM, I don't
> know how much it affects CPU utilization; we have assigned 14 GB to the
> Solr Tomcat server on a 32 GB Linux machine.
[...]

Are you hitting memory limits?

As Tanguy has already pointed out in nice detail, it probably
also does matter how you push documents to Solr, and how
often you commit.

In an apples-to-oranges comparison, we used to run a
large indexing task, but with only a single commit at the
end, while it sounds as if you are using smaller batches,
with more frequent commits. In our case, we could max
out CPU usage (well, we backed off at ~85% utilisation
on each core). Though we were fetching data over the
network, it was a relatively high-bandwidth internal connection,
and we were using DIH with multiple Solr cores.

Regards,
Gora

Re: indexing cpu utilization

Posted by gabriel shen <xs...@gmail.com>.
Our indexing process adds a bundle of Solr documents (for example
5000) to Solr each time, and we observed that before committing (which might
be I/O-bound) it constantly uses less than half the CPU capacity; it seems
strange to us that it doesn't use full CPU power. As for RAM, I don't
know how much it affects CPU utilization; we have assigned 14 GB to the
Solr Tomcat server on a 32 GB Linux machine.

best regards,
shen
On Thu, Mar 8, 2012 at 11:27 AM, Gora Mohanty <go...@mimirtech.com> wrote:

> On 8 March 2012 15:39, gabriel shen <xs...@gmail.com> wrote:
> > Hi,
> >
> > I noticed that sequential indexing on 1 Solr core uses only 40% of our
> > 8-virtual-core CPU power. Why isn't it using 100%? Is there a
> > way to increase the CPU utilization rate?
> [...]
>
> This is an open-ended question; the behaviour could be due to a
> variety of things, and also depends on how you are indexing.
> Your indexing process might be I/O-bound (quite possible),
> or memory-bound, rather than CPU-bound.
>
> Regards,
> Gora
>

Re: indexing cpu utilization

Posted by Gora Mohanty <go...@mimirtech.com>.
On 8 March 2012 15:39, gabriel shen <xs...@gmail.com> wrote:
> Hi,
>
> I noticed that sequential indexing on 1 Solr core uses only 40% of our
> 8-virtual-core CPU power. Why isn't it using 100%? Is there a
> way to increase the CPU utilization rate?
[...]

This is an open-ended question; the behaviour could be due to a
variety of things, and also depends on how you are indexing.
Your indexing process might be I/O-bound (quite possible),
or memory-bound, rather than CPU-bound.

Regards,
Gora

Re: indexing cpu utilization

Posted by Uwe Reh <re...@hebis.uni-frankfurt.de>.
Hi Mark,

SOLR-3929 rocks!
A nightly build of 4.1 with maxIndexingThreads configured to 24 takes 
80% to 100% of the CPU resources :-)
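For reference, that setting lives in the <indexConfig> block of solrconfig.xml (Solr 4.1+); the value below matches the configuration described above:

```xml
<!-- solrconfig.xml (Solr 4.1+) -->
<indexConfig>
  <maxIndexingThreads>24</maxIndexingThreads>  <!-- default is 8 -->
</indexConfig>
```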

Thank you, Otis and Gora


"mpstat 10"
> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>   0    0   0   13   607  241  234   78  100    2    1   258   87   2   0  11
>   1    0   0   24   240   23  293   94  109    3    1   286   86   1   0  13
>   2    0   0   12   367  181  268   83  102    4    1   338   89   1   0  10
>   3    0   0   18   188   20  226   67   86    5    1   243   87   1   0  13
>   4    0   0    5   205   22  255   74  100    4    1   310   87   1   0  12
>   5    0   0    5   192   22  228   68   88    4    0   260   89   1   0  10
>   6    0   0   15   223   23  278   86  104    5    1   319   87   1   0  12
>   7    0   0   18   215   23  267   75  104    5    1   321   85   1   0  14
>   8    0   0    4   253   21  272   64  112    4    0   284   77   1   0  22
>   9    0   0    4   243   20  281   61  108    3    0   300   79   1   0  20
>  10    0   0    2   234   22  272   56  111    4    0   376   78   1   0  21
>  11    0   0    2   205   18  237   57   96    4    0   297   82   1   0  17
>  12    0   0    3   251   24  273   59  113    4    0   323   72   1   0  27
>  13    0   0    4   203   19  236   54   91    2    0   294   82   1   0  17
>  14    0   0    4   245   21  288   54  111    3    0   309   77   1   0  22
>  15    0   0    4   233   21  258   58  106    3    0   280   80   1   0  19
>  16    0   0    5   286   19  346   60  133    4    0   425   73   1   0  26
>  17    0   0    6   340   23  414   67  151    4    0   500   67   1   0  31
>  18    0   0    7   343   23  435   67  150    5    0   482   66   2   0  32
>  19    0   0    8   294   19  348   53  128    5    0   444   70   1   0  29
>  20    0   0    6   309   21  385   64  139    4    0   514   68   1   0  31
>  21    0   0    7   279   20  378   58  133    3    0   471   69   1   0  30
>  22    0   0    6   249   18  329   50  120    4    0   469   72   1   0  27
>  23    0   0    6   258   20  338   54  127    3    0   388   70   1   0  28
>  24    0   0    6   400   20  608  146  187    4    0  1071   75   3   0  22
>  25    0   0    4   375   20  550  134  173    5    0   891   73   2   0  25
>  26    0   0    8   329   19  490  103  152    5    0   856   75   2   0  23
>  27    0   0    7   341   22  489  107  161    4    0   793   72   2   0  26
>  28    0   0    5   321   18  478   98  162    3    0   793   75   2   0  23
>  29    0   0    4   283   18  399   84  136    4    0   744   76   2   0  22
>  30    0   0    5   252   16  378   86  127    3    0   620   79   2   0  20
>  31    0   0    5   277   16  447   96  144    4    0   715   76   2   0  22


Re: indexing cpu utilization

Posted by Mark Miller <ma...@gmail.com>.
On Jan 3, 2013, at 5:40 AM, Uwe Reh <re...@hebis.uni-frankfurt.de> wrote:

> "use more threads" vs. "use fewer threads"
> It is a bit confusing.

My point was to make sure you are using more than one thread. With 32 cores, probably a lot more than one thread.

Otis' point was that you can also use too many threads.

Both are correct. Make sure you are using enough threads to satisfy the indexing power you have, but don't use too many - there are diminishing returns and then negative returns over a given threshold that will depend on your hardware and data.
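A hedged sketch of how one might sweep thread counts to find that threshold; the class name is made up and fakeUpdate only simulates CPU work — in practice it would be replaced by a real update request:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadSweep {
    // Simulated "send one update request": replace with a real SolrJ call.
    static void fakeUpdate() {
        long x = 0;
        for (int i = 0; i < 50_000; i++) x += i;  // burn a little CPU
        if (x < 0) throw new IllegalStateException();
    }

    // Time how long nRequests take with a pool of nThreads workers.
    static long timeRunMillis(int nThreads, int nRequests)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        long start = System.nanoTime();
        for (int i = 0; i < nRequests; i++) pool.submit(ThreadSweep::fakeUpdate);
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        // Sweep thread counts; watch for the knee where adding threads stops helping.
        for (int n : new int[] {1, 2, 4, 8, 16, 32, 64}) {
            System.out.printf("%2d threads -> %d ms%n", n, timeRunMillis(n, 2000));
        }
    }
}
```

On real hardware the timings typically drop until the core count (or another bottleneck) is reached, then flatten and eventually worsen as context switching dominates.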

Sorry about the wrong issue number.

As far as 32 cores being common, it's not common yet based on what I see. On average, I'm still seeing a lot of 4-8 core hardware. 32 or more has so far proven to be the exception when supporting Solr users. I'm sure that will continue to shift.

- Mark

Re: indexing cpu utilization

Posted by Uwe Reh <re...@hebis.uni-frankfurt.de>.
Hi,
thank you for the hints.

>> On 3 January 2013 05:55, Mark Miller <ma...@gmail.com> wrote:
>>> 32 cores eh? You probably have to raise some limits to take advantage of
>>> that.
32 cores isn't that much anymore. You can buy AMD servers from 
Supermicro with two sockets and 32 GB of RAM for less than $2500. Systems 
with four sockets (64 cores) aren't unaffordable either.
With some more money, one can consider the four-socket Oracle T4-4 
system (4 sockets x 8 cores x 8 vcores = 256).

>>> You might also want to experiment with using more merge threads; I
>>> think the default may be 3.
I will try this, but I think Otis is right: it's SOLR-3929 rather than 
SOLR-4078.
> Mark wanted to point this other issue:
> https://issues.apache.org/jira/browse/SOLR-3929 though, so try that...

On 03.01.2013 05:20, Otis Gospodnetic wrote:
> I, too, was going to point out the number of threads, but was going to
> suggest using fewer of them because the server has 32 cores and there was a
> mention of 100 threads being used from the client.  Thus, my guess was that
> the machine is busy juggling threads and context switching (how's vmstat 2
> output, Uwe?) instead of doing the real work.
"use more threads" vs. "use fewer threads"
It is a bit confusing. I ran some tests with 50 to 200 threads; within 
this range I noticed no real difference. 50 threads on the client seems 
to trigger enough threads on the server to saturate the bottleneck, and 200 
client threads doesn't seem to be destructive.

"vmstat 5" on the server, with 100 threads on the client:
>  kthr      memory            page            disk          faults      cpu
>  r b w   swap  free  re  mf pi po fr de sr cd cd s0 s4   in   sy   cs us sy id
>  0 0 0 13605380 17791928 0 7 0  0  0  0  0  0  0  0  0 3791 1638 1666 26  0 73
>  1 0 0 13641072 17826368 0 8 0  0  0  0  0  0  0  0  0 3540 1305 1527 25  0 74
>  0 0 0 13691908 17876364 0 8 0  0  0  0  0  0  0  0 48 3935 1453 1919 26  0 73
>  0 0 0 13720208 17904652 0 4 0  0  0  0  0  0  0  0  0 3964 1342 1645 25  0 74
>  0 0 0 13792440 17976868 0 9 0  0  0  0  0  0  0  0  0 3891 1551 1757 26  0 74
>  1 0 0 13867128 18051532 0 4 0  0  0  0  0  0  0  0  0 3871 1430 1584 26  0 74
>  1 0 0 13948796 18133184 0 6 0  0  0  0  0  0  0  0  0 3079 1218 1435 25  0 74

To see what's going on I prefer "mpstat 10" (100 client threads):
> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>   0    0   0    2  1389 1149   55    6   15    8    0   173   62   4   0  34
>   1    0   0    5   230    2  341    5   34   15    0    24   16   1   0  83
>   2    0   0    0   653  612   48    6    5   18    0   222   68   5   0  27
>   3    0   0    1    31    1   38    1    7    7    0    21   13   1   0  87
>   4    0   0    8    39    2   45    2    6    6    0    38   17   1   0  82
>   5    0   0    6    76    3   84    4   18   14    0    51   35   2   0  64
>   6    0   0    4    30    0   40    4    6    7    0    59   32   1   0  67
>   7    0   0    1    36    0   53    3    6   12    0    65   34   1   0  65
>   8    0   0    4   107    4   91    3   31    4    0    25   19   0   0  80
>   9    0   0    4    70    4   66    3   17   10    0    38   27   1   0  72
>  10    0   0    1    58    2   56    4   13    8    0    47   34   1   0  65
>  11    0   0    1    36    3   46    2    5   13    0    20   14   0   0  86
>  12    0   0    0    32    2   37    3    5    8    0    30   20   0   0  80
>  13    0   0    1    40    2   48    3    7    9    0    37   25   1   0  74
>  14    0   0    2    77    3   85    4   18   16    0    42   35   1   0  64
>  15    0   0    2    29    4   27    2    6    3    0    23   15   0   0  85
>  16    0   0    3   110    2  100    2   33    8    0    24   14   1   0  85
>  17    0   0    3    66    2   69    3   17    9    0    40   27   1   0  72
>  18    0   0    1    42    1   54    4    9   11    0    58   32   1   0  67
>  19    0   0    1    54    1   60    1   13   11    0    16    7   0   0  93
>  20    0   0    0    26    0   39    1    3   11    0    22    9   0   0  91
>  21    0   0    0    33    0   46    3    6   11    0    50   30   1   0  69
>  22    0   0    3    38    1   44    4    8   12    0    50   33   1   0  66
>  23    0   0    3    60    1   61    2   15    8    0    29   18   0   0  82
>  24    0   0    4   102    2   92    3   31   10    0    95   31   1   0  68
>  25    0   0    2    75    1   76    4   21   10    0    47   36   1   0  63
>  26    0   0    1    68    1   81    5   19   18    0    69   47   1   0  52
>  27    0   0    1    40    1   52    3   10   14    0    25   22   1   0  77
>  28    0   0    0    35    0   38    3    9    6    0    34   24   0   0  76
>  29    0   0    1    31    0   46    4    7   13    0    44   31   1   0  68
>  30    0   0    0    32    0   48    4    8   13    0    47   37   1   0  62
>  31    0   0    0    26    0   36    3    7   10    0    50   32   1   0  67
No minor faults, no major faults, few crosscalls, reasonable interrupts, 
only some migrations... This looks quite good to me. Do you see a pitfall?

The advice from Gora, "divide and conquer", is probably the most 
sensible. But it isn't cool, is it? ;-)

Uwe

Re: indexing cpu utilization

Posted by Otis Gospodnetic <ot...@gmail.com>.
I, too, was going to point out the number of threads, but was going to
suggest using fewer of them because the server has 32 cores and there was a
mention of 100 threads being used from the client.  Thus, my guess was that
the machine is busy juggling threads and context switching (how's vmstat 2
output, Uwe?) instead of doing the real work.

Mark wanted to point this other issue:
https://issues.apache.org/jira/browse/SOLR-3929 though, so try that, too.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Wed, Jan 2, 2013 at 11:13 PM, Gora Mohanty <go...@mimirtech.com> wrote:

> On 3 January 2013 05:55, Mark Miller <ma...@gmail.com> wrote:
> >
> > 32 cores eh? You probably have to raise some limits to take advantage of
> > that.
> >
> > https://issues.apache.org/jira/browse/SOLR-4078
> > support configuring IndexWriter max thread count in solrconfig
> >
> > That's coming in 4.1 and is likely important - the default is only 8.
> >
> > You might also want to experiment with using more merge threads; I think
> > the default may be 3.
> >
> > Beyond that, you may want to look at running multiple JVMs on the one host
> > and doing distributed. That can certainly have benefits, but you have to
> > weigh against the management costs. And make sure process->processor
> > affinity is in gear.
> >
> > Finally, make sure you are using many threads to add docs...
> [...]
>
> Yes, making sure to use many threads is definitely good.
> We also found that indexing to multiple Solr cores, and
> doing one merge of all the indices at the end dramatically
> improved indexing time. As long as we had roughly one
> CPU core per Solr core (I am guessing that had to do
> with threading) indexing speed increased linearly with the
> number of Solr cores. Yes, the merge at the end is slow,
> and needs large disk space (at least twice the total index
> size), but one wins overall.
>
> Regards,
> Gora
>

Re: indexing cpu utilization

Posted by Gora Mohanty <go...@mimirtech.com>.
On 3 January 2013 05:55, Mark Miller <ma...@gmail.com> wrote:
>
> 32 cores eh? You probably have to raise some limits to take advantage of
> that.
>
> https://issues.apache.org/jira/browse/SOLR-4078
> support configuring IndexWriter max thread count in solrconfig
>
> That's coming in 4.1 and is likely important - the default is only 8.
>
> You might also want to experiment with using more merge threads; I think
> the default may be 3.
>
> Beyond that, you may want to look at running multiple jvms on the one host
> and doing distributed. That can certainly have benefits, but you have to
> weigh against the management costs. And make sure process->processor
> affinity is in gear.
>
> Finally, make sure you are using many threads to add docs...
[...]

Yes, making sure to use many threads is definitely good.
We also found that indexing to multiple Solr cores, and
doing one merge of all the indices at the end dramatically
improved indexing time. As long as we had roughly one
CPU core per Solr core (I am guessing that had to do
with threading) indexing speed increased linearly with the
number of Solr cores. Yes, the merge at the end is slow,
and needs large disk space (at least twice the total index
size), but one wins overall.
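One way to drive that final merge is the CoreAdmin MERGEINDEXES action (available since Solr 1.4); the host, target core name, and index paths below are hypothetical:

```shell
# Hypothetical host, target core, and index paths; MERGEINDEXES merges the
# listed index directories into the target core (Solr 1.4+ CoreAdmin API).
TARGET=combined
URL="http://localhost:8983/solr/admin/cores?action=mergeindexes&core=${TARGET}&indexDir=/indexes/core1/data/index&indexDir=/indexes/core2/data/index"
echo "$URL"          # in practice: curl "$URL"
```

The target core must exist and, as noted above, you need enough free disk for roughly twice the total index size while the merge runs.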

Regards,
Gora

Re: indexing cpu utilization

Posted by Mark Miller <ma...@gmail.com>.
32 cores eh? You probably have to raise some limits to take advantage of that.

https://issues.apache.org/jira/browse/SOLR-4078
support configuring IndexWriter max thread count in solrconfig

That's coming in 4.1 and is likely important - the default is only 8.

You might also want to experiment with using more merge threads; I think the default may be 3.
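If it helps, the merge thread count is configured on the merge scheduler in solrconfig.xml's <indexConfig>; a sketch with illustrative values, not a recommendation:

```xml
<!-- solrconfig.xml, inside <indexConfig>; values are illustrative -->
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxThreadCount">6</int>   <!-- concurrent merge threads -->
  <int name="maxMergeCount">12</int>   <!-- merges allowed to queue up -->
</mergeScheduler>
```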

Beyond that, you may want to look at running multiple jvms on the one host and doing distributed. That can certainly have benefits, but you have to weigh against the management costs. And make sure process->processor affinity is in gear.

Finally, make sure you are using many threads to add docs...

- Mark

On Jan 2, 2013, at 4:39 PM, Uwe Reh <re...@hebis.uni-frankfurt.de> wrote:

> Hi,
> 
> while trying to optimize our indexing workflow I reached the same endpoint that gabriel shen described in his mail. My Solr server won't utilize more than 40% of the computing power.
> I ran some tests, but I'm not able to find the bottleneck. Could anybody help solve this puzzle?
> 
> At first let me describe the environment:
> 
> Server:
> - two-socket Opteron (Interlagos) => 32 cores
> - 64 GB RAM (1600 MHz)
> - SATA disks: spindle and SSD
> - Solaris 5.11
> - JRE 1.7.0
> - Solr 4.0
> - application server: Jetty
> - 1 Gb network interface
> 
> Client:
> - same hardware as the server
> - either a multi-threaded SolrJ client using multiple instances of HttpSolrServer
> - or a multi-threaded SolrJ client using a ConcurrentUpdateSolrServer with 100 threads
> 
> Problem:
> - 10,000,000 docs of bibliographic data (~4k each)
> - with a simplified schema definition it takes 10 hours to index <=> ~250 docs/second
> - with the real schema.xml it takes 50 hours to index <=> ~50 docs/second
> In both cases the client uses just 2% of the CPU resources and the server 35%. There is obviously some optimization potential in the schema definition, but why does the server never use more than 40% of the CPU power?
> 
> 
> Discarded possible bottlenecks:
> - RAM for the JVM
> Solr takes only up to 12G of heap and there is just negligible GC activity, so increasing the maximum heap from 16G to 32G made no difference.
> - Bandwidth of the net
> The transmitted data is identical in both cases. The size of the transmitted data is somewhat below 50G. Since both machines have a dedicated 1G line to the switch, the raw transmission should not take much more than 10 minutes
> - Performance of the client
> As above, the client is fast enough for the simplified case (10h). A dry run (just preprocessing, not indexing) finishes after 75 minutes.
> - Server's disk I/O
> The size of the simpler index is ~100G; the size of the other is ~150G. That's a factor of 1.5, not 5. The difference between an SSD and a spinning disk is not noticeable. The output of 'iostat' and 'zpool iostat' is unsuspicious.
> - Bad thread distribution
> 'mpstat' shows a well-distributed load over all CPUs and a sensible number of crosscalls (fewer than ten per CPU).
> - Solr update parameter (solrconfig.xml)
> Inspired by http://www.hathitrust.org/blogs/large-scale-search/forty-days-and-forty-nights-re-indexing-7-million-books-part-1 I'm using:
>> <ramBufferSizeMB>256</ramBufferSizeMB>
>> <mergeFactor>40</mergeFactor>
>> <termIndexInterval>1024</termIndexInterval>
>> <lockType>native</lockType>
>> <unlockOnStartup>true</unlockOnStartup>
> Any change to these parameters made things worse.
> 
> To get an idea what's going on, I've done some statistics with visualvm (see attachment).
> The distribution of real and CPU time looks significant, but I'm not smart enough to interpret the results.
> The method org.apache.lucene.index.ThreadAffinityDocumentsWriterThreadPool.getAndLock() is active 80% of the time but takes only 1% of the CPU time. On the other hand, org.apache.commons.codec.language.bm.PhoneticEngine$PhonemeBuilder.append() is active 12% of the time and is always running on a CPU.
> 
> So again the question: when there are free resources in all dimensions, why does Solr not utilize more than 40% of the computing power?
> RAM bandwidth?? I can't believe it. How would I verify that?
> ???
> 
> Any hints are welcome.
> Uwe
> 


Re: indexing cpu utilization (attachment)

Posted by Uwe Reh <re...@hebis.uni-frankfurt.de>.
On 02.01.2013 22:39, Uwe Reh wrote:
> To get an idea what's going on, I've done some statistics with visualvm
> (see attachment).

Damn, the list server strips attachments.
You'll find the screenshot at 
http://fantasio.rz.uni-frankfurt.de/solrtest/HotSpot.gif

uwe


Re: indexing cpu utilization

Posted by Uwe Reh <re...@hebis.uni-frankfurt.de>.
Hi,

while trying to optimize our indexing workflow I reached the same 
endpoint that gabriel shen described in his mail. My Solr server won't 
utilize more than 40% of the computing power.
I ran some tests, but I'm not able to find the bottleneck. Could 
anybody help solve this puzzle?

At first let me describe the environment:

Server:
- two-socket Opteron (Interlagos) => 32 cores
- 64 GB RAM (1600 MHz)
- SATA disks: spindle and SSD
- Solaris 5.11
- JRE 1.7.0
- Solr 4.0
- application server: Jetty
- 1 Gb network interface

Client:
- same hardware as the server
- either a multi-threaded SolrJ client using multiple instances of 
HttpSolrServer
- or a multi-threaded SolrJ client using a ConcurrentUpdateSolrServer 
with 100 threads

Problem:
- 10,000,000 docs of bibliographic data (~4k each)
- with a simplified schema definition it takes 10 hours to index <=> 
~250 docs/second
- with the real schema.xml it takes 50 hours to index <=> ~50 docs/second
In both cases the client uses just 2% of the CPU resources and the 
server 35%. There is obviously some optimization potential in the 
schema definition, but why does the server never use more than 40% of 
the CPU power?


Discarded possible bottlenecks:
- RAM for the JVM
Solr takes only up to 12G of heap and there is just negligible GC 
activity, so increasing the maximum heap from 16G to 32G made no 
difference.
- Bandwidth of the net
The transmitted data is identical in both cases. The size of the 
transmitted data is somewhat below 50G. Since both machines have a 
dedicated 1G line to the switch, the raw transmission should not take 
much more than 10 minutes
- Performance of the client
As above, the client is fast enough for the simplified case (10h). A 
dry run (just preprocessing, not indexing) finishes after 75 minutes.
- Server's disk I/O
The size of the simpler index is ~100G; the size of the other is ~150G. 
That's a factor of 1.5, not 5. The difference between an SSD and a 
spinning disk is not noticeable. The output of 'iostat' and 'zpool 
iostat' is unsuspicious.
- Bad thread distribution
'mpstat' shows a well-distributed load over all CPUs and a sensible 
number of crosscalls (fewer than ten per CPU).
- Solr update parameter (solrconfig.xml)
Inspired by 
http://www.hathitrust.org/blogs/large-scale-search/forty-days-and-forty-nights-re-indexing-7-million-books-part-1 I'm using:
> <ramBufferSizeMB>256</ramBufferSizeMB>
> <mergeFactor>40</mergeFactor>
> <termIndexInterval>1024</termIndexInterval>
> <lockType>native</lockType>
> <unlockOnStartup>true</unlockOnStartup>
Any change to these parameters made things worse.

To get an idea what's going on, I've done some statistics with visualvm 
(see attachment).
The distribution of real and CPU time looks significant, but I'm not 
smart enough to interpret the results.
The method 
org.apache.lucene.index.ThreadAffinityDocumentsWriterThreadPool.getAndLock() 
is active 80% of the time but takes only 1% of the CPU time. On the 
other hand, 
org.apache.commons.codec.language.bm.PhoneticEngine$PhonemeBuilder.append() 
is active 12% of the time and is always running on a CPU.

So again the question: when there are free resources in all dimensions, 
why does Solr not utilize more than 40% of the computing power?
RAM bandwidth?? I can't believe it. How would I verify that?
???

Any hints are welcome.
Uwe