Posted to dev@vxquery.apache.org by Eldon Carman <ec...@ucr.edu> on 2014/07/16 23:51:47 UTC

Disk and CPU not fully utilized

Vinayak,

Any ideas how to better utilize the disk and cpu? The system does not scale
to four threads when the data exceeds local memory. The query performance
is the same for both two and four threads. The results are the same when
using one or two disks.

We are utilizing a system that has one drive and four physical cores. The
specs on the drive show it has an average read/write of 156 MB/s. I set up
a few tests to show how different processes make use of the drive.

Base test case with Linux's dd command to find the read speed.
 - 160 MB/s with 20% cpu utilization

Next, I wrote a slimmed down version of our XML parser that reads the file
and parses the XML without saving the output.
 - 35 MB/s disk average, 122 MB/s disk max, 5 % cpu utilization, 1 thread
 - 30 MB/s disk average, 107 MB/s disk max, 7 % cpu utilization, 2 threads
 - 31 MB/s disk average, 102 MB/s disk max, 7 % cpu utilization, 4 threads

These are for an XQuery that parses and produces the XDM instance, but does
nothing with the result:
 - 9 MB/s disk average, 35 MB/s disk max, 5 % cpu utilization, 1 thread
 - 10 MB/s disk average, 34 MB/s disk max, 7 % cpu utilization, 2 threads
 - 10 MB/s disk average, 35 MB/s disk max, 5 % cpu utilization, 4 threads

Finally, the numbers for a full query processed through VXQuery:
 - 8 MB/s disk average, 29 MB/s disk max, 5 % cpu utilization, 1 thread
 - 9 MB/s disk average, 29 MB/s disk max, 7 % cpu utilization, 2 threads
 - 9 MB/s disk average, 29 MB/s disk max, 7 % cpu utilization, 4 threads

I did notice a slight improvement for 2 and 4 threads when adding a character
buffer of 32M. The parser already has a character buffer of 8000 by
default. Any ideas how to get better utilization of the disk and cpu? Would
Java NIO be a better option than the standard IO library? It seems we could be
getting 5 times more out of the system.
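
One way to separate raw read speed from parsing cost is a small program that only reads the file, either through the standard IO library with a configurable character buffer or through a Java NIO FileChannel. The following is a minimal sketch under stated assumptions, not the VXQuery parser: the class name ReadThroughput, its method names, the command-line arguments, and the 32 MB NIO buffer are all hypothetical. Each invocation reads the file once in a single mode, so the file system cache can be cleared between runs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class ReadThroughput {

    /** Reads the whole file through a BufferedReader; returns the number of chars read. */
    static long readWithBufferedReader(Path file, int bufferChars) throws IOException {
        long total = 0;
        char[] chunk = new char[8192];
        try (Reader in = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8),
                bufferChars)) {
            int n;
            while ((n = in.read(chunk)) != -1) {
                total += n; // data is discarded, only counted
            }
        }
        return total;
    }

    /** Reads the whole file through a FileChannel; returns the number of bytes read. */
    static long readWithFileChannel(Path file, int bufferBytes) throws IOException {
        long total = 0;
        ByteBuffer buf = ByteBuffer.allocateDirect(bufferBytes);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            int n;
            while ((n = ch.read(buf)) != -1) {
                total += n;
                buf.clear(); // reuse the buffer for the next read
            }
        }
        return total;
    }

    /** Converts a byte count and an elapsed time in nanoseconds to MB/s (1 MB = 10^6 bytes). */
    static double mbPerSecond(long bytes, long nanos) {
        return (bytes / 1e6) / (nanos / 1e9);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: ReadThroughput <file> <io|nio>");
            return;
        }
        Path file = Paths.get(args[0]);
        long size = Files.size(file);
        long start = System.nanoTime();
        if ("nio".equals(args[1])) {
            readWithFileChannel(file, 32 * 1024 * 1024); // assumed buffer size
        } else {
            readWithBufferedReader(file, 8000); // the default buffer size quoted above
        }
        long elapsed = System.nanoTime() - start;
        System.out.printf("%s: %.1f MB/s%n", args[1], mbPerSecond(size, elapsed));
    }
}
```

If both modes stay far below the dd figure, the gap is likely in how the file is read; if both approach it, the time is more likely going to parsing.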

Thanks,
Preston

Re: Disk and CPU not fully utilized

Posted by Eldon Carman <ec...@ucr.edu>.
I found that I posted bad numbers for the average values. Each should be 3x
the value I gave in the e-mail. The correction has helped me identify the
next test, and I will post more about it after the test completes.


On Wed, Jul 16, 2014 at 8:45 PM, Vinayak Borkar <vi...@gmail.com> wrote:

> Hi Preston,
>
>
> I am assuming that for each of the readings below, the disk rate and CPU
> usage are for the entire system rather than per thread. If that is the case,
> adding threads does not seem to help at all.
>
> How long does the test shown below run for? Are the disk throughput and CPU
> utilization being reported over a sizable length of time over which the
> test runs?
>
>
Each test is a single run after clearing the file system cache. The numbers
are gathered with the dstat command, which generates CSV output from its
polling results during the test execution.
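
The average and maximum figures can be recomputed from the dstat CSV with a few lines of code, which is a simple way to double-check reported averages. This is only a sketch under assumptions: the class name DstatSummary is hypothetical, the column index of the disk read rate must be looked up in the CSV header of the actual file, and the unit of the values is whatever dstat wrote (assumed here to be a per-second rate).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class DstatSummary {

    /**
     * Extracts one column as numbers. Rows where the cell is missing or
     * not numeric (title and header rows) are skipped.
     */
    static List<Double> column(List<String> csvLines, int index) {
        List<Double> values = new ArrayList<>();
        for (String line : csvLines) {
            String[] cells = line.split(",");
            if (index >= cells.length) {
                continue;
            }
            try {
                values.add(Double.parseDouble(cells[index].replace("\"", "").trim()));
            } catch (NumberFormatException e) {
                // not a data row; ignore it
            }
        }
        return values;
    }

    /** Arithmetic mean of the samples, or 0 when there are none. */
    static double average(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    /** Largest sample, or 0 when there are none. */
    static double max(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).max().orElse(0.0);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: DstatSummary <dstat.csv> <columnIndex>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<Double> samples = column(lines, Integer.parseInt(args[1]));
        System.out.printf("samples=%d average=%.1f max=%.1f%n",
                samples.size(), average(samples), max(samples));
    }
}
```

Polling intervals before the test starts or after it ends would pull the average down, so trimming the CSV to the test window first is advisable.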


> Finally, given that you have a good micro-benchmark, using YourKit will
> show you where time is being spent for the various cases. From the times, it
> appears that all computation is happening with equal concurrency regardless
> of the number of threads used. Can you check if there is contention on any
> monitors? YourKit should readily show you that information.
>
>
I will also look at the monitors in YourKit.


> Vinayak

Re: Disk and CPU not fully utilized

Posted by Vinayak Borkar <vi...@gmail.com>.
Hi Preston,


I am assuming that for each of the readings below, the disk rate and CPU
usage are for the entire system rather than per thread. If that is the
case, adding threads does not seem to help at all.

How long does the test shown below run for? Are the disk throughput and
CPU utilization being reported over a sizable length of time over which
the test runs?

Finally, given that you have a good micro-benchmark, using YourKit will
show you where time is being spent for the various cases. From the times,
it appears that all computation is happening with equal concurrency
regardless of the number of threads used. Can you check if there is
contention on any monitors? YourKit should readily show you that
information.
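
As a rough cross-check alongside a profiler, the JVM's own thread management bean reports how often and for how long each live thread has been blocked waiting for a monitor. The sketch below is not a substitute for YourKit and is not taken from VXQuery; the class name ContentionReport and the method snapshot() are hypothetical, and it only sees threads that are alive in the same JVM at the moment it is called.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

final class ContentionReport {

    /** Returns one line per live thread with its monitor-blocking statistics. */
    static List<String> snapshot() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Blocked time is only tracked when contention monitoring is enabled;
        // without it getBlockedTime() returns -1.
        if (bean.isThreadContentionMonitoringSupported()
                && !bean.isThreadContentionMonitoringEnabled()) {
            bean.setThreadContentionMonitoringEnabled(true);
        }
        List<String> lines = new ArrayList<>();
        for (ThreadInfo info : bean.getThreadInfo(bean.getAllThreadIds())) {
            if (info == null) {
                continue; // thread ended between the two calls
            }
            lines.add(String.format("%s state=%s blockedCount=%d blockedTimeMs=%d",
                    info.getThreadName(), info.getThreadState(),
                    info.getBlockedCount(), info.getBlockedTime()));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : snapshot()) {
            System.out.println(line);
        }
    }
}
```

Calling snapshot() periodically from a separate thread while the two- and four-thread tests run would show whether the blocked counts of the worker threads grow with the thread count.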

Vinayak
