Posted to user@ignite.apache.org by Victor <vi...@gmail.com> on 2019/11/26 20:38:52 UTC

Improving Get operation performance

I am running some comparison tests (Ignite vs Cassandra) to check how to
improve the performance of the 'get' operation. The data is fairly
straightforward: a simple Employee object (10-odd fields), stored in the cache
as a BinaryObject:

IgniteCache<String, BinaryObject> empCache;

The cache is configured with Write Sync Mode = FULL_SYNC, Atomicity =
TRANSACTIONAL, Backups = 1, and Persistence enabled.

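For reference, a minimal sketch of how this cache and data-region setup could be expressed in code. This is an assumption based on the description above, not the poster's actual configuration; the cache name "empCache" comes from the client code later in the thread:

// (Requires org.apache.ignite.configuration.{IgniteConfiguration, DataStorageConfiguration,
// CacheConfiguration}, org.apache.ignite.cache.{CacheAtomicityMode, CacheWriteSynchronizationMode}
// and org.apache.ignite.binary.BinaryObject.)

// Node-level storage config: persistence is enabled per data region.
IgniteConfiguration igniteCfg = new IgniteConfiguration();
DataStorageConfiguration storageCfg = new DataStorageConfiguration();
storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
igniteCfg.setDataStorageConfiguration(storageCfg);

// Cache config matching the settings described above.
CacheConfiguration<String, BinaryObject> cacheCfg = new CacheConfiguration<>("empCache");
cacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
cacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
cacheCfg.setBackups(1);
igniteCfg.setCacheConfiguration(cacheCfg);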
Cluster config: 3 server nodes + 1 client node, set up on 2 machines: a server
machine (Intel(R) Xeon(R) CPU X5675 @ 3.07GHz) and a client machine (Intel(R)
Xeon(R) CPU X5560 @ 2.80GHz).

The client has multiple (configurable) threads making concurrent 'get' calls.
I am using 'get' on purpose due to use-case requirements.

For about 500k requests, I am getting a throughput of about 1500/sec, even
though all of the data is off-heap and the cache hit percentage is 100%.
Interestingly, with Cassandra I am getting similar performance, with key cache
and a limited row cache.
I've tried running with 10/20/30 threads; the performance is more or less the
same.

I am leaving the defaults for most of the data configuration. For this test I
turned persistence off; ideally, for gets it shouldn't really matter. The
performance is the same.

============================================
Data Regions Configured:
[19:35:58]   ^-- default [initSize=256.0 MiB, maxSize=14.1 GiB,
persistence=false]

Topology snapshot [ver=4, locNode=038f99b3, servers=3, clients=1,
state=ACTIVE, CPUs=40, offheap=42.0GB, heap=63.0GB]
============================================

Additionally, I ran top on both machines to check whether they are hitting
resource limits:
------ Server
PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
14159 root      20   0   29.7g   3.2g  15216 S  10.3  4.5   1:35.69 java
14565 root      20   0   29.4g   2.9g  15224 S   8.3  4.2   1:33.41 java
13770 root      20   0   30.0g   2.9g  15184 S   6.3  4.2   1:36.99 java

----- Client
3731 root      20   0   27.8g   1.1g  15304 S 136.5  1.5   2:39.16 java

As you can see, everything is well under the limits.

Frankly, I was expecting Ignite gets to be pretty fast, given all the data is
in cache, at least going by this test:
https://www.gridgain.com/resources/blog/apacher-ignitetm-and-apacher-cassandratm-benchmarks-power-in-memory-computing

Planning to run one more test tomorrow with persistence disabled and a near
cache (on-heap) configured, to see if it helps.

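For reference, a minimal sketch of how a client-side near cache could be enabled for this cache; the eviction policy and size below are placeholders, not values from this thread:

// Assumes 'ignite' is the started client node and the "empCache" cache already exists.
// (Requires org.apache.ignite.configuration.NearCacheConfiguration and
// org.apache.ignite.cache.eviction.lru.LruEvictionPolicyFactory.)
NearCacheConfiguration<String, BinaryObject> nearCfg = new NearCacheConfiguration<>();
nearCfg.setNearEvictionPolicyFactory(new LruEvictionPolicyFactory<>(100_000));

IgniteCache<String, BinaryObject> nearEmpCache =
    ignite.getOrCreateNearCache("empCache", nearCfg).withKeepBinary();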
Let me know if you guys see any obvious configurations that should be set.




Re: Improving Get operation performance

Posted by ezhuravlev <e....@gmail.com>.
As you have 4 nodes on the same machine now, there is a lot of context
switching; probably all the nodes are just competing with each other for CPU
resources.

Evgenii




Re: Improving Get operation performance

Posted by Denis Magda <dm...@apache.org>.
Good to hear that you got to the root cause, Viktor!

Do you have any extra performance/troubleshooting tips/tricks that you had to
learn the hard way because the information was not documented?

-
Denis


On Fri, Dec 6, 2019 at 4:11 AM Victor <vi...@gmail.com> wrote:

> Update,
>
> 1. So there were 2 issues, there was old batch processing app that
> periodically ran, that loaded lot of data in memory. Which i think was
> causing some memory contention. So i shut that down for me tests.
>
> 2. Thread dumps showed some odd wait times between 2 get calls. I had
> overtly complicated my client. which did a bunch of thing, so i commented
> out most of it, kept the load to thread distribution simple.
>
> With these changes, my get numbers looked good.
>
> for 10 threads i got about 30k/sec for 500k requests.
> for 30 threads i got about 71k/sec for 1M requests.
>
> Thanks for all the help with troubleshooting this.
>
>
>
>
>

Re: Improving Get operation performance

Posted by Victor <vi...@gmail.com>.
Update,

1. There were 2 issues. An old batch-processing app ran periodically and
loaded a lot of data into memory, which I think was causing some memory
contention, so I shut it down for my tests.

2. Thread dumps showed some odd wait times between consecutive get calls. I
had overly complicated my client, which did a bunch of things, so I commented
out most of it and kept the load-to-thread distribution simple.

With these changes, my get numbers looked good.

For 10 threads I got about 30k/sec for 500k requests.
For 30 threads I got about 71k/sec for 1M requests.

Thanks for all the help with troubleshooting this.





Re: Improving Get operation performance

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

Not if there's enough parallelism, since nodes are not busy while requests
do round trips.

I recommend gathering jstack stack traces from all nodes to see what the
threads are up to.

Regards,
-- 
Ilya Kasnacheev


On Thu, Nov 28, 2019 at 9:58 PM Victor <vi...@gmail.com> wrote:

> Not sure i follow. The data is on server node/s. Even for a single/multiple
> requests, 'get' from a client will need to make a n/w round trip if server
> and client are on different boxes vs both being on the same box. So n/w
> latency becomes quite relevant.
>
>
>
>

Re: Improving Get operation performance

Posted by Victor <vi...@gmail.com>.
Not sure I follow. The data is on the server node(s). Even for single or
multiple requests, a 'get' from a client will need to make a network round
trip if the server and client are on different boxes vs. both being on the
same box. So network latency becomes quite relevant.




Re: Improving Get operation performance

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

I don't understand why the network hop is relevant here, if you are
(supposedly) running those gets in parallel.

Regards,
-- 
Ilya Kasnacheev


On Thu, Nov 28, 2019 at 4:09 AM Victor <vi...@gmail.com> wrote:

> Performed one more test. Moved the client on the same box, and changed the
> off & on heap values.
>
> The Employee record is barely about 75-100bytes. So 500k records would just
> range between 40-50mb + 1 backup, so another 40-50mb, so about 100mb worth
> of data.
>
> I set the off-heap to 1GB and -Xmx to 1GB as well. Here is what the
> topology
> looks like,
>
> Topology snapshot [ver=4, locNode=faaf52cf, servers=3, clients=1,
> state=ACTIVE, CPUs=24, offheap=4.0GB, heap=4.0GB]
>
> With 8GB cluster(on+off heap) swapping shouldn't really happen anymore
>
> Still the throughput is around 2000/s. Which i feel is largely due to no
> network hop. But this still is woefully slow, nowhere close to the
> benchmark
> numbers.
>
>
>
>

Re: Improving Get operation performance

Posted by Victor <vi...@gmail.com>.
Performed one more test: moved the client onto the same box and changed the
off-heap and on-heap values.

The Employee record is barely 75-100 bytes, so 500k records would only range
between 40-50 MB, plus 1 backup, so another 40-50 MB, roughly 100 MB worth of
data in total.

I set the off-heap to 1 GB and -Xmx to 1 GB as well. Here is what the topology
looks like:

Topology snapshot [ver=4, locNode=faaf52cf, servers=3, clients=1,
state=ACTIVE, CPUs=24, offheap=4.0GB, heap=4.0GB]

With an 8 GB cluster (on-heap + off-heap), swapping shouldn't really happen anymore.

Still, the throughput is only around 2000/s, which I feel is largely due to
there being no network hop. But this is still woefully slow, nowhere close to
the benchmark numbers.

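For reference, a minimal sketch of how a 1 GB off-heap default data region could be configured per node; this is an assumption about the setup, not the actual config used here (the -Xmx1g heap limit is passed on the JVM command line rather than set in code):

// (Requires org.apache.ignite.configuration.{IgniteConfiguration,
// DataStorageConfiguration, DataRegionConfiguration}.)
IgniteConfiguration igniteCfg = new IgniteConfiguration();
DataStorageConfiguration storageCfg = new DataStorageConfiguration();

DataRegionConfiguration dfltRegion = storageCfg.getDefaultDataRegionConfiguration();
dfltRegion.setInitialSize(256L * 1024 * 1024); // 256 MB initial
dfltRegion.setMaxSize(1024L * 1024 * 1024);    // 1 GB max off-heap for this region

igniteCfg.setDataStorageConfiguration(storageCfg);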



Re: Improving Get operation performance

Posted by Evgenii Zhuravlev <e....@gmail.com>.
Victor,

Then, I would recommend checking whether swapping is enabled in the OS. If you
have only 75 GB on the machine and you started 3 nodes with 14 GB off-heap and
something like a 16 GB heap each, the OS probably started swapping, which will
affect performance.

Additionally, there is no need to run more than one Ignite node per physical
machine; you can use 3 smaller machines instead, or start one instance on this
machine and give it more memory.

Evgenii

On Wed, Nov 27, 2019 at 11:33 AM Victor <vi...@gmail.com> wrote:

> Yes, ran Cassandra on the same box. Similar config, 3 nodes on one box and
> client on another. Have about 75G on both boxes.
>
> However for now, i am keeping Cassandra aside, since my primary goal around
> evaluating Ignite is to see similar performance numbers for "get" as seen
> in
> the benchmark.
>
>
>
>

Re: Improving Get operation performance

Posted by Victor <vi...@gmail.com>.
Yes, I ran Cassandra on the same box, with a similar config: 3 nodes on one
box and the client on another. I have about 75 GB on both boxes.

However, for now I am keeping Cassandra aside, since my primary goal in
evaluating Ignite is to see performance numbers for "get" similar to those in
the benchmark.




Re: Improving Get operation performance

Posted by Evgenii Zhuravlev <e....@gmail.com>.
Hi Viktor,

It looks like you're running 3 server nodes on the same physical machine,
right? How do you run Cassandra benchmarks? Do you use the same 2 machines?
How many Cassandra instances do you have?

Also, how much memory do you have on this machine?

Best Regards,
Evgenii

On Wed, Nov 27, 2019 at 10:29 AM Denis Magda <dm...@apache.org> wrote:

> Hello Viktor,
>
> The benchmarks you're referring to are real and list all the configuration
> parameters as well as the source code. No cheating.
>
> The first catchy difference between your and those benchmarks is that
> you're using TRANSACTIONAL mode for Ignite. This involves a 2-phase-commit
> protocol making TRANSACTIONAL gets slower than ATOMIC gets. Plus, if there
> is a chance your benchmark queries similar keys in parallel then some of
> the Threads will be blocked until the locked keys are released. So, check
> for ATOMIC caches or, to make benchmark fair, use lightweight transactions
> of Cassandra.
>
> Also, I would look into the following areas:
>
>    - Share your Cassandra and Ignite configurations and the source code
>    for further analysis. Please also share your Ignite version.
>    - Ensure GC, JVM and OS are fine-tuned and don't affect the
>    performance:
>    https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/memory-tuning#java-heap-and-gc-tuning
>    - Collect GC logs and use Flight Recorder for both the client and
>    servers if the performance doesn't improve (it might be even a network
>    latency):
>    https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/troubleshooting#debugging-gc-issues
>    - Once the 3 servers cluster is fully optimized you might need to
>    scale to 4 or 5 to achieve 500k+ queries.
>
>
> -
> Denis
>
>
> On Tue, Nov 26, 2019 at 3:00 PM Victor <vi...@gmail.com> wrote:
>
>> It's 500k unique gets, spread across multiple threads. Max i tried with 30
>> threads.
>>
>> I cant use getAll for this usecase, since it is user driven and the user
>> will load one record at a time. In any case i expected event the single
>> gets
>> to be pretty fast as well. Given the benchmark reference -
>>
>> https://www.gridgain.com/resources/blog/apacher-ignitetm-and-apacher-cassandratm-benchmarks-power-in-memory-computing
>>
>> There too the code seems to be using a single get. But the throughput is
>> massive for 32 threads its about 120k. So now i am not sure if the numbers
>> listed are accurate or was the test done in a controlled setting with
>> additional configurations.
>>
>>
>>
>>
>

Re: Improving Get operation performance

Posted by Victor <vi...@gmail.com>.
Thanks Denis for confirming the benchmarks are real. 

I am using the latest Ignite version, i.e. 2.6.7.

I tried with ATOMIC as well and don't see much variation, just marginal
changes. So currently, in my test,

I am using
<ignite>/examples/config/persistentstore/example-persistent-store.xml, with
persistence disabled, and starting my 3 server nodes simply via
<ignite>/bin/ignite.sh example-persistent-store.xml

The client uses the same config XML.

As for threads sharing the same key, no, that is not a possibility. My daemon
thread iterates over all keys in a loop, and every key is handed over to a
thread-pool executor, so no 2 threads would get the same key.

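For illustration, a rough sketch of that kind of key distribution, assuming a list of the unique keys and the empCache handle from the code below (names are placeholders):

// (Requires java.util.List, java.util.concurrent.{ExecutorService, Executors, TimeUnit},
// org.apache.ignite.IgniteCache and org.apache.ignite.binary.BinaryObject.)
static void runGets(IgniteCache<String, BinaryObject> empCache, List<String> keys)
        throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(30); // thread count is configurable
    for (String key : keys)
        pool.submit(() -> empCache.get(key)); // each unique key is handed to exactly one task
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.MINUTES);
}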
For now, I am keeping Cassandra aside, since my first goal is to at least see
comparable performance numbers for "get", which is the primary reason for
evaluating Ignite.

After my initial tests, I ran a perf test to check the network throughput
between the 2 boxes, and it was around 1 GB/s.

So as part of my next test, I am going to try moving my client to the same box
as the server, taking network-related issues out of play, and see if it
scales. Additionally, I will try adding the applicable JVM properties you
suggested.

With this, the 2 primary performance dependencies, network and disk, are out
of the picture. Everything should be in memory and on the same box.

I am not allocating any extra heap, and since this is primarily a 'get' test
as opposed to a 'query' test, we should be OK, I suppose. But let me know if
heap allocation is needed; the benchmark test did not mention it.

Lastly, here is the basic client code I am using:

// ======== configuration (client side)
// 'ignite' and 'empCache' are instance fields:
//   Ignite ignite;  IgniteCache<String, BinaryObject> empCache;
// Imports needed: org.apache.ignite.{Ignite, IgniteCache, Ignition},
// org.apache.ignite.binary.BinaryObject, org.apache.ignite.cache.{CacheAtomicityMode,
// CacheWriteSynchronizationMode, QueryEntity, QueryIndex},
// org.apache.ignite.configuration.CacheConfiguration,
// java.util.{Arrays, LinkedHashMap, UUID}, java.sql.Timestamp.

Ignition.setClientMode(true);

ignite = Ignition.start("/home/example-persistent-store.xml");
ignite.cluster().active(true);

CacheConfiguration<String, BinaryObject> cacheConfig = new CacheConfiguration<>("empCache");
cacheConfig.setAtomicityMode(CacheAtomicityMode.ATOMIC);
cacheConfig.setBackups(1);
cacheConfig.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_ASYNC);
cacheConfig.setIndexedTypes(String.class, BinaryObject.class);
cacheConfig.setSqlSchema("PUBLIC");
cacheConfig.setStatisticsEnabled(true);

// SQL metadata for the binary "employee" type.
QueryEntity queryEntity = new QueryEntity();
queryEntity.setValueType("employee");
queryEntity.setKeyType(String.class.getName());

LinkedHashMap<String, String> fields = new LinkedHashMap<>();
fields.put(Employee.FIELD_ID, String.class.getName());
fields.put(Employee.FIELD_NAME, String.class.getName());
fields.put(Employee.FIELD_DESIGNATION, String.class.getName());
fields.put(Employee.FIELD_EXPERIENCE, Integer.class.getName());
fields.put(Employee.FIELD_PHONE, Long.class.getName());
fields.put(Employee.FIELD_ISPERMANANT, Boolean.class.getName());
fields.put(Employee.FIELD_DEPARTMENTS, byte[].class.getName());
fields.put(Employee.FIELD_JOININGDATE, Timestamp.class.getName());
fields.put(Employee.FIELD_SALARY, Double.class.getName());
queryEntity.setFields(fields);

queryEntity.setIndexes(Arrays.asList(
    new QueryIndex(Employee.FIELD_ID),
    new QueryIndex(Employee.FIELD_NAME),
    new QueryIndex(Employee.FIELD_DESIGNATION),
    new QueryIndex(Employee.FIELD_EXPERIENCE),
    new QueryIndex(Employee.FIELD_PHONE),
    new QueryIndex(Employee.FIELD_JOININGDATE),
    new QueryIndex(Employee.FIELD_SALARY)
));

cacheConfig.setQueryEntities(Arrays.asList(queryEntity));
empCache = ignite.getOrCreateCache(cacheConfig).withKeepBinary();

// ===================== Get
public Employee get(UUID id) throws Exception {
    BinaryObject empBinary = empCache.get(id.toString());
    if (empBinary == null)
        System.out.println("Employee not found for Id[" + id.toString() + "]");
    // retrieveEmployee(...) maps the BinaryObject fields back to an Employee (defined elsewhere).
    return retrieveEmployee(empBinary);
}




Re: Improving Get operation performance

Posted by Denis Magda <dm...@apache.org>.
Hello Viktor,

The benchmarks you're referring to are real and list all the configuration
parameters as well as the source code. No cheating.

The first noticeable difference between your benchmarks and those is that
you're using TRANSACTIONAL mode for Ignite. This involves a 2-phase-commit
protocol, making TRANSACTIONAL gets slower than ATOMIC gets. Plus, if there is
a chance your benchmark queries similar keys in parallel, then some of the
threads will be blocked until the locked keys are released. So, check with
ATOMIC caches or, to make the benchmark fair, use Cassandra's lightweight
transactions.

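For a pure get benchmark, that switch is a one-line change on the cache configuration; a minimal sketch, mirroring the CacheConfiguration that appears elsewhere in this thread:

CacheConfiguration<String, BinaryObject> cacheConfig = new CacheConfiguration<>("empCache");
cacheConfig.setAtomicityMode(CacheAtomicityMode.ATOMIC); // instead of TRANSACTIONAL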
Also, I would look into the following areas:

   - Share your Cassandra and Ignite configurations and the source code for
   further analysis. Please also share your Ignite version.
   - Ensure GC, JVM and OS are fine-tuned and don't affect performance:
   https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/memory-tuning#java-heap-and-gc-tuning
   - Collect GC logs and use Flight Recorder for both the client and the
   servers if the performance doesn't improve (it might even be network
   latency):
   https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/troubleshooting#debugging-gc-issues
   - Once the 3-server cluster is fully optimized, you might need to scale to
   4 or 5 servers to achieve 500k+ queries.


-
Denis


On Tue, Nov 26, 2019 at 3:00 PM Victor <vi...@gmail.com> wrote:

> It's 500k unique gets, spread across multiple threads. Max i tried with 30
> threads.
>
> I cant use getAll for this usecase, since it is user driven and the user
> will load one record at a time. In any case i expected event the single
> gets
> to be pretty fast as well. Given the benchmark reference -
>
> https://www.gridgain.com/resources/blog/apacher-ignitetm-and-apacher-cassandratm-benchmarks-power-in-memory-computing
>
> There too the code seems to be using a single get. But the throughput is
> massive for 32 threads its about 120k. So now i am not sure if the numbers
> listed are accurate or was the test done in a controlled setting with
> additional configurations.
>
>
>
>

Re: Improving Get operation performance

Posted by Victor <vi...@gmail.com>.
It's 500k unique gets, spread across multiple threads. The max I tried was 30
threads.

I can't use getAll for this use case, since it is user-driven and the user
will load one record at a time. In any case, I expected even the single gets
to be pretty fast as well, given the benchmark reference -
https://www.gridgain.com/resources/blog/apacher-ignitetm-and-apacher-cassandratm-benchmarks-power-in-memory-computing

There too, the code seems to be using a single get, but the throughput is
massive: for 32 threads it's about 120k. So now I am not sure whether the
numbers listed are accurate or the test was done in a controlled setting with
additional configuration.




Re: Improving Get operation performance

Posted by Mikael <mi...@telia.com>.
Hi!

The numbers sound very low. I run on hardware close to yours (3 nodes
(X5660*5) and 1 client) and I get way more than 1500/sec; I'm not sure how
much, I will have to check. But as long as you do single gets there is not
much you can do: each get will be one round trip over the network, and with
single gets latency can have a huge impact. I modified my code, and most of
the time I batch all gets over 100 ms into a single getAll, and that makes a
huge impact on performance.

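A minimal sketch of that batching idea; how and for how long keys are accumulated is up to the caller, and the names here are illustrative rather than taken from this thread:

// (Requires java.util.{Map, Set}, org.apache.ignite.IgniteCache and
// org.apache.ignite.binary.BinaryObject.)
static Map<String, BinaryObject> batchedGet(IgniteCache<String, BinaryObject> cache,
                                            Set<String> pendingKeys) {
    // One getAll() round trip replaces many individual get() round trips.
    return cache.getAll(pendingKeys);
}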
There is not that much to change in the configuration; the number of backups
doesn't have much impact on reads (unless you use REPLICATED mode, of course).

I am not sure how the traffic works, but if there is only one TCP connection
to each node, I would think you will not have much use for more than 3
threads.

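If a single connection per node pair does turn out to be the bottleneck, one knob that might be worth trying (an assumption, not something verified in this thread) is the number of communication connections per node:

// (Requires org.apache.ignite.configuration.IgniteConfiguration and
// org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.)
TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
commSpi.setConnectionsPerNode(4); // default is 1; the value here is just an example

IgniteConfiguration igniteCfg = new IgniteConfiguration();
igniteCfg.setCommunicationSpi(commSpi);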
Did you read 500K unique entries or the same ones multiple times?

Mikael
