Posted to user@geode.apache.org by Evaristo José Camarero <ev...@yahoo.es> on 2017/06/14 10:12:08 UTC

PDX SERIALIZATION WITH C++ GEODE-NATIVE

Hi there,


 
We are using Apache Geode 1.1 as the DB backend and a C++ application as the client, which uses the Geode-native pre-release (with PDX serialization). In summary, we are finding that PDX deserialization takes most of the CPU in our C++ application.

When benchmarking the application we see the following (on a 12-core machine):

- Geode CPU usage is really low (let's say 40%)
- Client app CPU usage is really high (let's say 700%)

If we push the benchmark further, the latencies are simply too high.
 
The objects that we store look like:

    CacheableKeyPtr key;
    char *a;
    bool bdistributedEntry;
    CacheableStringPtr c;
    CacheableHashMapPtr attributesMap;
    // children are kept only for distributed entries
    CacheableLinkedHashSetPtr d;

This method consumes most of the CPU:

    CacheablePtr attributes;
    instance->getField("attributes", attributes);

I have some doubts about:

- Is C++ PDX serialization mature and performant? I have read some benchmarks of the Java library, but I was not able to find information about the C++ library.
- Are there any modelling guidelines that could alleviate the serialization performance issue? We started with a design in which classes are designed for simplicity, without considering PDX serialization performance.

Thx in advance,

Evaristo


Re: PDX SERIALIZATION WITH C++ GEODE-NATIVE

Posted by Anthony Baker <ab...@pivotal.io>.
I would expect PDX to be more or less on par with protobuf performance, with the added benefits of server-side queries and function execution. Can you share the code where you are doing the puts and gets?

Anthony

> On Jun 14, 2017, at 6:10 AM, Jacob Barrett <jb...@pivotal.io> wrote:
> 
> Since there hasn't been any release of Geode native, are you using what is on the development branch or a specific commit?
>
> If you don't intend to do any server side queries or function execution you could write your own custom serialization optimized for your data structure.
>
> PDX on C++ is mature, but there is an inherent performance cost of converting data structures to and from serialized forms in a generic cross-platform way.
> 
> I assume you're using a profiling tool. Can you share the entire output from that tool?
> 
> -Jake
> 
> 


Re: PDX SERIALIZATION WITH C++ GEODE-NATIVE

Posted by Jacob Barrett <jb...@pivotal.io>.
Since there hasn't been any release of Geode native, are you using what is on the development branch or a specific commit?

If you don't intend to do any server side queries or function execution you could write your own custom serialization optimized for your data structure. 
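
As a rough illustration of the custom-serialization idea above, here is a minimal sketch of a hand-written, length-prefixed binary encoding in standard C++. It does not use any geode-native API: `Entry`, `serialize`, `deserialize`, and the wire layout are all hypothetical, and plugging such a format into the client would still need the library's own serialization hooks, which are not shown here. It only shows how a fixed, known layout can be read back without generic per-field lookups.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the stored object; not a geode-native type.
struct Entry {
  std::string key;
  bool distributed = false;
  std::map<std::string, std::vector<std::string>> attributes;
};

using Buffer = std::vector<std::uint8_t>;

// Append a 32-bit value in little-endian byte order.
inline void putU32(Buffer& out, std::uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Append a string as a 32-bit length followed by its raw bytes.
inline void putString(Buffer& out, const std::string& s) {
  putU32(out, static_cast<std::uint32_t>(s.size()));
  out.insert(out.end(), s.begin(), s.end());
}

// Sequential reader that throws if the buffer ends too early.
class Reader {
 public:
  explicit Reader(const Buffer& buf) : buf_(buf) {}
  std::uint8_t u8() { need(1); return buf_[pos_++]; }
  std::uint32_t u32() {
    need(4);
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<std::uint32_t>(buf_[pos_ + i]) << (8 * i);
    pos_ += 4;
    return v;
  }
  std::string str() {
    const std::uint32_t n = u32();
    need(n);
    std::string s(reinterpret_cast<const char*>(buf_.data() + pos_), n);
    pos_ += n;
    return s;
  }

 private:
  void need(std::size_t n) const {
    if (buf_.size() - pos_ < n) throw std::runtime_error("truncated buffer");
  }
  const Buffer& buf_;
  std::size_t pos_ = 0;
};

// Layout: key, distributed flag, attribute count, then per attribute:
// name, value count, values.
Buffer serialize(const Entry& e) {
  Buffer out;
  putString(out, e.key);
  out.push_back(e.distributed ? 1 : 0);
  putU32(out, static_cast<std::uint32_t>(e.attributes.size()));
  for (const auto& kv : e.attributes) {
    putString(out, kv.first);
    putU32(out, static_cast<std::uint32_t>(kv.second.size()));
    for (const auto& v : kv.second) putString(out, v);
  }
  return out;
}

Entry deserialize(const Buffer& buf) {
  Reader in(buf);
  Entry e;
  e.key = in.str();
  e.distributed = in.u8() != 0;
  const std::uint32_t attributeCount = in.u32();
  for (std::uint32_t i = 0; i < attributeCount; ++i) {
    std::string name = in.str();
    const std::uint32_t valueCount = in.u32();
    std::vector<std::string> values;
    for (std::uint32_t j = 0; j < valueCount; ++j) values.push_back(in.str());
    e.attributes.emplace(std::move(name), std::move(values));
  }
  return e;
}
```

The trade-off is the one stated above: a private format like this is opaque to the servers, so it only makes sense if server-side queries and function execution on the individual fields are not needed.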

PDX on C++ is mature, but there is an inherent performance cost of converting data structures to and from serialized forms in a generic cross-platform way.

I assume you're using a profiling tool. Can you share the entire output from that tool?

-Jake



Re: PDX SERIALIZATION WITH C++ GEODE-NATIVE

Posted by Hitesh Khamesra <hi...@yahoo.com>.
Hi,
At the client, you can just use the "LdapObjectPdx" object, which will always be in deserialized form. Do you see any issue with it?
Thanks.
Hitesh


      From: Evaristo José Camarero <ev...@yahoo.es>
 To: Hitesh Khamesra <hi...@yahoo.com>; "user@geode.apache.org" <us...@geode.apache.org> 
 Sent: Thursday, June 15, 2017 3:47 AM
 Subject: Re: PDX SERIALIZATION WITH C++ GEODE-NATIVE
   


Re: PDX SERIALIZATION WITH C++ GEODE-NATIVE

Posted by Evaristo José Camarero <ev...@yahoo.es>.
Hi there,





We are currently using the pre-modernization release:

https://github.com/apache/geode-native/releases/tag/pre-modernization

The function that you saw before consuming most of the CPU is built into our C++ app and is basically deserializing a HashMap. As part of the map values we store arrays, so maybe the structure is too heavy...


 
inline CacheableHashMapPtr getAttributes(const PdxInstancePtr &instance) {
    CacheablePtr attributes;
    instance->getField("attributes", attributes);
    return dynCast<CacheableHashMapPtr>(attributes);
}
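
If the application reads the attributes of the same instance more than once, one thing that might be worth trying is to decode the map a single time and reuse the result. The following is a minimal sketch of that idea in standard C++ only. `LazyAttributes` and `AttributeMap` are hypothetical names, the decode callback stands in for whatever call actually produces the map, and whether repeated `getField` calls re-decode the field in this release is an assumption that the thread does not confirm.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plain C++ view of the attribute map (name -> list of values).
using AttributeMap = std::map<std::string, std::vector<std::string>>;

// Decodes the attributes on first access and reuses the result afterwards.
// Not thread-safe: a shared instance would need external locking.
class LazyAttributes {
 public:
  explicit LazyAttributes(std::function<AttributeMap()> decode)
      : decode_(std::move(decode)) {}

  const AttributeMap& get() {
    if (!cached_) {
      // The expensive decode runs once, on the first call only.
      cached_.reset(new AttributeMap(decode_()));
    }
    return *cached_;
  }

 private:
  std::function<AttributeMap()> decode_;  // stand-in for the real decode call
  std::unique_ptr<AttributeMap> cached_;  // empty until first access
};
```

This only helps when the same object is queried repeatedly; for a workload where every lookup touches a fresh entry, the decode cost is still paid once per entry.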


 
The classes implementing the Serializable interface look like:

class LdapObjectPdx : public apache::geode::client::PdxSerializable {
  private:
    CacheableKeyPtr key;
    char *a;
    bool b;
    CacheableStringPtr c;

    CacheableHashMapPtr attributes;
    // children are kept only for distributed entries
    CacheableLinkedHashSetPtr children;
...


 

 

 
We tried building a Java client that makes the same queries as our C++ app, and we also see that its CPU usage is really high compared with the Geode servers...

Running YCSB, we also see that YCSB's CPU usage is quite high compared with Geode. Maybe this is the expected behavior, given that the lookups are basically primary-key lookups.

We are considering several options: using Data Serialization instead of PDX, trying something specific to our data structure, or trying a different structure with more primitives and avoiding collections...
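
One way to picture the "more primitives, avoid collections" option is to flatten the attribute map into a few flat arrays, which could then be stored as plain array fields instead of a nested hash map. A minimal sketch in standard C++ follows. `FlatAttributes`, `flatten`, `unflatten`, and `valuesOf` are hypothetical names, no geode-native API is used, and whether this layout actually reduces deserialization cost in the client would have to be measured.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using AttributeMap = std::map<std::string, std::vector<std::string>>;

// Hypothetical flat layout: three arrays instead of a map of arrays.
struct FlatAttributes {
  std::vector<std::string> names;     // attribute names, sorted
  std::vector<std::int32_t> offsets;  // values of names[i] are values[offsets[i] .. offsets[i+1])
  std::vector<std::string> values;    // all values, concatenated in name order
};

// std::map iterates in key order, so `names` comes out sorted.
FlatAttributes flatten(const AttributeMap& m) {
  FlatAttributes f;
  f.offsets.push_back(0);
  for (const auto& kv : m) {
    f.names.push_back(kv.first);
    f.values.insert(f.values.end(), kv.second.begin(), kv.second.end());
    f.offsets.push_back(static_cast<std::int32_t>(f.values.size()));
  }
  return f;
}

// Rebuild the original map, for code that still needs the nested form.
AttributeMap unflatten(const FlatAttributes& f) {
  AttributeMap m;
  for (std::size_t i = 0; i < f.names.size(); ++i) {
    m[f.names[i]] = std::vector<std::string>(f.values.begin() + f.offsets[i],
                                             f.values.begin() + f.offsets[i + 1]);
  }
  return m;
}

// Look up one attribute by binary search, without rebuilding the map.
// Returns an empty vector when the name is not present.
std::vector<std::string> valuesOf(const FlatAttributes& f, const std::string& name) {
  const auto it = std::lower_bound(f.names.begin(), f.names.end(), name);
  if (it == f.names.end() || *it != name) return {};
  const std::size_t i = static_cast<std::size_t>(it - f.names.begin());
  return std::vector<std::string>(f.values.begin() + f.offsets[i],
                                  f.values.begin() + f.offsets[i + 1]);
}
```

With this layout a single attribute can be read without materializing every entry, at the price of losing the direct map interface and of not distinguishing a missing attribute from one with no values.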


 
Any hint is highly appreciated.

Thanks again,

/evaristo


 




    On Wednesday, June 14, 2017 at 19:11, Hitesh Khamesra <hi...@yahoo.com> wrote:

>>> This method consumes most of the CPU:
>>>     CacheablePtr attributes;
>>>     instance->getField("attributes", attributes);

Do you know who is calling the getField(..) API? Is the application using PdxInstance?
Thanks.
Hitesh.

Re: PDX SERIALIZATION WITH C++ GEODE-NATIVE

Posted by Hitesh Khamesra <hi...@yahoo.com>.
>>> This method consumes most of the CPU:
>>>     CacheablePtr attributes;
>>>     instance->getField("attributes", attributes);

Do you know who is calling the getField(..) API? Is the application using PdxInstance?
Thanks.
Hitesh.