
Posted to dev@geode.apache.org by Real Wes <Th...@outlook.com> on 2017/03/31 17:48:27 UTC

Global PDX Types -> Move to Region Level?

Would there be a negative impact in moving PDXTypes to the region level instead of the cache level? On the positive side, when an object with a variable number of fields is stored in a partitioned region, we would get rid of the distributed lock. As it is now, at the cache level, a DLock is taken, which slows performance. Another positive is that export would go much, much, much faster in systems with a lot of PDXTypes, because all PDXTypes are exported with every region.

Thanks,
Wes

Re: Global PDX Types -> Move to Region Level?

Posted by Hitesh Khamesra <hi...@yahoo.com.INVALID>.

Well said, Mike. We should document that as a best practice.

As Anthony mentioned, we should look into fuzzy type checking, at least for numbers. I did consider this some time back, but it is tedious to support backward compatibility unless we call it PDX version 2.



Re: Global PDX Types -> Move to Region Level?

Posted by Anthony Baker <ab...@pivotal.io>.

Converting from schema-less (JSON) to schema-typed (PDX) is hard. Perhaps we should explore fuzzy matching or type supersets to better fit JSON into Geode.

Anthony
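
To make the "type superset" idea concrete, here is a minimal sketch in Python. It is not how Geode handles PDX types today; the function names and the type labels ("long", "double", and so on) are hypothetical and chosen only for illustration. It folds a set of structurally different JSON documents into one superset schema and widens integers to doubles when both appear for the same field, which is one possible reading of fuzzy matching for numbers.

```python
def type_name(value):
    """Map a decoded JSON value to an illustrative (hypothetical) type label."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    return "object"


def merge_types(a, b):
    """Pick a single label that can hold values of both a and b."""
    if a == b:
        return a
    if a == "null":
        return b
    if b == "null":
        return a
    if {a, b} == {"long", "double"}:
        return "double"  # numeric widening
    return "object"      # fall back to the most general label


def superset_schema(documents):
    """Union the fields of all documents, merging the type seen per field."""
    schema = {}
    for doc in documents:
        for field, value in doc.items():
            seen = type_name(value)
            schema[field] = merge_types(schema[field], seen) if field in schema else seen
    # Sort by field name so that field order in the input does not matter.
    return dict(sorted(schema.items()))


docs = [
    {"id": 1, "price": 10},
    {"price": 9.5, "name": "widget"},
    {"id": 2, "name": None},
]
schema = superset_schema(docs)
```

With this approach the three documents above would share one schema instead of three, at the cost of deciding up front how conflicting field types are reconciled.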


Re: Global PDX Types -> Move to Region Level?

Posted by Michael Stolz <ms...@pivotal.io>.

There's nothing to prevent putting the same object into multiple regions.
The type catalog for a database shouldn't be relative to a table.
It should be database wide.

The real problem in this case is the proliferation of PDX Types caused by
using JSON documents that have only some of their fields populated, and
have their fields in arbitrary orders. The documents in the particular case
studied really are intended to all be the same type, but it's difficult to
tell because they are structurally mutated in so many ways.

It might be possible to make Geode figure out that they are all the same
type, but the real best practice here is to make sure you structure your JSON
documents consistently so that Geode can know that they are all of the same
type. Don't leave fields empty; default them to null or zero. Don't change
the order of the fields.

This behavioral change will make the JSON documents as structured as the
Java objects representing the same data would be. Geode doesn't have
PDX type explosion with Java objects.
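
A minimal sketch of this advice, in Python, might look like the following. The field names, their defaults, and the `normalize` helper are assumptions made up for illustration; nothing here calls a Geode API. It only shows the client-side step of giving every document the same fields in the same order before it is stored.

```python
import json

# Hypothetical canonical schema: field name -> default value.
# The dict's insertion order defines the field order of every document.
CANONICAL_FIELDS = {
    "id": 0,
    "name": None,
    "email": None,
    "age": 0,
}


def normalize(document):
    """Return JSON text with every canonical field present, in canonical order."""
    unknown = set(document) - set(CANONICAL_FIELDS)
    if unknown:
        # Reject fields outside the schema rather than silently creating
        # yet another document shape.
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    ordered = {
        name: document.get(name, default)
        for name, default in CANONICAL_FIELDS.items()
    }
    return json.dumps(ordered)


# Two documents with different populated fields and different field order.
a = normalize({"name": "Ann", "id": 1})
b = normalize({"age": 30, "id": 2, "email": "bob@example.com"})
```

After normalization both documents carry the same field names in the same order, so they differ only in their values.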

--
Mike Stolz
Principal Engineer, GemFire Product Manager
Mobile: +1-631-835-4771


Re: Global PDX Types -> Move to Region Level?

Posted by Olivier Mallassi <ol...@gmail.com>.

I do not know about the negative impact, but could you elaborate on the
DLock?

thx.
