Posted to issues@geode.apache.org by "Anthony Baker (JIRA)" <ji...@apache.org> on 2018/04/20 19:58:10 UTC

[jira] [Closed] (GEODE-29) Fix all functional/behavioral differences between cache.xml and the public Java API.

     [ https://issues.apache.org/jira/browse/GEODE-29?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anthony Baker closed GEODE-29.
------------------------------

> Fix all functional/behavioral differences between cache.xml and the public Java API.
> ------------------------------------------------------------------------------------
>
>                 Key: GEODE-29
>                 URL: https://issues.apache.org/jira/browse/GEODE-29
>             Project: Geode
>          Issue Type: Improvement
>          Components: configuration
>    Affects Versions: 1.0.0-incubating
>         Environment: Apache Geode configured either with cache.xml, public Java API or Gfsh (+Cluster Config, an extension of cache.xml).
>            Reporter: John Blum
>            Priority: Critical
>              Labels: ApacheGeode, CacheXML, PublicJavaAPI
>
> Certain _Apache Geode_ functions/behaviors are encapsulated in "internal" classes.  Therefore, when a developer initially uses {{cache.xml}} to configure _Geode_ and then (perhaps) switches to configuring a node programmatically using the public Java API with seemingly equivalent and complementary configuration logic, certain things cease to "work as expected."
> For example...
> 1. Premature GatewayReceiver start before Region exists resulting in event/data loss issue:
> In {{cache.xml}}, if a developer defines a {{GatewayReceiver}} along with Regions that may potentially be updated by the {{GatewayReceiver}}, _Geode_ is careful not to "start" the {{GatewayReceiver}} until all the Regions have been created when processing (parsing and initializing _Geode_ components) the {{cache.xml}}.
> If _Geode_ were to start the {{GatewayReceiver}} "prematurely", and then events from the remote WAN site arrive before the Regions targeted by those events are created, then Geode will drop those events, thus causing data loss.  Therefore _Geode's_ logic when processing {{cache.xml}} prevents this from happening.
> However, if a developer uses the public Java API to define the same configuration, no out-of-box protection is offered to prevent event (data) loss from happening, thus leaving application developers using the _Geode_ API needing to know how _Geode_ functions "internally".
> Fortunately, application developers are not completely left to fend for themselves and need not be privy to all the details.  Technologies such as _Spring Data GemFire_, which also consume and adhere to the _Geode_ public Java API (and +only+ the "public" Java API; "internal" classes are not used given they are subject to change), are able to handle this using Spring's robust bean container lifecycle management features.  However, other application consumers using the API will not fare as well.
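
The ordering hazard described in item 1 can be sketched with a small, self-contained toy model. This is not Geode code: `ToyMember`, `droppedEvents`, and the Region name "Orders" are hypothetical names invented for illustration, and the only assumption taken from the description above is that an event targeting a Region that does not exist yet is dropped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReceiverStartOrder {

    // Toy stand-in for a cache member; hypothetical, NOT a Geode class.
    static final class ToyMember {
        private final Map<String, List<String>> regions = new HashMap<>();
        private boolean receiverStarted = false;
        private int dropped = 0;

        void createRegion(String name) {
            regions.put(name, new ArrayList<>());
        }

        void startReceiver() {
            receiverStarted = true;
        }

        // Simulates one event arriving from the remote WAN site.
        void onRemoteEvent(String regionName, String value) {
            if (!receiverStarted) {
                throw new IllegalStateException("receiver is not started");
            }
            List<String> region = regions.get(regionName);
            if (region == null) {
                dropped++;          // target Region missing: the event is lost
            } else {
                region.add(value);  // target Region exists: the event is applied
            }
        }
    }

    // Runs the same two-event scenario with either start order
    // and returns how many events were dropped.
    static int droppedEvents(boolean startReceiverBeforeRegions) {
        ToyMember member = new ToyMember();
        if (startReceiverBeforeRegions) {
            member.startReceiver();
            member.onRemoteEvent("Orders", "o1"); // arrives before the Region exists
            member.createRegion("Orders");
        } else {
            member.createRegion("Orders");
            member.startReceiver();
            member.onRemoteEvent("Orders", "o1");
        }
        member.onRemoteEvent("Orders", "o2");
        return member.dropped;
    }

    public static void main(String[] args) {
        System.out.println(droppedEvents(true));  // prints 1
        System.out.println(droppedEvents(false)); // prints 0
    }
}
```

In this model the only difference between the two runs is whether the receiver starts before or after the Region is created, which is the ordering the description says {{cache.xml}} processing enforces and the programmatic path leaves to the developer.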
> 2. Another problem stems from the poorly conceived and "imposed" ordering of persistent Regions.
> For instance, if I have 2 Members, each defining 2 persistent Regions, where each Member is the "primary" for 1 of the 2 Regions and the other Member hosts the redundant copy, like so...
> Member    Regions
> -------------------
> X         B, A'
> Y         A, B'
> A tick (') indicates that the member (e.g. X) is the primary for that Region (e.g. A).
> Then, the system can end up in a distributed deadlock due to the non-apparent, non-arbitrary dependency between the Members caused by an improper configuration order of the Regions.
> In this situation, the primary Member for a Region must start before the Member hosting the redundant Region copy (secondary) because it is a property of _Geode_ that the primary will have the most recent, correct copy of the data.
> But, as I have illustrated above, when the system starts, and because I have defined the Regions in an improper (arbitrary) order, this system will deadlock.  I.e. when Member X starts, it will attempt to create Region B first.  However, Member X must wait for Member Y to start since Member Y is the "primary" for Region B.
> However, when Member Y starts, and because it tries to create Region A first, it too will wait on Member X hosting the "primary" copy of Region A.  Each Member therefore waits for the other, resulting in a distributed deadlock.
> This example is pretty scaled down; the problem gets more complex as you add Members and Regions to a larger system.
> Of course, the "easy" solution is to ensure the Members in the cluster declaring the Regions all define them in their configuration in the "same order".  This is made even easier with the use of a cluster-wide, shared configuration (using the Cluster Configuration Service).  So, by defining all Regions in the same order on every Member (e.g. A followed by B), a developer/user can avoid the distributed deadlock.
> However, it is naive for _Geode_ to assume users will know/conform to this restriction and impose a non-arbitrary order to work around what is, basically, a technical limitation of the code.
> In other environments, such as Spring, you cannot necessarily guarantee what the order will be at runtime, especially if application components (e.g. DAOs) inject references to GemFire components (e.g. Regions) in combination with other advanced Spring container features like CLASSPATH component-scanning to wire up the entire application.
> Even "collocation" has an impact on the Region creation order since Spring must logically satisfy the "dependency" order of the beans first.  This is both logical and makes sense, whereas Geode's ordering is non-arbitrary and non-apparent since any Member could host the redundant copy.  Therefore, this problem is a leaked implementation detail.
> Technically, the same problem can be reproduced in {{cache.xml}}, for that matter, with no Spring present.  And this problem is especially likely to happen using the public Java API since, again, there is no special *magic* being handled by "internal" Geode classes (in this case) w.r.t. {{cache.xml}}.  Users/developers just have to know the correct ordering.
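
The mutual wait described in item 2 can be checked with a small simulation. This is a toy model under the assumptions stated in the description (a Member blocks on a Region until that Region's primary Member has created it, and Members create Regions strictly in their configured order); it is not how Geode implements recovery, and `RegionOrderDeadlock`, `stuckMembers`, and the helper methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class RegionOrderDeadlock {

    // Returns the Members that can never finish creating their Regions
    // (an empty set means startup completes in this model).
    static Set<String> stuckMembers(Map<String, List<String>> creationOrder,
                                    Map<String, String> primaryOf) {
        Map<String, Integer> next = new HashMap<>(); // index of each Member's next Region
        Set<String> created = new HashSet<>();       // "member/region" pairs created so far
        for (String member : creationOrder.keySet()) {
            next.put(member, 0);
        }
        boolean progress = true;
        while (progress) {                           // repeat until no Member can advance
            progress = false;
            for (Map.Entry<String, List<String>> e : creationOrder.entrySet()) {
                String member = e.getKey();
                int i = next.get(member);
                if (i >= e.getValue().size()) {
                    continue;                        // this Member has finished
                }
                String region = e.getValue().get(i);
                String primary = primaryOf.get(region);
                // A Member may proceed if it is the primary itself,
                // or if the primary has already created the Region.
                if (member.equals(primary) || created.contains(primary + "/" + region)) {
                    created.add(member + "/" + region);
                    next.put(member, i + 1);
                    progress = true;
                }
            }
        }
        Set<String> stuck = new TreeSet<>();
        for (Map.Entry<String, List<String>> e : creationOrder.entrySet()) {
            if (next.get(e.getKey()) < e.getValue().size()) {
                stuck.add(e.getKey());
            }
        }
        return stuck;
    }

    // Primaries from the table in the description: X is primary for A, Y for B.
    static Map<String, String> primaries() {
        Map<String, String> p = new HashMap<>();
        p.put("A", "X");
        p.put("B", "Y");
        return p;
    }

    // The order from the description: X creates B then A, Y creates A then B.
    static Map<String, List<String>> mismatchedOrder() {
        Map<String, List<String>> order = new LinkedHashMap<>();
        order.put("X", Arrays.asList("B", "A"));
        order.put("Y", Arrays.asList("A", "B"));
        return order;
    }

    // The "same order" workaround: every Member creates A then B.
    static Map<String, List<String>> sameOrder() {
        Map<String, List<String>> order = new LinkedHashMap<>();
        order.put("X", Arrays.asList("A", "B"));
        order.put("Y", Arrays.asList("A", "B"));
        return order;
    }

    public static void main(String[] args) {
        System.out.println(stuckMembers(mismatchedOrder(), primaries())); // prints [X, Y]
        System.out.println(stuckMembers(sameOrder(), primaries()));       // prints []
    }
}
```

With the mismatched order neither Member can take its first step, so both are reported as stuck; with the same order on every Member, startup completes.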



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)