You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@geode.apache.org by GitBox <gi...@apache.org> on 2021/07/08 07:11:59 UTC

[GitHub] [geode] albertogpz edited a comment on pull request #6659: GEODE-9404: Do not log error message if sender is not configured.

albertogpz edited a comment on pull request #6659:
URL: https://github.com/apache/geode/pull/6659#issuecomment-876191044


   > This is what the
   
   
   
   > > > > > > > @pivotal-eshu
   > > > > > > > It looks like the Tx code is trying to assess Gateway sender requirement to support GEODE-7971 in TX code. Which doesn't seems right.
   > > > > > > > The "getLastTransactionEvent" should just return the last event in the transaction; currently it is doing it by "callbacks.get(callbacks.size() - 1);" which also seems to be error prone or inefficient.
   > > > > > > > In "firePendingCallbacks" it could have iterated over the list using size index:
   > > > > > > > for (int i=0; i<callbaack.size(); i++){
   > > > > > > > ee.getRegion().invokeTXCallbacks(EnumListenerEvent.AFTER_DESTROY, ee, true, (i+1) == size() /_isLastTrasaction_/);
   > > > > > > > }
   > > > > > > > Please correct me if i am missing anything.
   > > > > > > 
   > > > > > > 
   > > > > > > It was implemented that way to put the logic of getting the last transaction event in another class although it is as simple as getting the last element.
   > > > > > > Also, this logic could detect misconfigurations that can only be detected at runtime (second check in `getLastTransactionEvent()`), then the caller of getLastTransactionEvent() could detect it by means of the exception and set all the events as the last in the transaction. Why do this? because if not all events in the transaction go to the same senders (misconfiguration), that would imply that some senders will not get all the events for the transaction but could try if group-transaction-events is set to complete the transaction in the batch, by looking for the last event in the transaction which could never come (delaying the batch sending) or may come but not guaranteeing that the rest of the events for the transaction were sent in the batch (because they may not have reached the sender).
   > > > > > > The first check of `getLastTransactionEvent()` was added in order to avoid the cost of the second check if no sender had group-transaction-events set to true. In that case, it is irrelevant to return the last transaction event or null because it will not be used although, it could make more sense to return the last transaction event.
   > > > > > 
   > > > > > 
   > > > > > It looks like the GEODE-7971 tried to use grouping with the existing way of "invokeTXCallbacks()", where client, wan, cache-listener notification are done one event from transaction at a time. Instead it could have invoked wan separately from "firePendingCallbacks"; which has all the TX events in one group.
   > > > > > E.g.:
   > > > > > firePendingCallbacks() {
   > > > > > invokeTxCallbackForClientCacheLisener(getPendingCallbacks());
   > > > > > invokeTxCallbackForWan(getPendingCallbacks());
   > > > > > }
   > > > > > Then there was no need to tag last transaction event.
   > > > > > Please correct me if i am missing anything.
   > > > > > And calling "configError" for an TX entry not part of specific sender seems to be wrong; why would it be a error; think about the case where TX uses helper/lookup region entry/value to calculate a final result and other site is only interested in the final result, this is not an error, this is the most expected behavior.
   > > > > > E.g.:
   > > > > > getBonusIncreatementFromBonusRegion();
   > > > > > updateNewSalaryForEmployee(); // The region or final value other wan site is interested in.
   > > > > > I think for now we need to change the naming of the methods to be more meaningful (based on what they are doing):
   > > > > > "getLastTransactionEvent" to "getLastTransactionEventForGroupedWANSender()"
   > > > > > Instead of "ServiceConfigurationError" return null or boolean. As the new change is doing.
   > > > 
   > > > 
   > > > > Tagging the last transaction event is needed by `SerialGatewaySenderQueue` and `ParallelGatewaySenderQueue` to make sure that, when group-transaction-events is enabled, batches do not contain incomplete transactions. The way to assure that is to check that for every transaction in the batch (events are tagged with the transaction event), there is an event for that transaction that is the last event in the transaction (there is an event tagged as last transaction event).
   > > > > I do not see why your changes avoid the need for tagging the last transaction event.
   > > > 
   > > > 
   > > > Thinking more about this, I see what you are saying...
   > > > The other option could have been, separating WAN callback and creating a single event/message for all the entries in transaction. May be this is too much.
   > > > I thought about that alternative when I was designing this solution but abandoned it as it implied bigger changes.
   > > > > Secondly, having events for a transaction that will not be replicated by all senders (when some of the senders have group-transaction-events set to true) could POSSIBLY be a configuration error.
   > > > > Let me illustrate it with an example:
   > > > > Let's suppose that we have a system with two senders, sender1 and sender2, both with group-transaction-events set to true.
   > > > > Now let's suppose that a transaction (t1) arrives to Geode with two events: event1 (create a) and event2 (create b).
   > > > > Let's suppose that event1 is to be replicated by sender1 (but not sender2) and event 2 is to be replicated by sender2 (but not sender1).
   > > > > The last event for the transaction will be event2 and it will be tagged as such.
   > > > > What we will see happening is that even1 arrives to sender1's queue. When event1 is added to a batch and the batch is to be sent, it will be checked if all events for the transaction to which even1 belongs (t1) are in the batch. As there is only one event and it is not the last event in the the transaction, the batch creator will try to get the missing events for the transaction but it will never get them because they will never arrive, the last event in the transaction will not be on the sender1 queue but it will be on the sender2 queue.
   > > > > At the same time, event2 will arrive to sender2's queue. When this event is added to a batch and the batch is to be sent, it will be checked if all transactions are complete. For t1, as we have the last event for the transaction (event2), it will be considered that it is so. Nevertheless, as not all events for the transaction are sent to this sender, the assumption would be wrong.
   > > > 
   > > > 
   > > > I understand why the lastTransactionEvent; but I don't see how its a configuration error...When you say configuration error, are you meaning sender configuration or setting grouping configuration for a sender. In any case, it is different usecase but is not an error. Application have their own requirement of sending/managing data in different sites.
   > > 
   > > 
   > > I mean both. This is what the documentation says about this feature:
   > > _In order to use transaction event grouping:
   > > 
   > > * The `group-transaction-events` setting is supported only on serial senders with just one dispatcher thread, or on parallel senders.
   > > * The regions to which the transaction events belong must be replicated by the same set of gateway senders that also have this setting enabled.
   > > * This setting cannot be enabled if `enable-batch-conflation` is in effect._
   > > 
   > > The first and the third conditions can be (and are) checked at configuration time.
   > > But the second condition can only be checked at run time, when a transaction is received. If you have a transaction with several events that spans several regions, and the gateway senders for the regions are not the same, then group-transaction-events will not work even though whoever configured the gateway senders may think it will. Some senders will receive some events for the transaction and other senders will receive other events.
   > 
   > Yes...Thats up to application/use-case, how and what kind data it wants to be sent to other site. The WAN gateway senders
   > doesn't have any limitations like that...
   > 
   > > One may argue that this is not necessarily a configuration error but the fact is that, in this case, a transaction replicated by gateway senders who have group-transaction-events set to true will not be sent in the same batch because senders will not receive all the events for the transaction.
   > 
   > The grouping of transaction is some-way of making sure the transaction can be sent as a single group; but it should not be limiting factor (saying error). If it is a limiting factor, the TX should have been rolled back and an exception is thrown to application. Right now what I can see is nothing is rolled back or TX is aborted because of error, it is only logged as message; which is noticed after the event/error has already happened.
   You are right. It is not a limiting factor. But if you have configured group-transaction-events for some or all your senders and you have events in transactions being sent by those senders that cannot be grouped then chances are that you might have made a mistake during the configuration. As this cannot be detected at configuration time and it is not necessarily a configuration error, you do not want to rollback the transaction. But at least, you will have a log telling you that transactions are not being grouped so that you can check if there is a configuration error.
   
   
   > 
   > @albertogpz you also mentioned "This is what the documentation says about this feature:"; I believe this is not release feature and is not yet documented in Geode manuals; if this needs to be documented we need to create a jira ticket and tag docs team @davebarnes97
   
   This is documented in: geode-docs/topologies_and_comm/topology_concepts/multisite_overview.html.md.erb
   
   > 
   > Also, reading over the code, I see many of the methods in TXLastEventInTransactionUtils.class doesn't have unit level tests. We were wondering what happens with the helper methods in this class, when there are no senders configured, when there are multiple enders on each region with and without grouping option. It will really help Geode product having all the unit test cases covered; do you think you can add additional tests...
   
   Tests could also be added to the non-public methods of `TXLastEventInTransactionUtils` in `TXLastEventInTransactionUtilsTest` but the current tests on the public 'getLastTransactionEvent()` method seemed enough to me when I wrote them to cover the cases you mention.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@geode.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org