You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@thrift.apache.org by "Eric Jensen (JIRA)" <ji...@apache.org> on 2010/10/06 00:18:41 UTC

[jira] Commented: (THRIFT-862) Async client issues / improvements

    [ https://issues.apache.org/jira/browse/THRIFT-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918259#action_12918259 ] 

Eric Jensen commented on THRIFT-862:
------------------------------------

this patch would silently lose method calls that happen in the same millisecond.  you need to add an atomic sequence counter as discussed above.  also you should check the return value of timeoutWatchSet.add to make sure this can't happen

nextTimeout starts out initialized to NO_TIMEOUT_VALUE but i don't see a way it could ever get back to that state if the only thing left in the queue was one without a timeout.  i would recommend getting rid of that separate member variable entirely and using timeoutWatchSet.first() instead to avoid this kind of error.  if you were going to use a sentinel, it should probably Integer.MAX_VALUE instead of -1 since those should be at the end of the timeout queue not the beginning

you should do bounds checking on select(nextTimeout - System.currentTimeMillis()) so that you don't cause yourself an illegal argument

i agree that your list of Did not do / future to do captures everything not done

> Async client issues / improvements
> ----------------------------------
>
>                 Key: THRIFT-862
>                 URL: https://issues.apache.org/jira/browse/THRIFT-862
>             Project: Thrift
>          Issue Type: Task
>          Components: Java - Library
>    Affects Versions: 0.4
>            Reporter: Ning Liang
>            Assignee: Bryan Duxbury
>         Attachments: thrift-862.diff
>
>
> From Thrift-845:
> Hey Eric,
> You bring up some good points. We have indirect communication between the client manager and client going through the method call object, which we can refactor into direct communication. I'd like to keep the state machine separate from the async client manager, though - the client manager only coordinates transitioning the method call on ready. Responses inline for the other comments:
> 1. Timeouts per-transition aren't really what I want as a consumer of this api. It looks like there are about five states, which as a consumer I need to know since the effective timeout on the whole call is then 5x the one I set. I'd rather be guaranteed I will be called back one way or another within the timeout i set, counting time i spend in the selector's queue, etc. Seems like that would be easy to implement.
>     * The naming is misleading - we can have idle socket timeout or an overall method call timeout (more like a deadline). I coded for the former case, so we can modify to the latter or rename. Modifying is trivial - we just need to track the method call start time, as opposed to the last action time. The effective deadline is arbitrarily long, since we're updating the last action timestamp on each socket read/write, which doesn't necessarily correspond to a transition between method states.
> 2. You should not be allowed to set a timeout lower than TAsyncClientManager.SELECT_TIME, and the value there should be much lower, maybe 5ms? Because its per-transition as above, the true minimum timeout for an entire call is 1s if each of its states took 200ms. We've actually run clients with lower than 200ms timeouts.
>     * Agreed, though we won't be able to get to 5ms resolution. The reason is because the JVM makes no guarantees on actual sleep time. Maybe a better solution would be to use Selector.selectNow(), which returns immediately with all ready keys, making our timeout granularity as small as possible.
> 3. TAsyncMethodCall.onError does not do anything to remove itself from TAsyncClientManager.timeoutWatchSet, so unless it shows up in selectedKeys() it will stay there until its timeout. Not a big deal, but could make that set prohibitively large if there are many errors within a timeout interval.
>     * TAsyncClientManager removes the method after a transition if the method's client.hasError(). Again, somewhat confusing since it's relying on the method calling the client's onError.
> 4. We call TAsyncMethodCall.transition before adding it to timeoutWatchSet, so lastTransitionTime is initialized before it is inspected. But, this is a somewhat dangerous pattern to allow an uninitialized primitive to ever be accessible. You should probably initialize it to the current time or a sentinel on instantiation.
>     * Current time is good - if we end up using the deadline definition of timeout, we don't even have to update the value.
> 5. it might be safer for lastTransitionTime = System.currentTimeMillis() to be at the top of transition(SelectionKey key) so you can guarantee the timestamp is updated even for errant transitions...don't think it matter's functionally in the current code, but it could in refactorings. you don't want to inadvertently obscure an error with a timeout.
>     * Good point. I wanted the timestamp to reflect the last socket activity - putting at the beginning of transition could make it inaccurate on long socket reads/writes. Maybe we can use the finally block?
> 6. TNonblockingServerSocket is the only non-test, non-tutorial thrift source file that uses System.err.println instead of LOGGER.warn
>     * We can easily fix.
> 7. You might want to catch Exception instead of Throwable? I don't think you intend or want to catch things like OutOfMemoryError, etc. The docs say "An Error is a subclass of Throwable that indicates serious problems that a reasonable application should not try to catch"
>     * Also easily fixed, though I want to make sure we didn't do this for a reason earlier. Bryan, why Throwable vs Exception?
> 8. TAsyncClientManager.call should really throw an exception if its SelectThread is not running (because its run method did not catch an unexpected Throwable) instead of inserting it into a queue that will grow unbounded and cause an OOM. You probably want a catch and throw block in run that cleans up by calling onError on all pending calls and sets its done flag. You probably also want to do this on ClosedSelectorException instead of just catching and logging it in an infinite loop. It's odd that you catch it on selectedKeys() but not select().
>     * That catch got folded into the catch(Throwable)
> 9. timeoutIdleMethods would be a bit faster if it only called currentTimeMillis() once instead of per-check
>     * Good point. I was going for accuracy but even for large timeout watch sets the iteration time should be negligible.
> 10. Probably an unnecessary optimization, but instead of relying on Object.hashCode for removing timed out methods, you could identify them by their start timestamp from currentTimeMillis() + a few bits of a static atomic sequence counter, put them in a TreeSet, and shortcut the iteration for timed out methods when you hit the first one that's not too old.
>     * Definitely, this was the original plan. We'll get it in on next iteration.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.