You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@tinkerpop.apache.org by Stephen Mallette <sp...@gmail.com> on 2021/03/18 16:00:28 UTC

[DISCUSS] Converging the worker pool was: [DISCUSS] Remote Transactions

I started a fresh thread for this topic Divij brought up, with more context:

> In a scenario where we have both
> session-less and sesion-ful requests being made to the server, the
resource
> allocation will not be fair and may lead to starvation for some requests.
> This problem exists even today, hence not totally correlated to what you
> are proposing but I am afraid a wider adoption of explicit
> transactions will bring this problem to the spotlight. Hence, we should
fix
> this problem first. A solution would entail converging the worker pool for
> both session vs session-less requests.

I'm not sure we can get that done in 3.5.0, but maybe? I suppose the
question is how would we ensure that each request for a session ends up
being executed on the same thread from the previous request if jobs are
randomly submitted to a worker pool?


On Thu, Mar 18, 2021 at 11:52 AM Divij Vaidya <di...@gmail.com>
wrote:

> Great idea Stephen. I agree with standardizing the explicit transaction
> semantics in cases of remote server (implicit aka sessionless is already
> self explanatory) and unifying the syntax with embedded graph usage.
>
> Couple of notes:
>
> 1. I would favor the begin() instead of create() as it's closer to
> universally prominent SQL transaction syntax. This would reduce the
> learning curve for users of TP.
>
> 2. From an implementation standpoint, sessionless requests are queued on
> the server side and are serviced by the worker thread pool. But a
> session-ful aka explicit transaction aka managed transaction starts a new
> single worked thread pool every time. In a scenario where we have both
> session-less and sesion-ful requests being made to the server, the resource
> allocation will not be fair and may lead to starvation for some requests.
> This problem exists even today, hence not totally correlated to what you
> are proposing but I am afraid a wider adoption of explicit
> transactions will bring this problem to the spotlight. Hence, we should fix
> this problem first. A solution would entail converging the worker pool for
> both session vs session-less requests.
>
> 3. You are proposing the idea of having a transaction scoped traversal
> object. I agree with it but we need more clarification in behavior wrt the
> following scenarios:
>
> Q. What happens when g.tx().commit() is called on a transaction scoped
> traversal object? Does the traversal get closed?
> Q. Currently, the same traversal object could be used to execute multiple
> session-less requests simultaneously in a thread safe manner. Does the same
> behaviour apply to transaction scoped traversal? If not, then how do I send
> multiple requests in parallel from the client all scoped to the same
> transaction on the server? Note that this is different from case of multi
> threaded transactions because on the server all requests are scoped to
> single transaction i.e. single thread but on the client they may be
> submitted via multiple threads.
>
> Regards,
> Divij Vaidya
>
>
>
> On Thu, Mar 18, 2021 at 4:05 PM Stephen Mallette <sp...@gmail.com>
> wrote:
>
> > The Transaction object is a bit important with remote cases because it
> > holds the reference to the session. With embedded use cases we generally
> > adhere to ThreadLocal transactions so the Transaction implementation in
> > that case is more of a controller for the current thread but for remote
> > cases the Transaction implementation holds a bit of state that can cross
> > over threads. In some ways, that makes remote cases feel like a "threaded
> > transaction" so that may be familiar to users?? Here's some example
> > syntax I currently have working in a test case:
> >
> > g = traversal().withRemote(conn)
> > gtx = g.tx().create()
> > assert gtx.isOpen() == true
> > gtx.addV('person').iterate()
> > gtx.addV('software').iterate()
> > gtx.commit() // alternatively you could explicitly rollback()
> > assert gtx.isOpen() == false
> >
> > I hope that documentation changes are enough to unify the syntax and
> remove
> > confusion despite there still being ThreadLocal transactions as a default
> > for embedded cases and something else for remote. At least they will look
> > the same, even if you technically don't need to do a g.tx().create() for
> > embedded transactions and can just have the control you always had with
> > g.tx().commit() directly.
> >
> > Note that the remote Transaction object can support configuration via
> > onClose(CLOSE_BEHAVIOR) and you could therefore switch from the default
> of
> > "commit on close" to rollback or manual (or i suppose something custom).
> > It's nice that this piece works. I don't see a point in supporting
> > onReadWrite(READ_WRITE_BEHAVIOR) as it just doesn't make sense in remote
> > contexts.
> >
> > Another point is that I've added what I termed a "Graph Operation"
> bytecode
> > instances which is bytecode that isn't related to traversal
> construction. I
> > hope this isn't one of those things where we end up wanting to deprecate
> an
> > idea as fast as we added it but we needed some way to pass
> commit/rollback
> > commands and they aren't really part of traversal construction. They are
> > operations that execute on the graph itself and I couldn't think of how
> to
> > treat them as traversals nicely, so they sorta just exist as a one off.
> > Perhaps they will grow in number?? Folks have always asked if bytecode
> > requests could get "GraphFeature" information - I suppose this could be a
> > way to do that??
> >
> > Anyway, I will keep going down this general path as it's appearing
> > relatively fruitful.
> >
> >
> >
> >
> > On Thu, Mar 18, 2021 at 6:34 AM Stephen Mallette <sp...@gmail.com>
> > wrote:
> >
> > > > We should just communicate clearly that a simple remote traversal
> > > already uses a transaction by default,
> > >
> > > That's a good point because even with this change we still have a
> > > situation where a single iterated remote traversal will generally mean
> a
> > > server managed commit/rollback, but in embedded mode, we have an open
> > > transaction (for transaction enabled graphs - TinkerGraph will behave
> > more
> > > like a remote traversal, so more confusion there i guess). I'm not sure
> > how
> > > to rectify that at this time except by way of documentation.
> > >
> > > The only thing I can think of is that the default for embedded would
> have
> > > to be auto-commit per traversal. In that way it would work like remote
> > > traversals and for graphs that don't support transactions like
> > TinkerGraph.
> > > Of course, that's the kind of change that will break a lot of code.
> Maybe
> > > we just keep that idea for another version.
> > >
> > > On Thu, Mar 18, 2021 at 4:21 AM <fh...@florian-hockmann.de> wrote:
> > >
> > >> I like the idea of adding transactions for remote traversals as they
> > >> close a gap in functionality that we currently have for remote
> > traversals.
> > >> We should just communicate clearly that a simple remote traversal
> > already
> > >> uses a transaction by default, meaning that the server will roll back
> > any
> > >> changes if any exception occurs. Users often ask about how to do
> > >> transactions with a remote traversal because they don't know about
> this
> > and
> > >> I'm afraid that we might add even more confusion if we now add
> > transactions
> > >> for remote traversals. Hence why I think that we should communicate
> this
> > >> clearly when we add remote transactions.
> > >>
> > >> Getting this to work in the GLVs should be possible but will require
> > some
> > >> effort. I think we would have to introduce some
> > >> "DriverRemoteTransactionTraversal" that doesn't submit the traversal
> on
> > a
> > >> terminal step but saves it and then submits all saved traversals
> > together
> > >> on close().
> > >> This also means that we should directly add an async version of
> close(),
> > >> maybe closeAsync()? (Same for commit() and rollback())
> > >>
> > >> I also like create() better than traversal() because it would confuse
> me
> > >> to first start a traversal with traversal() and then also start a
> > >> transaction with the same method.
> > >>
> > >> -----Ursprüngliche Nachricht-----
> > >> Von: Stephen Mallette <sp...@gmail.com>
> > >> Gesendet: Donnerstag, 18. März 2021 00:01
> > >> An: dev@tinkerpop.apache.org
> > >> Betreff: Re: [DSISCUSS] Remote Transactions
> > >>
> > >> One thing I should have noted about tx().create(). The create() very
> > well
> > >> could have been named traversal(), thus the previous example reading
> as:
> > >>
> > >> g = traversal().withEmbedded(graph)
> > >> // or
> > >> g = traversal().withRemote(conn)
> > >> gtx = g.tx().traversal()
> > >> gtx.addV('person').iterate()
> > >> gtx.addV('software').iterate()
> > >> gtx.close() // alternatively you could explicitly commit() or
> rollback()
> > >>
> > >> You're basically spawning a new GraphTraversalSource from the
> > Transaction
> > >> object rather than from a Graph or AnonymousTraversal. I chose
> create()
> > >> because it felt like this looked weird:
> > >>
> > >> g = traversal().withRemote(conn).tx().traversal()
> > >>
> > >> which would be a weird thing to do I guess, but just seeing
> > "traversal()"
> > >> over and over seemed odd looking and I wanted to differentiate with
> the
> > >> Transaction object even though the methods are identical in what they
> > do. I
> > >> suppose I also drew inspiration from:
> > >>
> > >> Transaction.createdThreadedTx()
> > >>
> > >> which I think we might consider deprecating. JanusGraph would simply
> > >> expose their own Transaction object in the future that has that method
> > as I
> > >> imagine folks still use that feature. As far as I know, no other graph
> > >> implements that functionality and my guess is that no one will likely
> > do so
> > >> in the future.
> > >>
> > >> anyway, if you like traversal() better than create() or the other way
> > >> around, please let me know
> > >>
> > >> On Wed, Mar 17, 2021 at 5:33 PM David Bechberger <dave@bechberger.com
> >
> > >> wrote:
> > >>
> > >> > I am in favor of this change as I any idea that unifies the multiple
> > >> > different ways things work in TP will only make it easier to learn
> and
> > >> > adopt.
> > >> >
> > >> > I don't know that I have enough knowledge of the inner workings of
> > >> > transactions to know what/if this will cause problems.
> > >> >
> > >> > Dave
> > >> >
> > >> > On Wed, Mar 17, 2021 at 12:57 PM Stephen Mallette
> > >> > <sp...@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > We haven't touched "transactions" since TP3 was originally
> designed.
> > >> > > They have remained a feature for embedded use cases even in the
> face
> > >> > > of the
> > >> > rise
> > >> > > of remote graph use cases and result in major inconsistencies that
> > >> > > really bother users.
> > >> > >
> > >> > > As we close on 3.5.0, I figured that perhaps there was a chance to
> > >> > > do something radical about transactions. Basically, I'd like
> > >> > > transactions to work irrespective of remote or embedded usage and
> > >> > > for them to have the
> > >> > same
> > >> > > API when doing so. In mulling it over for the last day or so I
> had a
> > >> > > realization that makes me believe that the following is possible:
> > >> > >
> > >> > > g = traversal().withEmbedded(graph)
> > >> > > // or
> > >> > > g = traversal().withRemote(conn)
> > >> > > gtx = g.tx().create()
> > >> > > gtx.addV('person').iterate()
> > >> > > gtx.addV('software').iterate()
> > >> > > gtx.close() // alternatively you could explicitly commit() or
> > >> > > rollback()
> > >> > >
> > >> > > // you could still use g for sessionless, but gtx is "done" after
> //
> > >> > > you close so you will need to create() a new gtx instance for a //
> > >> > > fresh transaction assert 2 == g.V().count().next()
> > >> > >
> > >> > > Note that the create() method on tx() is the new bit of necessary
> > >> syntax.
> > >> > > Technically, it is a do-nothing for embedded mode (and you could
> > >> > > skip it for thread-local auto-transactions) but all the
> > >> > > documentation can be shifted so that the API is identical to
> remote.
> > >> > > The change would be non-breaking as the embedded transaction
> > >> > > approach would remain as it is, but would no longer be documented
> as
> > >> > > the preferred approach. Perhaps we could one day disallow it but
> > >> > > it's a bit of a tangle of ThreadLocal that I'm not sure we want to
> > >> touch in TP3.
> > >> > >
> > >> > > What happens behind the scenes of g.tx().create() is that in
> remote
> > >> > > cases the RemoteConnection constructs a remote Transaction
> > >> > > implementation which basically is used to spawn new traversal
> source
> > >> > > instances with a session based connections. The remote Transaction
> > >> > > object can't support
> > >> > transaction
> > >> > > listeners or special read-write/close configurations, but I don't
> > >> > > think that's a problem as those things don't make a lot of sense
> in
> > >> > > remote use cases.
> > >> > >
> > >> > > From the server perspective, this change would also mean that
> > >> > > sessions would have to accept bytecode and not just scripts, which
> > >> > > technically shouldn't be a problem.
> > >> > >
> > >> > > One downside at the moment is that i'm thinking mostly in Java at
> > >> > > the moment. I'm not sure how this all works in other variants just
> > >> > > yet, but
> > >> > I'd
> > >> > > hope to keep similar syntax.
> > >> > >
> > >> > > I will keep experimenting tomorrow. If you have any thoughts on
> the
> > >> > matter,
> > >> > > I'd be happy to hear them. Thanks!
> > >> > >
> > >> >
> > >>
> > >>
> >
>

Re: [DISCUSS] Converging the worker pool was: [DISCUSS] Remote Transactions

Posted by Stephen Mallette <sp...@gmail.com>.

I just thought I'd close this discussion up a bit as the PR is now merged.
I think it's worth noting that as long as the TinkerPop Transaction API
does not have a universally accepted method for passing around transaction
state among threads we will always have a situation where we need to keep a
server thread for a session (createThreadedTransaction() isn't implemented
outside of JanusGraph and I"m not sure we even want to expand on that
particular approach given where we are today).

What's happened in this pull request is that we've mostly just better
served remote g.tx() style transactions that are short/medium lived and
have a similar workload profile as their sessionless counterparts. By
sharing the same gremlinPool the performance of remote g.tx() transactions
in that context improves dramatically. Ultimately, we would want the final
use case, long lived transactions, to operate under this model well. It can
do so now but at the cost of a thread from the gremlinPool which may not be
desirable. Anyway, I think this work takes us in a good direction toward
that goal and should help some folks immediately even without a perfect
implementation of the long-lived use case. Perhaps we could think further
about transaction changes in 3.6.0 to get this completely right at that
point.

On Mon, Apr 5, 2021 at 1:15 PM Stephen Mallette <sp...@gmail.com>
wrote:

> I've opened a PR for the UnifiedChannelizer:
>
> https://github.com/apache/tinkerpop/pull/1414
>
> Over the weekend I passed 10s of millions of messages through the
> UnifiedChannelizer and the server stayed quite stable throughout it all.
> With all the tests passing nicely now in travis, docker and locally I think
> this one is now ready for review.
>
> On Fri, Apr 2, 2021 at 8:35 AM Stephen Mallette <sp...@gmail.com>
> wrote:
>
>> I've solved the Travis issue. Some tests simply required more threads in
>> gremlinPool than the default which is set to the available cores. On Travis
>> you probably get a machine with 2 cores available and some tests required 3
>> or more to keep that many sessions open at once. Silly problem though the
>> analysis did help uncover other issues so I guess travis did its job.
>>
>> I've merged the g.tx() work and rebased the UnifiedChannelizer branch and
>> will continue with profiling/testing into early next week. I think that
>> should set this up for a PR at that point. If you'd like to see what the
>> work looks like, I think it would be fine to do so now at:
>>
>> https://github.com/apache/tinkerpop/compare/TINKERPOP-2245
>>
>>
>>
>>
>> On Thu, Apr 1, 2021 at 11:57 AM Stephen Mallette <sp...@gmail.com>
>> wrote:
>>
>>> I've been fussing with Travis for a while now haven't had much luck
>>> figuring out what is wrong. Decided to start profiling the server to see if
>>> that provided any hints (and it was something I intended to do in any
>>> case). Two issues came up:
>>>
>>> 1. shouldEventuallySucceedOnSameServerWithDefault() had some bad
>>> assertion logic and it fails for both UnifiedChannelizer and for the old
>>> OpProcessor stuff when the assertions are corrected. I've @Ignore'd that
>>> test for now. No idea if it ever worked or if it's busted with some more
>>> recent changes to the driver. I don't think that this was the cause of the
>>> problems with travis but I can't be sure.
>>> 2. AliasClusteredClient did not proxy calls to its underlying Client
>>> instance. This was purposeful, but has bad implications for
>>> DriverRemoteConnection. I'd now consider it a bug in 3.5.0 and I've changed
>>> the behavior. Ultimately, the whole Client aliasing probably needs to be
>>> changed, but it's too big a thing to try to get into now. Not completely
>>> sure yet if this fixes travis either, but I sense it has something to do
>>> with it.
>>>
>>> Travis is a bit backed up today so it's slow getting results. Hopefully
>>> (2) solved the problem or at least leads to an error i can see.
>>>
>>> The nice side-effect of this profiling is that I've come to see how much
>>> faster UnifiedChannelizer is as compared to the OpProcessor. Using the new
>>> remote transaction approach to load 100k vertices and 50k edges in 50k
>>> separate transactions (2 vertices/1 edge per tx), took under 3 minutes with
>>> UnifiedChannelizer and the server stayed remarkably stable in terms of
>>> resources/gc. The same workload on OpProcessor.........I gave up waiting
>>> for it to finish after 1500 transactions, which took 9 minutes. Looking at
>>> the jfr, there was a really weird gc pattern and memory usage under
>>> OpProcessor which perhaps had some connection to our quickly
>>> starting/killing threads and GremlinExecutor instances.
>>>
>>> I plan to keep testing/profiling for a bit longer before issuing the PR
>>> (and hopefully getting travis fixed in the process)...more updates to come
>>> i imagine.
>>>
>>> Divij, I'm glad that you brought this issue up early. I imagine I would
>>> have noticed the problem in profiling, but it was smart to call it out
>>> early. Nice job.
>>>
>>>
>>>
>>> On Mon, Mar 29, 2021 at 6:55 AM Stephen Mallette <sp...@gmail.com>
>>> wrote:
>>>
>>>> I can't seem to get Travis to pass the build for this branch with the
>>>> UnifiedChannelizer enabled.
>>>>
>>>> https://travis-ci.com/github/apache/tinkerpop/builds/221427334
>>>>
>>>> It just hangs without much information. I can't seem to get it to
>>>> behave otherwise and it doesn't fail for me locally (of course). If anyone
>>>> has some free CPU cycles could you please do:
>>>>
>>>> mvn clean install -DskipTests && mvn verify -pl :gremlin-server
>>>> -DskipTests -DskipIntegrationTests=false -DincludeNeo4j -DtestUnified=true
>>>>
>>>> i'm curious if anyone else can recreate the issue.....Thanks
>>>>
>>>>
>>>>
>>>> On Thu, Mar 25, 2021 at 3:33 PM Stephen Mallette <sp...@gmail.com>
>>>> wrote:
>>>>
>>>>> It took me a long time to get to the final steps of this change.
>>>>> Uncovered some unexpected bugs in the WsAndHttpChannelizer and that was
>>>>> gnarly to sort out. Also found some problems with authorization through the
>>>>> SaslAndHttpBasicAuthenticationHandler. I can't say I really like how these
>>>>> combined channelizers really work. Seems like the new tangle of stuff in
>>>>> Gremlin Server. Not sure how much of that I can sort that out for 3.5.0 at
>>>>> this point. I might make some attempts on this body of work since it's
>>>>> fresh in my mind but we'll see.
>>>>>
>>>>> After getting through these little nits I've managed to get all of the
>>>>> Gremlin Server integration test cases passing with the new
>>>>> UnifiedChannelizer. I've added a -DtestUnified property to run the
>>>>> UnifiedChannelizer rather than the old stuff. This way we can add a new job
>>>>> to travis to cover that by itself in parallel to the old. You can run it
>>>>> like this:
>>>>>
>>>>> mvn verify -pl gremlin-server -DskipIntegrationTests=false
>>>>> -DtestUnified=true
>>>>>
>>>>> Having all the tests pass on the new channelizer makes me feel pretty
>>>>> confident that what I have at least works (and without much change in
>>>>> behavior) and therefore have pushed a branch for early review:
>>>>>
>>>>> https://github.com/apache/tinkerpop/tree/TINKERPOP-2245
>>>>>
>>>>> I still need to write a few more tests and after that it would be time
>>>>> to do a bit of performance testing to see how it compares to the old
>>>>> method. I'm also missing "metrics". If all that is good then I'll need to
>>>>> do a gang of documentation work. I imagine I will continue to keep this
>>>>> work separate from the bytecode transaction work that spawned it and merge
>>>>> the transaction stuff first followed by this. In that way, expect this
>>>>> branch to be a bit volatile with rebasing.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 23, 2021 at 6:51 PM Stephen Mallette <sp...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> More reasonable progress with the UnifiedChannelizer. The short
>>>>>> version is that I have a pretty major quantity of the server integration
>>>>>> tests passing under this new Channelizer. The ones that are failing are
>>>>>> typically ones that assert some specific log messages or exception message
>>>>>> which may not necessarily apply anymore. I'm still mulling those over.
>>>>>>
>>>>>> The basic structure of the UnifiedChannelizer is that it sorta copies
>>>>>> the WsAndHttpChannelizer and builds from there, so when this
>>>>>> UnfiedChannelizer is configured you'll get HTTP and Websockets. The
>>>>>> UnifiedChannelizer gets rid of the entire OpProcessor infrastructure. I
>>>>>> think that was a good body of code years ago, but we've now learned quite
>>>>>> explicitly what we want Gremlin Server to do and what role it plays.
>>>>>> OpProcessor just feels like an unnecessary abstraction now. Removing it
>>>>>> simplifies the code of the UnifiedChannelizer, which adds a single
>>>>>> UnifiedHandler to the netty pipeline. That UnifiedHandler streamlines all
>>>>>> those OpProcessor implementations into one so all the duplicate code is
>>>>>> effectively removed by running RequestMessage instances through a single
>>>>>> common flow. That means all the request validation, error handling,
>>>>>> iterator processing, result building, transaction management, etc. will be
>>>>>> in one place.
>>>>>>
>>>>>> The UnifiedHandler also kills out the GremlinExecutor. That means all
>>>>>> the Lifecycle callbacks and random lambdas that generated so much
>>>>>> indirection in the name of extensibility are no longer confusing the code
>>>>>> base and the wicked hierarchy of timesouts and exception throws are all
>>>>>> flattened.
>>>>>>
>>>>>> When the UnifiedHandler gets a RequestMessage it validates it, builds
>>>>>> a Context and creates a Rexster instance (thought i'd bring back the name
>>>>>> rather than go with the generic "Worker" interface) that has the Context
>>>>>> added to it as a task. The Rexster instance is then submitted to the
>>>>>> Gremlin thread pool to do work when a thread is available. Rexster will
>>>>>> hang on to that thread awaiting tasks that are assigned to it. For
>>>>>> sessionless requests, that means Rexster handles one request and exits. For
>>>>>> a sessionful request it means the Rexster will wait for more tasks (Context
>>>>>> objects from new requests) to be added to it. The UnifiedHandler holds
>>>>>> those Rexster instances in a Map aligning session identifiers to Rexster
>>>>>> instances. As Divij alluded to, I suppose this means we are putting down
>>>>>> foundation for query cancellation and better insight into the queue of
>>>>>> things that are running.
>>>>>>
>>>>>> Now that the gremlinPool server setting covers both types of requests
>>>>>> we no longer have a situation where the number of threads that can be
>>>>>> running is unbounded and a "session" will now look a bit like a "long run
>>>>>> job" in that it will consume a thread from the pool for as long as the
>>>>>> session is open. So if the gremlinPool is 16, you could have 16 sessions
>>>>>> active. Sessionless requests would begin to queue as would requests for new
>>>>>> sessions and they would be serviced in the order received.
>>>>>>
>>>>>> I'm still not quite ready for a pull request. Despite having the bulk
>>>>>> of the integration tests working, the code isn't quite the way I think it
>>>>>> will be finally organized (I used a lot of inner classes and things which
>>>>>> probably should change). I also need to figure out how to run the tests. I
>>>>>> think I need to run them all per channelizer (old and new) but I think that
>>>>>> will push the build time over maximum on Travis, so I may need to break
>>>>>> things up somehow. I'm reasonably sure I'll have a branch ready to push
>>>>>> before the end of the week.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 22, 2021 at 5:22 PM Stephen Mallette <
>>>>>> spmallette@gmail.com> wrote:
>>>>>>
>>>>>>> I've started to play with some ideas for pushing all requests
>>>>>>> scripts/bytecode sessions/sessionless down one unified pipeline with one
>>>>>>> shared thread pool. The most interesting thing I've seen so far is how much
>>>>>>> the code could potentially simplify under this approach. The
>>>>>>> GremlinExecutor was the wrong abstraction in a sense and it forced way too
>>>>>>> many callbacks as arguments and made error handling kinda sloppy which made
>>>>>>> the server code harder to reuse, extend and read. Of course, all that was
>>>>>>> built before the first remote graph was built and we were just wildly
>>>>>>> guessing at how DS Graph was going to work with all this. Now, I think the
>>>>>>> remote model is much more clear, so maybe all this can be made "right" now.
>>>>>>>
>>>>>>> I think that if I can see any success at all with this in the next
>>>>>>> few days, we could include a new UnfiiedChannelizer that would replace the
>>>>>>> existing ones. I think we'd keep the old one in place by default and keep
>>>>>>> my copy/paste work for SessionOpProcessor that added bytecode there, with
>>>>>>> the idea that the UnifiedChannelizer will become the future default as we
>>>>>>> get it tested further along the 3.5.x line.
>>>>>>>
>>>>>>> I imagine I'll have more details on this task tomorrow and will post
>>>>>>> back here when I do.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Mar 18, 2021 at 12:37 PM Divij Vaidya <
>>>>>>> divijvaidya13@gmail.com> wrote:
>>>>>>>
>>>>>>>> >
>>>>>>>> > I suppose the
>>>>>>>> > question is how would we ensure that each request for a session
>>>>>>>> ends up
>>>>>>>> > being executed on the same thread from the previous request if
>>>>>>>> jobs are
>>>>>>>> > randomly submitted to a worker pool?
>>>>>>>>
>>>>>>>>
>>>>>>>> I haven't thought through the details, but on top of my head, we
>>>>>>>> would have
>>>>>>>> to maintain some request<->thread mapping on the server. This
>>>>>>>> mapping is
>>>>>>>> also a building block for a request cancellation feature in future
>>>>>>>> where a
>>>>>>>> client would be able to send a cancellation request to the server,
>>>>>>>> the
>>>>>>>> server will map the request to a thread executing that request and
>>>>>>>> then set
>>>>>>>> an interrupt on that thread to signify cancellation.
>>>>>>>>
>>>>>>>> Divij Vaidya
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Mar 18, 2021 at 5:00 PM Stephen Mallette <
>>>>>>>> spmallette@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> > I started a fresh thread for this topic Divij brought up, with
>>>>>>>> more
>>>>>>>> > context:
>>>>>>>> >
>>>>>>>> > > In a scenario where we have both
>>>>>>>> > > session-less and sesion-ful requests being made to the server,
>>>>>>>> the
>>>>>>>> > resource
>>>>>>>> > > allocation will not be fair and may lead to starvation for some
>>>>>>>> requests.
>>>>>>>> > > This problem exists even today, hence not totally correlated to
>>>>>>>> what you
>>>>>>>> > > are proposing but I am afraid a wider adoption of explicit
>>>>>>>> > > transactions will bring this problem to the spotlight. Hence,
>>>>>>>> we should
>>>>>>>> > fix
>>>>>>>> > > this problem first. A solution would entail converging the
>>>>>>>> worker pool
>>>>>>>> > for
>>>>>>>> > > both session vs session-less requests.
>>>>>>>> >
>>>>>>>> > I'm not sure we can get that done in 3.5.0, but maybe? I suppose
>>>>>>>> the
>>>>>>>> > question is how would we ensure that each request for a session
>>>>>>>> ends up
>>>>>>>> > being executed on the same thread from the previous request if
>>>>>>>> jobs are
>>>>>>>> > randomly submitted to a worker pool?
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Thu, Mar 18, 2021 at 11:52 AM Divij Vaidya <
>>>>>>>> divijvaidya13@gmail.com>
>>>>>>>> > wrote:
>>>>>>>> >
>>>>>>>> > > Great idea Stephen. I agree with standardizing the explicit
>>>>>>>> transaction
>>>>>>>> > > semantics in cases of remote server (implicit aka sessionless
>>>>>>>> is already
>>>>>>>> > > self explanatory) and unifying the syntax with embedded graph
>>>>>>>> usage.
>>>>>>>> > >
>>>>>>>> > > Couple of notes:
>>>>>>>> > >
>>>>>>>> > > 1. I would favor the begin() instead of create() as it's closer
>>>>>>>> to
>>>>>>>> > > universally prominent SQL transaction syntax. This would reduce
>>>>>>>> the
>>>>>>>> > > learning curve for users of TP.
>>>>>>>> > >
>>>>>>>> > > 2. From an implementation standpoint, sessionless requests are
>>>>>>>> queued on
>>>>>>>> > > the server side and are serviced by the worker thread pool. But
>>>>>>>> a
>>>>>>>> > > session-ful aka explicit transaction aka managed transaction
>>>>>>>> starts a new
>>>>>>>> > > single worked thread pool every time. In a scenario where we
>>>>>>>> have both
>>>>>>>> > > session-less and sesion-ful requests being made to the server,
>>>>>>>> the
>>>>>>>> > resource
>>>>>>>> > > allocation will not be fair and may lead to starvation for some
>>>>>>>> requests.
>>>>>>>> > > This problem exists even today, hence not totally correlated to
>>>>>>>> what you
>>>>>>>> > > are proposing but I am afraid a wider adoption of explicit
>>>>>>>> > > transactions will bring this problem to the spotlight. Hence,
>>>>>>>> we should
>>>>>>>> > fix
>>>>>>>> > > this problem first. A solution would entail converging the
>>>>>>>> worker pool
>>>>>>>> > for
>>>>>>>> > > both session vs session-less requests.
>>>>>>>> > >
>>>>>>>> > > 3. You are proposing the idea of having a transaction scoped
>>>>>>>> traversal
>>>>>>>> > > object. I agree with it but we need more clarification in
>>>>>>>> behavior wrt
>>>>>>>> > the
>>>>>>>> > > following scenarios:
>>>>>>>> > >
>>>>>>>> > > Q. What happens when g.tx().commit() is called on a transaction
>>>>>>>> scoped
>>>>>>>> > > traversal object? Does the traversal get closed?
>>>>>>>> > > Q. Currently, the same traversal object could be used to
>>>>>>>> execute multiple
>>>>>>>> > > session-less requests simultaneously in a thread safe manner.
>>>>>>>> Does the
>>>>>>>> > same
>>>>>>>> > > behaviour apply to transaction scoped traversal? If not, then
>>>>>>>> how do I
>>>>>>>> > send
>>>>>>>> > > multiple requests in parallel from the client all scoped to the
>>>>>>>> same
>>>>>>>> > > transaction on the server? Note that this is different from
>>>>>>>> case of multi
>>>>>>>> > > threaded transactions because on the server all requests are
>>>>>>>> scoped to
>>>>>>>> > > single transaction i.e. single thread but on the client they
>>>>>>>> may be
>>>>>>>> > > submitted via multiple threads.
>>>>>>>> > >
>>>>>>>> > > Regards,
>>>>>>>> > > Divij Vaidya
>>>>>>>> > >
>>>>>>>> > >
>>>>>>>> > >
>>>>>>>> > > On Thu, Mar 18, 2021 at 4:05 PM Stephen Mallette <
>>>>>>>> spmallette@gmail.com>
>>>>>>>> > > wrote:
>>>>>>>> > >
>>>>>>>> > > > The Transaction object is a bit important with remote cases
>>>>>>>> because it
>>>>>>>> > > > holds the reference to the session. With embedded use cases we
>>>>>>>> > generally
>>>>>>>> > > > adhere to ThreadLocal transactions so the Transaction
>>>>>>>> implementation in
>>>>>>>> > > > that case is more of a controller for the current thread but
>>>>>>>> for remote
>>>>>>>> > > > cases the Transaction implementation holds a bit of state
>>>>>>>> that can
>>>>>>>> > cross
>>>>>>>> > > > over threads. In some ways, that makes remote cases feel like
>>>>>>>> a
>>>>>>>> > "threaded
>>>>>>>> > > > transaction" so that may be familiar to users?? Here's some
>>>>>>>> example
>>>>>>>> > > > syntax I currently have working in a test case:
>>>>>>>> > > >
>>>>>>>> > > > g = traversal().withRemote(conn)
>>>>>>>> > > > gtx = g.tx().create()
>>>>>>>> > > > assert gtx.isOpen() == true
>>>>>>>> > > > gtx.addV('person').iterate()
>>>>>>>> > > > gtx.addV('software').iterate()
>>>>>>>> > > > gtx.commit() // alternatively you could explicitly rollback()
>>>>>>>> > > > assert gtx.isOpen() == false
>>>>>>>> > > >
>>>>>>>> > > > I hope that documentation changes are enough to unify the
>>>>>>>> syntax and
>>>>>>>> > > remove
>>>>>>>> > > > confusion despite there still being ThreadLocal transactions
>>>>>>>> as a
>>>>>>>> > default
>>>>>>>> > > > for embedded cases and something else for remote. At least
>>>>>>>> they will
>>>>>>>> > look
>>>>>>>> > > > the same, even if you technically don't need to do a
>>>>>>>> g.tx().create()
>>>>>>>> > for
>>>>>>>> > > > embedded transactions and can just have the control you
>>>>>>>> always had with
>>>>>>>> > > > g.tx().commit() directly.
>>>>>>>> > > >
>>>>>>>> > > > Note that the remote Transaction object can support
>>>>>>>> configuration via
>>>>>>>> > > > onClose(CLOSE_BEHAVIOR) and you could therefore switch from
>>>>>>>> the default
>>>>>>>> > > of
>>>>>>>> > > > "commit on close" to rollback or manual (or i suppose
>>>>>>>> something
>>>>>>>> > custom).
>>>>>>>> > > > It's nice that this piece works. I don't see a point in
>>>>>>>> supporting
>>>>>>>> > > > onReadWrite(READ_WRITE_BEHAVIOR) as it just doesn't make
>>>>>>>> sense in
>>>>>>>> > remote
>>>>>>>> > > > contexts.
>>>>>>>> > > >
>>>>>>>> > > > Another point is that I've added what I termed a "Graph
>>>>>>>> Operation"
>>>>>>>> > > bytecode
>>>>>>>> > > > instances which is bytecode that isn't related to traversal
>>>>>>>> > > construction. I
>>>>>>>> > > > hope this isn't one of those things where we end up wanting to
>>>>>>>> > deprecate
>>>>>>>> > > an
>>>>>>>> > > > idea as fast as we added it but we needed some way to pass
>>>>>>>> > > commit/rollback
>>>>>>>> > > > commands and they aren't really part of traversal
>>>>>>>> construction. They
>>>>>>>> > are
>>>>>>>> > > > operations that execute on the graph itself and I couldn't
>>>>>>>> think of how
>>>>>>>> > > to
>>>>>>>> > > > treat them as traversals nicely, so they sorta just exist as
>>>>>>>> a one off.
>>>>>>>> > > > Perhaps they will grow in number?? Folks have always asked if
>>>>>>>> bytecode
>>>>>>>> > > > requests could get "GraphFeature" information - I suppose
>>>>>>>> this could
>>>>>>>> > be a
>>>>>>>> > > > way to do that??
>>>>>>>> > > >
>>>>>>>> > > > Anyway, I will keep going down this general path as it's
>>>>>>>> appearing
>>>>>>>> > > > relatively fruitful.
>>>>>>>> > > >
>>>>>>>> > > >
>>>>>>>> > > >
>>>>>>>> > > >
>>>>>>>> > > > On Thu, Mar 18, 2021 at 6:34 AM Stephen Mallette <
>>>>>>>> spmallette@gmail.com
>>>>>>>> > >
>>>>>>>> > > > wrote:
>>>>>>>> > > >
>>>>>>>> > > > > > We should just communicate clearly that a simple remote
>>>>>>>> traversal
>>>>>>>> > > > > already uses a transaction by default,
>>>>>>>> > > > >
>>>>>>>> > > > > That's a good point because even with this change we still
>>>>>>>> have a
>>>>>>>> > > > > situation where a single iterated remote traversal will
>>>>>>>> generally
>>>>>>>> > mean
>>>>>>>> > > a
>>>>>>>> > > > > server managed commit/rollback, but in embedded mode, we
>>>>>>>> have an open
>>>>>>>> > > > > transaction (for transaction enabled graphs - TinkerGraph
>>>>>>>> will behave
>>>>>>>> > > > more
>>>>>>>> > > > > like a remote traversal, so more confusion there i guess).
>>>>>>>> I'm not
>>>>>>>> > sure
>>>>>>>> > > > how
>>>>>>>> > > > > to rectify that at this time except by way of documentation.
>>>>>>>> > > > >
>>>>>>>> > > > > The only thing I can think of is that the default for
>>>>>>>> embedded would
>>>>>>>> > > have
>>>>>>>> > > > > to be auto-commit per traversal. In that way it would work
>>>>>>>> like
>>>>>>>> > remote
>>>>>>>> > > > > traversals and for graphs that don't support transactions
>>>>>>>> like
>>>>>>>> > > > TinkerGraph.
>>>>>>>> > > > > Of course, that's the kind of change that will break a lot
>>>>>>>> of code.
>>>>>>>> > > Maybe
>>>>>>>> > > > > we just keep that idea for another version.
>>>>>>>> > > > >
>>>>>>>> > > > > On Thu, Mar 18, 2021 at 4:21 AM <fh...@florian-hockmann.de>
>>>>>>>> wrote:
>>>>>>>> > > > >
>>>>>>>> > > > >> I like the idea of adding transactions for remote
>>>>>>>> traversals as they
>>>>>>>> > > > >> close a gap in functionality that we currently have for
>>>>>>>> remote
>>>>>>>> > > > traversals.
>>>>>>>> > > > >> We should just communicate clearly that a simple remote
>>>>>>>> traversal
>>>>>>>> > > > already
>>>>>>>> > > > >> uses a transaction by default, meaning that the server
>>>>>>>> will roll
>>>>>>>> > back
>>>>>>>> > > > any
>>>>>>>> > > > >> changes if any exception occurs. Users often ask about how
>>>>>>>> to do
>>>>>>>> > > > >> transactions with a remote traversal because they don't
>>>>>>>> know about
>>>>>>>> > > this
>>>>>>>> > > > and
>>>>>>>> > > > >> I'm afraid that we might add even more confusion if we now
>>>>>>>> add
>>>>>>>> > > > transactions
>>>>>>>> > > > >> for remote traversals. Hence why I think that we should
>>>>>>>> communicate
>>>>>>>> > > this
>>>>>>>> > > > >> clearly when we add remote transactions.
>>>>>>>> > > > >>
>>>>>>>> > > > >> Getting this to work in the GLVs should be possible but
>>>>>>>> will require
>>>>>>>> > > > some
>>>>>>>> > > > >> effort. I think we would have to introduce some
>>>>>>>> > > > >> "DriverRemoteTransactionTraversal" that doesn't submit the
>>>>>>>> traversal
>>>>>>>> > > on
>>>>>>>> > > > a
>>>>>>>> > > > >> terminal step but saves it and then submits all saved
>>>>>>>> traversals
>>>>>>>> > > > together
>>>>>>>> > > > >> on close().
>>>>>>>> > > > >> This also means that we should directly add an async
>>>>>>>> version of
>>>>>>>> > > close(),
>>>>>>>> > > > >> maybe closeAsync()? (Same for commit() and rollback())
>>>>>>>> > > > >>
>>>>>>>> > > > >> I also like create() better than traversal() because it
>>>>>>>> would
>>>>>>>> > confuse
>>>>>>>> > > me
>>>>>>>> > > > >> to first start a traversal with traversal() and then also
>>>>>>>> start a
>>>>>>>> > > > >> transaction with the same method.
>>>>>>>> > > > >>
>>>>>>>> > > > >> -----Ursprüngliche Nachricht-----
>>>>>>>> > > > >> Von: Stephen Mallette <sp...@gmail.com>
>>>>>>>> > > > >> Gesendet: Donnerstag, 18. März 2021 00:01
>>>>>>>> > > > >> An: dev@tinkerpop.apache.org
>>>>>>>> > > > >> Betreff: Re: [DSISCUSS] Remote Transactions
>>>>>>>> > > > >>
>>>>>>>> > > > >> One thing I should have noted about tx().create(). The
>>>>>>>> create() very
>>>>>>>> > > > well
>>>>>>>> > > > >> could have been named traversal(), thus the previous
>>>>>>>> example reading
>>>>>>>> > > as:
>>>>>>>> > > > >>
>>>>>>>> > > > >> g = traversal().withEmbedded(graph)
>>>>>>>> > > > >> // or
>>>>>>>> > > > >> g = traversal().withRemote(conn)
>>>>>>>> > > > >> gtx = g.tx().traversal()
>>>>>>>> > > > >> gtx.addV('person').iterate()
>>>>>>>> > > > >> gtx.addV('software').iterate()
>>>>>>>> > > > >> gtx.close() // alternatively you could explicitly commit()
>>>>>>>> or
>>>>>>>> > > rollback()
>>>>>>>> > > > >>
>>>>>>>> > > > >> You're basically spawning a new GraphTraversalSource from
>>>>>>>> the
>>>>>>>> > > > Transaction
>>>>>>>> > > > >> object rather than from a Graph or AnonymousTraversal. I
>>>>>>>> chose
>>>>>>>> > > create()
>>>>>>>> > > > >> because it felt like this looked weird:
>>>>>>>> > > > >>
>>>>>>>> > > > >> g = traversal().withRemote(conn).tx().traversal()
>>>>>>>> > > > >>
>>>>>>>> > > > >> which would be a weird thing to do I guess, but just seeing
>>>>>>>> > > > "traversal()"
>>>>>>>> > > > >> over and over seemed odd looking and I wanted to
>>>>>>>> differentiate with
>>>>>>>> > > the
>>>>>>>> > > > >> Transaction object even though the methods are identical
>>>>>>>> in what
>>>>>>>> > they
>>>>>>>> > > > do. I
>>>>>>>> > > > >> suppose I also drew inspiration from:
>>>>>>>> > > > >>
>>>>>>>> > > > >> Transaction.createdThreadedTx()
>>>>>>>> > > > >>
>>>>>>>> > > > >> which I think we might consider deprecating. JanusGraph
>>>>>>>> would simply
>>>>>>>> > > > >> expose their own Transaction object in the future that has
>>>>>>>> that
>>>>>>>> > method
>>>>>>>> > > > as I
>>>>>>>> > > > >> imagine folks still use that feature. As far as I know, no
>>>>>>>> other
>>>>>>>> > graph
>>>>>>>> > > > >> implements that functionality and my guess is that no one
>>>>>>>> will
>>>>>>>> > likely
>>>>>>>> > > > do so
>>>>>>>> > > > >> in the future.
>>>>>>>> > > > >>
>>>>>>>> > > > >> anyway, if you like traversal() better than create() or
>>>>>>>> the other
>>>>>>>> > way
>>>>>>>> > > > >> around, please let me know
>>>>>>>> > > > >>
>>>>>>>> > > > >> On Wed, Mar 17, 2021 at 5:33 PM David Bechberger <
>>>>>>>> > dave@bechberger.com
>>>>>>>> > > >
>>>>>>>> > > > >> wrote:
>>>>>>>> > > > >>
>>>>>>>> > > > >> > I am in favor of this change as I any idea that unifies
>>>>>>>> the
>>>>>>>> > multiple
>>>>>>>> > > > >> > different ways things work in TP will only make it
>>>>>>>> easier to learn
>>>>>>>> > > and
>>>>>>>> > > > >> > adopt.
>>>>>>>> > > > >> >
>>>>>>>> > > > >> > I don't know that I have enough knowledge of the inner
>>>>>>>> workings of
>>>>>>>> > > > >> > transactions to know what/if this will cause problems.
>>>>>>>> > > > >> >
>>>>>>>> > > > >> > Dave
>>>>>>>> > > > >> >
>>>>>>>> > > > >> > On Wed, Mar 17, 2021 at 12:57 PM Stephen Mallette
>>>>>>>> > > > >> > <sp...@gmail.com>
>>>>>>>> > > > >> > wrote:
>>>>>>>> > > > >> >
>>>>>>>> > > > >> > > We haven't touched "transactions" since TP3 was
>>>>>>>> originally
>>>>>>>> > > designed.
>>>>>>>> > > > >> > > They have remained a feature for embedded use cases
>>>>>>>> even in the
>>>>>>>> > > face
>>>>>>>> > > > >> > > of the
>>>>>>>> > > > >> > rise
>>>>>>>> > > > >> > > of remote graph use cases and result in major
>>>>>>>> inconsistencies
>>>>>>>> > that
>>>>>>>> > > > >> > > really bother users.
>>>>>>>> > > > >> > >
>>>>>>>> > > > >> > > As we close on 3.5.0, I figured that perhaps there was
>>>>>>>> a chance
>>>>>>>> > to
>>>>>>>> > > > >> > > do something radical about transactions. Basically,
>>>>>>>> I'd like
>>>>>>>> > > > >> > > transactions to work irrespective of remote or
>>>>>>>> embedded usage
>>>>>>>> > and
>>>>>>>> > > > >> > > for them to have the
>>>>>>>> > > > >> > same
>>>>>>>> > > > >> > > API when doing so. In mulling it over for the last day
>>>>>>>> or so I
>>>>>>>> > > had a
>>>>>>>> > > > >> > > realization that makes me believe that the following is
>>>>>>>> > possible:
>>>>>>>> > > > >> > >
>>>>>>>> > > > >> > > g = traversal().withEmbedded(graph)
>>>>>>>> > > > >> > > // or
>>>>>>>> > > > >> > > g = traversal().withRemote(conn)
>>>>>>>> > > > >> > > gtx = g.tx().create()
>>>>>>>> > > > >> > > gtx.addV('person').iterate()
>>>>>>>> > > > >> > > gtx.addV('software').iterate()
>>>>>>>> > > > >> > > gtx.close() // alternatively you could explicitly
>>>>>>>> commit() or
>>>>>>>> > > > >> > > rollback()
>>>>>>>> > > > >> > >
>>>>>>>> > > > >> > > // you could still use g for sessionless, but gtx is
>>>>>>>> "done"
>>>>>>>> > after
>>>>>>>> > > //
>>>>>>>> > > > >> > > you close so you will need to create() a new gtx
>>>>>>>> instance for a
>>>>>>>> > //
>>>>>>>> > > > >> > > fresh transaction assert 2 == g.V().count().next()
>>>>>>>> > > > >> > >
>>>>>>>> > > > >> > > Note that the create() method on tx() is the new bit of
>>>>>>>> > necessary
>>>>>>>> > > > >> syntax.
>>>>>>>> > > > >> > > Technically, it is a do-nothing for embedded mode (and
>>>>>>>> you could
>>>>>>>> > > > >> > > skip it for thread-local auto-transactions) but all the
>>>>>>>> > > > >> > > documentation can be shifted so that the API is
>>>>>>>> identical to
>>>>>>>> > > remote.
>>>>>>>> > > > >> > > The change would be non-breaking as the embedded
>>>>>>>> transaction
>>>>>>>> > > > >> > > approach would remain as it is, but would no longer be
>>>>>>>> > documented
>>>>>>>> > > as
>>>>>>>> > > > >> > > the preferred approach. Perhaps we could one day
>>>>>>>> disallow it but
>>>>>>>> > > > >> > > it's a bit of a tangle of ThreadLocal that I'm not
>>>>>>>> sure we want
>>>>>>>> > to
>>>>>>>> > > > >> touch in TP3.
>>>>>>>> > > > >> > >
>>>>>>>> > > > >> > > What happens behind the scenes of g.tx().create() is
>>>>>>>> that in
>>>>>>>> > > remote
>>>>>>>> > > > >> > > cases the RemoteConnection constructs a remote
>>>>>>>> Transaction
>>>>>>>> > > > >> > > implementation which basically is used to spawn new
>>>>>>>> traversal
>>>>>>>> > > source
>>>>>>>> > > > >> > > instances with a session based connections. The remote
>>>>>>>> > Transaction
>>>>>>>> > > > >> > > object can't support
>>>>>>>> > > > >> > transaction
>>>>>>>> > > > >> > > listeners or special read-write/close configurations,
>>>>>>>> but I
>>>>>>>> > don't
>>>>>>>> > > > >> > > think that's a problem as those things don't make a
>>>>>>>> lot of sense
>>>>>>>> > > in
>>>>>>>> > > > >> > > remote use cases.
>>>>>>>> > > > >> > >
>>>>>>>> > > > >> > > From the server perspective, this change would also
>>>>>>>> mean that
>>>>>>>> > > > >> > > sessions would have to accept bytecode and not just
>>>>>>>> scripts,
>>>>>>>> > which
>>>>>>>> > > > >> > > technically shouldn't be a problem.
>>>>>>>> > > > >> > >
>>>>>>>> > > > >> > > One downside at the moment is that i'm thinking mostly
>>>>>>>> in Java
>>>>>>>> > at
>>>>>>>> > > > >> > > the moment. I'm not sure how this all works in other
>>>>>>>> variants
>>>>>>>> > just
>>>>>>>> > > > >> > > yet, but
>>>>>>>> > > > >> > I'd
>>>>>>>> > > > >> > > hope to keep similar syntax.
>>>>>>>> > > > >> > >
>>>>>>>> > > > >> > > I will keep experimenting tomorrow. If you have any
>>>>>>>> thoughts on
>>>>>>>> > > the
>>>>>>>> > > > >> > matter,
>>>>>>>> > > > >> > > I'd be happy to hear them. Thanks!
>>>>>>>> > > > >> > >
>>>>>>>> > > > >> >
>>>>>>>> > > > >>
>>>>>>>> > > > >>
>>>>>>>> > > >
>>>>>>>> > >
>>>>>>>> >
>>>>>>>>
>>>>>>>

Re: [DISCUSS] Converging the worker pool was: [DISCUSS] Remote Transactions

Posted by Stephen Mallette <sp...@gmail.com>.

I've opened a PR for the UnifiedChannelizer:

https://github.com/apache/tinkerpop/pull/1414

Over the weekend I passed 10s of millions of messages through the
UnifiedChannelizer and the server stayed quite stable throughout it all.
With all the tests passing nicely now in travis, docker and locally I think
this one is now ready for review.

On Fri, Apr 2, 2021 at 8:35 AM Stephen Mallette <sp...@gmail.com>
wrote:

> I've solved the Travis issue. Some tests simply required more threads in
> gremlinPool than the default which is set to the available cores. On Travis
> you probably get a machine with 2 cores available and some tests required 3
> or more to keep that many sessions open at once. Silly problem though the
> analysis did help uncover other issues so I guess travis did its job.
>
> I've merged the g.tx() work and rebased the UnifiedChannelizer branch and
> will continue with profiling/testing into early next week. I think that
> should set this up for a PR at that point. If you'd like to see what the
> work looks like, I think it would be fine to do so now at:
>
> https://github.com/apache/tinkerpop/compare/TINKERPOP-2245
>
>
>
>
> On Thu, Apr 1, 2021 at 11:57 AM Stephen Mallette <sp...@gmail.com>
> wrote:
>
>> I've been fussing with Travis for a while now haven't had much luck
>> figuring out what is wrong. Decided to start profiling the server to see if
>> that provided any hints (and it was something I intended to do in any
>> case). Two issues came up:
>>
>> 1. shouldEventuallySucceedOnSameServerWithDefault() had some bad
>> assertion logic and it fails for both UnifiedChannelizer and for the old
>> OpProcessor stuff when the assertions are corrected. I've @Ignore'd that
>> test for now. No idea if it ever worked or if it's busted with some more
>> recent changes to the driver. I don't think that this was the cause of the
>> problems with travis but I can't be sure.
>> 2. AliasClusteredClient did not proxy calls to its underlying Client
>> instance. This was purposeful, but has bad implications for
>> DriverRemoteConnection. I'd now consider it a bug in 3.5.0 and I've changed
>> the behavior. Ultimately, the whole Client aliasing probably needs to be
>> changed, but it's too big a thing to try to get into now. Not completely
>> sure yet if this fixes travis either, but I sense it has something to do
>> with it.
>>
>> Travis is a bit backed up today so it's slow getting results. Hopefully
>> (2) solved the problem or at least leads to an error i can see.
>>
>> The nice side-effect of this profiling is that I've come to see how much
>> faster UnifiedChannelizer is as compared to the OpProcessor. Using the new
>> remote transaction approach to load 100k vertices and 50k edges in 50k
>> separate transactions (2 vertices/1 edge per tx), took under 3 minutes with
>> UnifiedChannelizer and the server stayed remarkably stable in terms of
>> resources/gc. The same workload on OpProcessor.........I gave up waiting
>> for it to finish after 1500 transactions, which took 9 minutes. Looking at
>> the jfr, there was a really weird gc pattern and memory usage under
>> OpProcessor which perhaps had some connection to our quickly
>> starting/killing threads and GremlinExecutor instances.
>>
>> I plan to keep testing/profiling for a bit longer before issuing the PR
>> (and hopefully getting travis fixed in the process)...more updates to come
>> i imagine.
>>
>> Divij, I'm glad that you brought this issue up early. I imagine I would
>> have noticed the problem in profiling, but it was smart to call it out
>> early. Nice job.
>>
>>
>>
>> On Mon, Mar 29, 2021 at 6:55 AM Stephen Mallette <sp...@gmail.com>
>> wrote:
>>
>>> I can't seem to get Travis to pass the build for this branch with the
>>> UnifiedChannelizer enabled.
>>>
>>> https://travis-ci.com/github/apache/tinkerpop/builds/221427334
>>>
>>> It just hangs without much information. I can't seem to get it to behave
>>> otherwise and it doesn't fail for me locally (of course). If anyone has
>>> some free CPU cycles could you please do:
>>>
>>> mvn clean install -DskipTests && mvn verify -pl :gremlin-server
>>> -DskipTests -DskipIntegrationTests=false -DincludeNeo4j -DtestUnified=true
>>>
>>> i'm curious if anyone else can recreate the issue.....Thanks
>>>
>>>
>>>
>>> On Thu, Mar 25, 2021 at 3:33 PM Stephen Mallette <sp...@gmail.com>
>>> wrote:
>>>
>>>> It took me a long time to get to the final steps of this change.
>>>> Uncovered some unexpected bugs in the WsAndHttpChannelizer and that was
>>>> gnarly to sort out. Also found some problems with authorization through the
>>>> SaslAndHttpBasicAuthenticationHandler. I can't say I really like how these
>>>> combined channelizers really work. Seems like the new tangle of stuff in
>>>> Gremlin Server. Not sure how much of that I can sort that out for 3.5.0 at
>>>> this point. I might make some attempts on this body of work since it's
>>>> fresh in my mind but we'll see.
>>>>
>>>> After getting through these little nits I've managed to get all of the
>>>> Gremlin Server integration test cases passing with the new
>>>> UnifiedChannelizer. I've added a -DtestUnified property to run the
>>>> UnifiedChannelizer rather than the old stuff. This way we can add a new job
>>>> to travis to cover that by itself in parallel to the old. You can run it
>>>> like this:
>>>>
>>>> mvn verify -pl gremlin-server -DskipIntegrationTests=false
>>>> -DtestUnified=true
>>>>
>>>> Having all the tests pass on the new channelizer makes me feel pretty
>>>> confident that what I have at least works (and without much change in
>>>> behavior) and therefore have pushed a branch for early review:
>>>>
>>>> https://github.com/apache/tinkerpop/tree/TINKERPOP-2245
>>>>
>>>> I still need to write a few more tests and after that it would be time
>>>> to do a bit of performance testing to see how it compares to the old
>>>> method. I'm also missing "metrics". If all that is good then I'll need to
>>>> do a gang of documentation work. I imagine I will continue to keep this
>>>> work separate from the bytecode transaction work that spawned it and merge
>>>> the transaction stuff first followed by this. In that way, expect this
>>>> branch to be a bit volatile with rebasing.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Mar 23, 2021 at 6:51 PM Stephen Mallette <sp...@gmail.com>
>>>> wrote:
>>>>
>>>>> More reasonable progress with the UnifiedChannelizer. The short
>>>>> version is that I have a pretty major quantity of the server integration
>>>>> tests passing under this new Channelizer. The ones that are failing are
>>>>> typically ones that assert some specific log messages or exception message
>>>>> which may not necessarily apply anymore. I'm still mulling those over.
>>>>>
>>>>> The basic structure of the UnifiedChannelizer is that it sorta copies
>>>>> the WsAndHttpChannelizer and builds from there, so when this
>>>>> UnfiedChannelizer is configured you'll get HTTP and Websockets. The
>>>>> UnifiedChannelizer gets rid of the entire OpProcessor infrastructure. I
>>>>> think that was a good body of code years ago, but we've now learned quite
>>>>> explicitly what we want Gremlin Server to do and what role it plays.
>>>>> OpProcessor just feels like an unnecessary abstraction now. Removing it
>>>>> simplifies the code of the UnifiedChannelizer, which adds a single
>>>>> UnifiedHandler to the netty pipeline. That UnifiedHandler streamlines all
>>>>> those OpProcessor implementations into one so all the duplicate code is
>>>>> effectively removed by running RequestMessage instances through a single
>>>>> common flow. That means all the request validation, error handling,
>>>>> iterator processing, result building, transaction management, etc. will be
>>>>> in one place.
>>>>>
>>>>> The UnifiedHandler also kills out the GremlinExecutor. That means all
>>>>> the Lifecycle callbacks and random lambdas that generated so much
>>>>> indirection in the name of extensibility are no longer confusing the code
>>>>> base and the wicked hierarchy of timesouts and exception throws are all
>>>>> flattened.
>>>>>
>>>>> When the UnifiedHandler gets a RequestMessage it validates it, builds
>>>>> a Context and creates a Rexster instance (thought i'd bring back the name
>>>>> rather than go with the generic "Worker" interface) that has the Context
>>>>> added to it as a task. The Rexster instance is then submitted to the
>>>>> Gremlin thread pool to do work when a thread is available. Rexster will
>>>>> hang on to that thread awaiting tasks that are assigned to it. For
>>>>> sessionless requests, that means Rexster handles one request and exits. For
>>>>> a sessionful request it means the Rexster will wait for more tasks (Context
>>>>> objects from new requests) to be added to it. The UnifiedHandler holds
>>>>> those Rexster instances in a Map aligning session identifiers to Rexster
>>>>> instances. As Divij alluded to, I suppose this means we are putting down
>>>>> foundation for query cancellation and better insight into the queue of
>>>>> things that are running.
>>>>>
>>>>> Now that the gremlinPool server setting covers both types of requests
>>>>> we no longer have a situation where the number of threads that can be
>>>>> running is unbounded and a "session" will now look a bit like a "long run
>>>>> job" in that it will consume a thread from the pool for as long as the
>>>>> session is open. So if the gremlinPool is 16, you could have 16 sessions
>>>>> active. Sessionless requests would begin to queue as would requests for new
>>>>> sessions and they would be serviced in the order received.
>>>>>
>>>>> I'm still not quite ready for a pull request. Despite having the bulk
>>>>> of the integration tests working, the code isn't quite the way I think it
>>>>> will be finally organized (I used a lot of inner classes and things which
>>>>> probably should change). I also need to figure out how to run the tests. I
>>>>> think I need to run them all per channelizer (old and new) but I think that
>>>>> will push the build time over maximum on Travis, so I may need to break
>>>>> things up somehow. I'm reasonably sure I'll have a branch ready to push
>>>>> before the end of the week.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Mar 22, 2021 at 5:22 PM Stephen Mallette <sp...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I've started to play with some ideas for pushing all requests
>>>>>> scripts/bytecode sessions/sessionless down one unified pipeline with one
>>>>>> shared thread pool. The most interesting thing I've seen so far is how much
>>>>>> the code could potentially simplify under this approach. The
>>>>>> GremlinExecutor was the wrong abstraction in a sense and it forced way too
>>>>>> many callbacks as arguments and made error handling kinda sloppy which made
>>>>>> the server code harder to reuse, extend and read. Of course, all that was
>>>>>> built before the first remote graph was built and we were just wildly
>>>>>> guessing at how DS Graph was going to work with all this. Now, I think the
>>>>>> remote model is much more clear, so maybe all this can be made "right" now.
>>>>>>
>>>>>> I think that if I can see any success at all with this in the next
>>>>>> few days, we could include a new UnfiiedChannelizer that would replace the
>>>>>> existing ones. I think we'd keep the old one in place by default and keep
>>>>>> my copy/paste work for SessionOpProcessor that added bytecode there, with
>>>>>> the idea that the UnifiedChannelizer will become the future default as we
>>>>>> get it tested further along the 3.5.x line.
>>>>>>
>>>>>> I imagine I'll have more details on this task tomorrow and will post
>>>>>> back here when I do.
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 18, 2021 at 12:37 PM Divij Vaidya <
>>>>>> divijvaidya13@gmail.com> wrote:
>>>>>>
>>>>>>> >
>>>>>>> > I suppose the
>>>>>>> > question is how would we ensure that each request for a session
>>>>>>> ends up
>>>>>>> > being executed on the same thread from the previous request if
>>>>>>> jobs are
>>>>>>> > randomly submitted to a worker pool?
>>>>>>>
>>>>>>>
>>>>>>> I haven't thought through the details, but on top of my head, we
>>>>>>> would have
>>>>>>> to maintain some request<->thread mapping on the server. This
>>>>>>> mapping is
>>>>>>> also a building block for a request cancellation feature in future
>>>>>>> where a
>>>>>>> client would be able to send a cancellation request to the server,
>>>>>>> the
>>>>>>> server will map the request to a thread executing that request and
>>>>>>> then set
>>>>>>> an interrupt on that thread to signify cancellation.
>>>>>>>
>>>>>>> Divij Vaidya
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Mar 18, 2021 at 5:00 PM Stephen Mallette <
>>>>>>> spmallette@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> > I started a fresh thread for this topic Divij brought up, with more
>>>>>>> > context:
>>>>>>> >
>>>>>>> > > In a scenario where we have both
>>>>>>> > > session-less and sesion-ful requests being made to the server,
>>>>>>> the
>>>>>>> > resource
>>>>>>> > > allocation will not be fair and may lead to starvation for some
>>>>>>> requests.
>>>>>>> > > This problem exists even today, hence not totally correlated to
>>>>>>> what you
>>>>>>> > > are proposing but I am afraid a wider adoption of explicit
>>>>>>> > > transactions will bring this problem to the spotlight. Hence, we
>>>>>>> should
>>>>>>> > fix
>>>>>>> > > this problem first. A solution would entail converging the
>>>>>>> worker pool
>>>>>>> > for
>>>>>>> > > both session vs session-less requests.
>>>>>>> >
>>>>>>> > I'm not sure we can get that done in 3.5.0, but maybe? I suppose
>>>>>>> the
>>>>>>> > question is how would we ensure that each request for a session
>>>>>>> ends up
>>>>>>> > being executed on the same thread from the previous request if
>>>>>>> jobs are
>>>>>>> > randomly submitted to a worker pool?
>>>>>>> >
>>>>>>> >
>>>>>>> > On Thu, Mar 18, 2021 at 11:52 AM Divij Vaidya <
>>>>>>> divijvaidya13@gmail.com>
>>>>>>> > wrote:
>>>>>>> >
>>>>>>> > > Great idea Stephen. I agree with standardizing the explicit
>>>>>>> transaction
>>>>>>> > > semantics in cases of remote server (implicit aka sessionless is
>>>>>>> already
>>>>>>> > > self explanatory) and unifying the syntax with embedded graph
>>>>>>> usage.
>>>>>>> > >
>>>>>>> > > Couple of notes:
>>>>>>> > >
>>>>>>> > > 1. I would favor the begin() instead of create() as it's closer
>>>>>>> to
>>>>>>> > > universally prominent SQL transaction syntax. This would reduce
>>>>>>> the
>>>>>>> > > learning curve for users of TP.
>>>>>>> > >
>>>>>>> > > 2. From an implementation standpoint, sessionless requests are
>>>>>>> queued on
>>>>>>> > > the server side and are serviced by the worker thread pool. But a
>>>>>>> > > session-ful aka explicit transaction aka managed transaction
>>>>>>> starts a new
>>>>>>> > > single worked thread pool every time. In a scenario where we
>>>>>>> have both
>>>>>>> > > session-less and sesion-ful requests being made to the server,
>>>>>>> the
>>>>>>> > resource
>>>>>>> > > allocation will not be fair and may lead to starvation for some
>>>>>>> requests.
>>>>>>> > > This problem exists even today, hence not totally correlated to
>>>>>>> what you
>>>>>>> > > are proposing but I am afraid a wider adoption of explicit
>>>>>>> > > transactions will bring this problem to the spotlight. Hence, we
>>>>>>> should
>>>>>>> > fix
>>>>>>> > > this problem first. A solution would entail converging the
>>>>>>> worker pool
>>>>>>> > for
>>>>>>> > > both session vs session-less requests.
>>>>>>> > >
>>>>>>> > > 3. You are proposing the idea of having a transaction scoped
>>>>>>> traversal
>>>>>>> > > object. I agree with it but we need more clarification in
>>>>>>> behavior wrt
>>>>>>> > the
>>>>>>> > > following scenarios:
>>>>>>> > >
>>>>>>> > > Q. What happens when g.tx().commit() is called on a transaction
>>>>>>> scoped
>>>>>>> > > traversal object? Does the traversal get closed?
>>>>>>> > > Q. Currently, the same traversal object could be used to execute
>>>>>>> multiple
>>>>>>> > > session-less requests simultaneously in a thread safe manner.
>>>>>>> Does the
>>>>>>> > same
>>>>>>> > > behaviour apply to transaction scoped traversal? If not, then
>>>>>>> how do I
>>>>>>> > send
>>>>>>> > > multiple requests in parallel from the client all scoped to the
>>>>>>> same
>>>>>>> > > transaction on the server? Note that this is different from case
>>>>>>> of multi
>>>>>>> > > threaded transactions because on the server all requests are
>>>>>>> scoped to
>>>>>>> > > single transaction i.e. single thread but on the client they may
>>>>>>> be
>>>>>>> > > submitted via multiple threads.
>>>>>>> > >
>>>>>>> > > Regards,
>>>>>>> > > Divij Vaidya
>>>>>>> > >
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > On Thu, Mar 18, 2021 at 4:05 PM Stephen Mallette <
>>>>>>> spmallette@gmail.com>
>>>>>>> > > wrote:
>>>>>>> > >
>>>>>>> > > > The Transaction object is a bit important with remote cases
>>>>>>> because it
>>>>>>> > > > holds the reference to the session. With embedded use cases we
>>>>>>> > generally
>>>>>>> > > > adhere to ThreadLocal transactions so the Transaction
>>>>>>> implementation in
>>>>>>> > > > that case is more of a controller for the current thread but
>>>>>>> for remote
>>>>>>> > > > cases the Transaction implementation holds a bit of state that
>>>>>>> can
>>>>>>> > cross
>>>>>>> > > > over threads. In some ways, that makes remote cases feel like a
>>>>>>> > "threaded
>>>>>>> > > > transaction" so that may be familiar to users?? Here's some
>>>>>>> example
>>>>>>> > > > syntax I currently have working in a test case:
>>>>>>> > > >
>>>>>>> > > > g = traversal().withRemote(conn)
>>>>>>> > > > gtx = g.tx().create()
>>>>>>> > > > assert gtx.isOpen() == true
>>>>>>> > > > gtx.addV('person').iterate()
>>>>>>> > > > gtx.addV('software').iterate()
>>>>>>> > > > gtx.commit() // alternatively you could explicitly rollback()
>>>>>>> > > > assert gtx.isOpen() == false
>>>>>>> > > >
>>>>>>> > > > I hope that documentation changes are enough to unify the
>>>>>>> syntax and
>>>>>>> > > remove
>>>>>>> > > > confusion despite there still being ThreadLocal transactions
>>>>>>> as a
>>>>>>> > default
>>>>>>> > > > for embedded cases and something else for remote. At least
>>>>>>> they will
>>>>>>> > look
>>>>>>> > > > the same, even if you technically don't need to do a
>>>>>>> g.tx().create()
>>>>>>> > for
>>>>>>> > > > embedded transactions and can just have the control you always
>>>>>>> had with
>>>>>>> > > > g.tx().commit() directly.
>>>>>>> > > >
>>>>>>> > > > Note that the remote Transaction object can support
>>>>>>> configuration via
>>>>>>> > > > onClose(CLOSE_BEHAVIOR) and you could therefore switch from
>>>>>>> the default
>>>>>>> > > of
>>>>>>> > > > "commit on close" to rollback or manual (or i suppose something
>>>>>>> > custom).
>>>>>>> > > > It's nice that this piece works. I don't see a point in
>>>>>>> supporting
>>>>>>> > > > onReadWrite(READ_WRITE_BEHAVIOR) as it just doesn't make sense
>>>>>>> in
>>>>>>> > remote
>>>>>>> > > > contexts.
>>>>>>> > > >
>>>>>>> > > > Another point is that I've added what I termed a "Graph
>>>>>>> Operation"
>>>>>>> > > bytecode
>>>>>>> > > > instances which is bytecode that isn't related to traversal
>>>>>>> > > construction. I
>>>>>>> > > > hope this isn't one of those things where we end up wanting to
>>>>>>> > deprecate
>>>>>>> > > an
>>>>>>> > > > idea as fast as we added it but we needed some way to pass
>>>>>>> > > commit/rollback
>>>>>>> > > > commands and they aren't really part of traversal
>>>>>>> construction. They
>>>>>>> > are
>>>>>>> > > > operations that execute on the graph itself and I couldn't
>>>>>>> think of how
>>>>>>> > > to
>>>>>>> > > > treat them as traversals nicely, so they sorta just exist as a
>>>>>>> one off.
>>>>>>> > > > Perhaps they will grow in number?? Folks have always asked if
>>>>>>> bytecode
>>>>>>> > > > requests could get "GraphFeature" information - I suppose this
>>>>>>> could
>>>>>>> > be a
>>>>>>> > > > way to do that??
>>>>>>> > > >
>>>>>>> > > > Anyway, I will keep going down this general path as it's
>>>>>>> appearing
>>>>>>> > > > relatively fruitful.
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > > On Thu, Mar 18, 2021 at 6:34 AM Stephen Mallette <
>>>>>>> spmallette@gmail.com
>>>>>>> > >
>>>>>>> > > > wrote:
>>>>>>> > > >
>>>>>>> > > > > > We should just communicate clearly that a simple remote
>>>>>>> traversal
>>>>>>> > > > > already uses a transaction by default,
>>>>>>> > > > >
>>>>>>> > > > > That's a good point because even with this change we still
>>>>>>> have a
>>>>>>> > > > > situation where a single iterated remote traversal will
>>>>>>> generally
>>>>>>> > mean
>>>>>>> > > a
>>>>>>> > > > > server managed commit/rollback, but in embedded mode, we
>>>>>>> have an open
>>>>>>> > > > > transaction (for transaction enabled graphs - TinkerGraph
>>>>>>> will behave
>>>>>>> > > > more
>>>>>>> > > > > like a remote traversal, so more confusion there i guess).
>>>>>>> I'm not
>>>>>>> > sure
>>>>>>> > > > how
>>>>>>> > > > > to rectify that at this time except by way of documentation.
>>>>>>> > > > >
>>>>>>> > > > > The only thing I can think of is that the default for
>>>>>>> embedded would
>>>>>>> > > have
>>>>>>> > > > > to be auto-commit per traversal. In that way it would work
>>>>>>> like
>>>>>>> > remote
>>>>>>> > > > > traversals and for graphs that don't support transactions
>>>>>>> like
>>>>>>> > > > TinkerGraph.
>>>>>>> > > > > Of course, that's the kind of change that will break a lot
>>>>>>> of code.
>>>>>>> > > Maybe
>>>>>>> > > > > we just keep that idea for another version.
>>>>>>> > > > >
>>>>>>> > > > > On Thu, Mar 18, 2021 at 4:21 AM <fh...@florian-hockmann.de>
>>>>>>> wrote:
>>>>>>> > > > >
>>>>>>> > > > >> I like the idea of adding transactions for remote
>>>>>>> traversals as they
>>>>>>> > > > >> close a gap in functionality that we currently have for
>>>>>>> remote
>>>>>>> > > > traversals.
>>>>>>> > > > >> We should just communicate clearly that a simple remote
>>>>>>> traversal
>>>>>>> > > > already
>>>>>>> > > > >> uses a transaction by default, meaning that the server will
>>>>>>> roll
>>>>>>> > back
>>>>>>> > > > any
>>>>>>> > > > >> changes if any exception occurs. Users often ask about how
>>>>>>> to do
>>>>>>> > > > >> transactions with a remote traversal because they don't
>>>>>>> know about
>>>>>>> > > this
>>>>>>> > > > and
>>>>>>> > > > >> I'm afraid that we might add even more confusion if we now
>>>>>>> add
>>>>>>> > > > transactions
>>>>>>> > > > >> for remote traversals. Hence why I think that we should
>>>>>>> communicate
>>>>>>> > > this
>>>>>>> > > > >> clearly when we add remote transactions.
>>>>>>> > > > >>
>>>>>>> > > > >> Getting this to work in the GLVs should be possible but
>>>>>>> will require
>>>>>>> > > > some
>>>>>>> > > > >> effort. I think we would have to introduce some
>>>>>>> > > > >> "DriverRemoteTransactionTraversal" that doesn't submit the
>>>>>>> traversal
>>>>>>> > > on
>>>>>>> > > > a
>>>>>>> > > > >> terminal step but saves it and then submits all saved
>>>>>>> traversals
>>>>>>> > > > together
>>>>>>> > > > >> on close().
>>>>>>> > > > >> This also means that we should directly add an async
>>>>>>> version of
>>>>>>> > > close(),
>>>>>>> > > > >> maybe closeAsync()? (Same for commit() and rollback())
>>>>>>> > > > >>
>>>>>>> > > > >> I also like create() better than traversal() because it
>>>>>>> would
>>>>>>> > confuse
>>>>>>> > > me
>>>>>>> > > > >> to first start a traversal with traversal() and then also
>>>>>>> start a
>>>>>>> > > > >> transaction with the same method.
>>>>>>> > > > >>
>>>>>>> > > > >> -----Ursprüngliche Nachricht-----
>>>>>>> > > > >> Von: Stephen Mallette <sp...@gmail.com>
>>>>>>> > > > >> Gesendet: Donnerstag, 18. März 2021 00:01
>>>>>>> > > > >> An: dev@tinkerpop.apache.org
>>>>>>> > > > >> Betreff: Re: [DSISCUSS] Remote Transactions
>>>>>>> > > > >>
>>>>>>> > > > >> One thing I should have noted about tx().create(). The
>>>>>>> create() very
>>>>>>> > > > well
>>>>>>> > > > >> could have been named traversal(), thus the previous
>>>>>>> example reading
>>>>>>> > > as:
>>>>>>> > > > >>
>>>>>>> > > > >> g = traversal().withEmbedded(graph)
>>>>>>> > > > >> // or
>>>>>>> > > > >> g = traversal().withRemote(conn)
>>>>>>> > > > >> gtx = g.tx().traversal()
>>>>>>> > > > >> gtx.addV('person').iterate()
>>>>>>> > > > >> gtx.addV('software').iterate()
>>>>>>> > > > >> gtx.close() // alternatively you could explicitly commit()
>>>>>>> or
>>>>>>> > > rollback()
>>>>>>> > > > >>
>>>>>>> > > > >> You're basically spawning a new GraphTraversalSource from
>>>>>>> the
>>>>>>> > > > Transaction
>>>>>>> > > > >> object rather than from a Graph or AnonymousTraversal. I
>>>>>>> chose
>>>>>>> > > create()
>>>>>>> > > > >> because it felt like this looked weird:
>>>>>>> > > > >>
>>>>>>> > > > >> g = traversal().withRemote(conn).tx().traversal()
>>>>>>> > > > >>
>>>>>>> > > > >> which would be a weird thing to do I guess, but just seeing
>>>>>>> > > > "traversal()"
>>>>>>> > > > >> over and over seemed odd looking and I wanted to
>>>>>>> differentiate with
>>>>>>> > > the
>>>>>>> > > > >> Transaction object even though the methods are identical in
>>>>>>> what
>>>>>>> > they
>>>>>>> > > > do. I
>>>>>>> > > > >> suppose I also drew inspiration from:
>>>>>>> > > > >>
>>>>>>> > > > >> Transaction.createdThreadedTx()
>>>>>>> > > > >>
>>>>>>> > > > >> which I think we might consider deprecating. JanusGraph
>>>>>>> would simply
>>>>>>> > > > >> expose their own Transaction object in the future that has
>>>>>>> that
>>>>>>> > method
>>>>>>> > > > as I
>>>>>>> > > > >> imagine folks still use that feature. As far as I know, no
>>>>>>> other
>>>>>>> > graph
>>>>>>> > > > >> implements that functionality and my guess is that no one
>>>>>>> will
>>>>>>> > likely
>>>>>>> > > > do so
>>>>>>> > > > >> in the future.
>>>>>>> > > > >>
>>>>>>> > > > >> anyway, if you like traversal() better than create() or the
>>>>>>> other
>>>>>>> > way
>>>>>>> > > > >> around, please let me know
>>>>>>> > > > >>
>>>>>>> > > > >> On Wed, Mar 17, 2021 at 5:33 PM David Bechberger <
>>>>>>> > dave@bechberger.com
>>>>>>> > > >
>>>>>>> > > > >> wrote:
>>>>>>> > > > >>
>>>>>>> > > > >> > I am in favor of this change as I any idea that unifies
>>>>>>> the
>>>>>>> > multiple
>>>>>>> > > > >> > different ways things work in TP will only make it easier
>>>>>>> to learn
>>>>>>> > > and
>>>>>>> > > > >> > adopt.
>>>>>>> > > > >> >
>>>>>>> > > > >> > I don't know that I have enough knowledge of the inner
>>>>>>> workings of
>>>>>>> > > > >> > transactions to know what/if this will cause problems.
>>>>>>> > > > >> >
>>>>>>> > > > >> > Dave
>>>>>>> > > > >> >
>>>>>>> > > > >> > On Wed, Mar 17, 2021 at 12:57 PM Stephen Mallette
>>>>>>> > > > >> > <sp...@gmail.com>
>>>>>>> > > > >> > wrote:
>>>>>>> > > > >> >
>>>>>>> > > > >> > > We haven't touched "transactions" since TP3 was
>>>>>>> originally
>>>>>>> > > designed.
>>>>>>> > > > >> > > They have remained a feature for embedded use cases
>>>>>>> even in the
>>>>>>> > > face
>>>>>>> > > > >> > > of the
>>>>>>> > > > >> > rise
>>>>>>> > > > >> > > of remote graph use cases and result in major
>>>>>>> inconsistencies
>>>>>>> > that
>>>>>>> > > > >> > > really bother users.
>>>>>>> > > > >> > >
>>>>>>> > > > >> > > As we close on 3.5.0, I figured that perhaps there was
>>>>>>> a chance
>>>>>>> > to
>>>>>>> > > > >> > > do something radical about transactions. Basically, I'd
>>>>>>> like
>>>>>>> > > > >> > > transactions to work irrespective of remote or embedded
>>>>>>> usage
>>>>>>> > and
>>>>>>> > > > >> > > for them to have the
>>>>>>> > > > >> > same
>>>>>>> > > > >> > > API when doing so. In mulling it over for the last day
>>>>>>> or so I
>>>>>>> > > had a
>>>>>>> > > > >> > > realization that makes me believe that the following is
>>>>>>> > possible:
>>>>>>> > > > >> > >
>>>>>>> > > > >> > > g = traversal().withEmbedded(graph)
>>>>>>> > > > >> > > // or
>>>>>>> > > > >> > > g = traversal().withRemote(conn)
>>>>>>> > > > >> > > gtx = g.tx().create()
>>>>>>> > > > >> > > gtx.addV('person').iterate()
>>>>>>> > > > >> > > gtx.addV('software').iterate()
>>>>>>> > > > >> > > gtx.close() // alternatively you could explicitly
>>>>>>> commit() or
>>>>>>> > > > >> > > rollback()
>>>>>>> > > > >> > >
>>>>>>> > > > >> > > // you could still use g for sessionless, but gtx is
>>>>>>> "done"
>>>>>>> > after
>>>>>>> > > //
>>>>>>> > > > >> > > you close so you will need to create() a new gtx
>>>>>>> instance for a
>>>>>>> > //
>>>>>>> > > > >> > > fresh transaction assert 2 == g.V().count().next()
>>>>>>> > > > >> > >
>>>>>>> > > > >> > > Note that the create() method on tx() is the new bit of
>>>>>>> > necessary
>>>>>>> > > > >> syntax.
>>>>>>> > > > >> > > Technically, it is a do-nothing for embedded mode (and
>>>>>>> you could
>>>>>>> > > > >> > > skip it for thread-local auto-transactions) but all the
>>>>>>> > > > >> > > documentation can be shifted so that the API is
>>>>>>> identical to
>>>>>>> > > remote.
>>>>>>> > > > >> > > The change would be non-breaking as the embedded
>>>>>>> transaction
>>>>>>> > > > >> > > approach would remain as it is, but would no longer be
>>>>>>> > documented
>>>>>>> > > as
>>>>>>> > > > >> > > the preferred approach. Perhaps we could one day
>>>>>>> disallow it but
>>>>>>> > > > >> > > it's a bit of a tangle of ThreadLocal that I'm not sure
>>>>>>> we want
>>>>>>> > to
>>>>>>> > > > >> touch in TP3.
>>>>>>> > > > >> > >
>>>>>>> > > > >> > > What happens behind the scenes of g.tx().create() is
>>>>>>> that in
>>>>>>> > > remote
>>>>>>> > > > >> > > cases the RemoteConnection constructs a remote
>>>>>>> Transaction
>>>>>>> > > > >> > > implementation which basically is used to spawn new
>>>>>>> traversal
>>>>>>> > > source
>>>>>>> > > > >> > > instances with a session based connections. The remote
>>>>>>> > Transaction
>>>>>>> > > > >> > > object can't support
>>>>>>> > > > >> > transaction
>>>>>>> > > > >> > > listeners or special read-write/close configurations,
>>>>>>> but I
>>>>>>> > don't
>>>>>>> > > > >> > > think that's a problem as those things don't make a lot
>>>>>>> of sense
>>>>>>> > > in
>>>>>>> > > > >> > > remote use cases.
>>>>>>> > > > >> > >
>>>>>>> > > > >> > > From the server perspective, this change would also
>>>>>>> mean that
>>>>>>> > > > >> > > sessions would have to accept bytecode and not just
>>>>>>> scripts,
>>>>>>> > which
>>>>>>> > > > >> > > technically shouldn't be a problem.
>>>>>>> > > > >> > >
>>>>>>> > > > >> > > One downside at the moment is that i'm thinking mostly
>>>>>>> in Java
>>>>>>> > at
>>>>>>> > > > >> > > the moment. I'm not sure how this all works in other
>>>>>>> variants
>>>>>>> > just
>>>>>>> > > > >> > > yet, but
>>>>>>> > > > >> > I'd
>>>>>>> > > > >> > > hope to keep similar syntax.
>>>>>>> > > > >> > >
>>>>>>> > > > >> > > I will keep experimenting tomorrow. If you have any
>>>>>>> thoughts on
>>>>>>> > > the
>>>>>>> > > > >> > matter,
>>>>>>> > > > >> > > I'd be happy to hear them. Thanks!
>>>>>>> > > > >> > >
>>>>>>> > > > >> >
>>>>>>> > > > >>
>>>>>>> > > > >>
>>>>>>> > > >
>>>>>>> > >
>>>>>>> >
>>>>>>>
>>>>>>

Re: [DISCUSS] Converging the worker pool was: [DISCUSS] Remote Transactions

Posted by Stephen Mallette <sp...@gmail.com>.

I've solved the Travis issue. Some tests simply required more threads in
gremlinPool than the default which is set to the available cores. On Travis
you probably get a machine with 2 cores available and some tests required 3
or more to keep that many sessions open at once. Silly problem though the
analysis did help uncover other issues so I guess travis did its job.

I've merged the g.tx() work and rebased the UnifiedChannelizer branch and
will continue with profiling/testing into early next week. I think that
should set this up for a PR at that point. If you'd like to see what the
work looks like, I think it would be fine to do so now at:

https://github.com/apache/tinkerpop/compare/TINKERPOP-2245




On Thu, Apr 1, 2021 at 11:57 AM Stephen Mallette <sp...@gmail.com>
wrote:

> I've been fussing with Travis for a while now haven't had much luck
> figuring out what is wrong. Decided to start profiling the server to see if
> that provided any hints (and it was something I intended to do in any
> case). Two issues came up:
>
> 1. shouldEventuallySucceedOnSameServerWithDefault() had some bad assertion
> logic and it fails for both UnifiedChannelizer and for the old OpProcessor
> stuff when the assertions are corrected. I've @Ignore'd that test for now.
> No idea if it ever worked or if it's busted with some more recent changes
> to the driver. I don't think that this was the cause of the problems with
> travis but I can't be sure.
> 2. AliasClusteredClient did not proxy calls to its underlying Client
> instance. This was purposeful, but has bad implications for
> DriverRemoteConnection. I'd now consider it a bug in 3.5.0 and I've changed
> the behavior. Ultimately, the whole Client aliasing probably needs to be
> changed, but it's too big a thing to try to get into now. Not completely
> sure yet if this fixes travis either, but I sense it has something to do
> with it.
>
> Travis is a bit backed up today so it's slow getting results. Hopefully
> (2) solved the problem or at least leads to an error i can see.
>
> The nice side-effect of this profiling is that I've come to see how much
> faster UnifiedChannelizer is as compared to the OpProcessor. Using the new
> remote transaction approach to load 100k vertices and 50k edges in 50k
> separate transactions (2 vertices/1 edge per tx), took under 3 minutes with
> UnifiedChannelizer and the server stayed remarkably stable in terms of
> resources/gc. The same workload on OpProcessor.........I gave up waiting
> for it to finish after 1500 transactions, which took 9 minutes. Looking at
> the jfr, there was a really weird gc pattern and memory usage under
> OpProcessor which perhaps had some connection to our quickly
> starting/killing threads and GremlinExecutor instances.
>
> I plan to keep testing/profiling for a bit longer before issuing the PR
> (and hopefully getting travis fixed in the process)...more updates to come
> i imagine.
>
> Divij, I'm glad that you brought this issue up early. I imagine I would
> have noticed the problem in profiling, but it was smart to call it out
> early. Nice job.
>
>
>
> On Mon, Mar 29, 2021 at 6:55 AM Stephen Mallette <sp...@gmail.com>
> wrote:
>
>> I can't seem to get Travis to pass the build for this branch with the
>> UnifiedChannelizer enabled.
>>
>> https://travis-ci.com/github/apache/tinkerpop/builds/221427334
>>
>> It just hangs without much information. I can't seem to get it to behave
>> otherwise and it doesn't fail for me locally (of course). If anyone has
>> some free CPU cycles could you please do:
>>
>> mvn clean install -DskipTests && mvn verify -pl :gremlin-server
>> -DskipTests -DskipIntegrationTests=false -DincludeNeo4j -DtestUnified=true
>>
>> i'm curious if anyone else can recreate the issue.....Thanks
>>
>>
>>
>> On Thu, Mar 25, 2021 at 3:33 PM Stephen Mallette <sp...@gmail.com>
>> wrote:
>>
>>> It took me a long time to get to the final steps of this change.
>>> Uncovered some unexpected bugs in the WsAndHttpChannelizer and that was
>>> gnarly to sort out. Also found some problems with authorization through the
>>> SaslAndHttpBasicAuthenticationHandler. I can't say I really like how these
>>> combined channelizers really work. Seems like the new tangle of stuff in
>>> Gremlin Server. Not sure how much of that I can sort that out for 3.5.0 at
>>> this point. I might make some attempts on this body of work since it's
>>> fresh in my mind but we'll see.
>>>
>>> After getting through these little nits I've managed to get all of the
>>> Gremlin Server integration test cases passing with the new
>>> UnifiedChannelizer. I've added a -DtestUnified property to run the
>>> UnifiedChannelizer rather than the old stuff. This way we can add a new job
>>> to travis to cover that by itself in parallel to the old. You can run it
>>> like this:
>>>
>>> mvn verify -pl gremlin-server -DskipIntegrationTests=false
>>> -DtestUnified=true
>>>
>>> Having all the tests pass on the new channelizer makes me feel pretty
>>> confident that what I have at least works (and without much change in
>>> behavior) and therefore have pushed a branch for early review:
>>>
>>> https://github.com/apache/tinkerpop/tree/TINKERPOP-2245
>>>
>>> I still need to write a few more tests and after that it would be time
>>> to do a bit of performance testing to see how it compares to the old
>>> method. I'm also missing "metrics". If all that is good then I'll need to
>>> do a gang of documentation work. I imagine I will continue to keep this
>>> work separate from the bytecode transaction work that spawned it and merge
>>> the transaction stuff first followed by this. In that way, expect this
>>> branch to be a bit volatile with rebasing.
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Mar 23, 2021 at 6:51 PM Stephen Mallette <sp...@gmail.com>
>>> wrote:
>>>
>>>> More reasonable progress with the UnifiedChannelizer. The short version
>>>> is that I have a pretty major quantity of the server integration tests
>>>> passing under this new Channelizer. The ones that are failing are typically
>>>> ones that assert some specific log messages or exception message which may
>>>> not necessarily apply anymore. I'm still mulling those over.
>>>>
>>>> The basic structure of the UnifiedChannelizer is that it sorta copies
>>>> the WsAndHttpChannelizer and builds from there, so when this
>>>> UnfiedChannelizer is configured you'll get HTTP and Websockets. The
>>>> UnifiedChannelizer gets rid of the entire OpProcessor infrastructure. I
>>>> think that was a good body of code years ago, but we've now learned quite
>>>> explicitly what we want Gremlin Server to do and what role it plays.
>>>> OpProcessor just feels like an unnecessary abstraction now. Removing it
>>>> simplifies the code of the UnifiedChannelizer, which adds a single
>>>> UnifiedHandler to the netty pipeline. That UnifiedHandler streamlines all
>>>> those OpProcessor implementations into one so all the duplicate code is
>>>> effectively removed by running RequestMessage instances through a single
>>>> common flow. That means all the request validation, error handling,
>>>> iterator processing, result building, transaction management, etc. will be
>>>> in one place.
>>>>
>>>> The UnifiedHandler also kills out the GremlinExecutor. That means all
>>>> the Lifecycle callbacks and random lambdas that generated so much
>>>> indirection in the name of extensibility are no longer confusing the code
>>>> base and the wicked hierarchy of timesouts and exception throws are all
>>>> flattened.
>>>>
>>>> When the UnifiedHandler gets a RequestMessage it validates it, builds a
>>>> Context and creates a Rexster instance (thought i'd bring back the name
>>>> rather than go with the generic "Worker" interface) that has the Context
>>>> added to it as a task. The Rexster instance is then submitted to the
>>>> Gremlin thread pool to do work when a thread is available. Rexster will
>>>> hang on to that thread awaiting tasks that are assigned to it. For
>>>> sessionless requests, that means Rexster handles one request and exits. For
>>>> a sessionful request it means the Rexster will wait for more tasks (Context
>>>> objects from new requests) to be added to it. The UnifiedHandler holds
>>>> those Rexster instances in a Map aligning session identifiers to Rexster
>>>> instances. As Divij alluded to, I suppose this means we are putting down
>>>> foundation for query cancellation and better insight into the queue of
>>>> things that are running.
>>>>
>>>> Now that the gremlinPool server setting covers both types of requests
>>>> we no longer have a situation where the number of threads that can be
>>>> running is unbounded and a "session" will now look a bit like a "long run
>>>> job" in that it will consume a thread from the pool for as long as the
>>>> session is open. So if the gremlinPool is 16, you could have 16 sessions
>>>> active. Sessionless requests would begin to queue as would requests for new
>>>> sessions and they would be serviced in the order received.
>>>>
>>>> I'm still not quite ready for a pull request. Despite having the bulk
>>>> of the integration tests working, the code isn't quite the way I think it
>>>> will be finally organized (I used a lot of inner classes and things which
>>>> probably should change). I also need to figure out how to run the tests. I
>>>> think I need to run them all per channelizer (old and new) but I think that
>>>> will push the build time over maximum on Travis, so I may need to break
>>>> things up somehow. I'm reasonably sure I'll have a branch ready to push
>>>> before the end of the week.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Mar 22, 2021 at 5:22 PM Stephen Mallette <sp...@gmail.com>
>>>> wrote:
>>>>
>>>>> I've started to play with some ideas for pushing all requests
>>>>> scripts/bytecode sessions/sessionless down one unified pipeline with one
>>>>> shared thread pool. The most interesting thing I've seen so far is how much
>>>>> the code could potentially simplify under this approach. The
>>>>> GremlinExecutor was the wrong abstraction in a sense and it forced way too
>>>>> many callbacks as arguments and made error handling kinda sloppy which made
>>>>> the server code harder to reuse, extend and read. Of course, all that was
>>>>> built before the first remote graph was built and we were just wildly
>>>>> guessing at how DS Graph was going to work with all this. Now, I think the
>>>>> remote model is much more clear, so maybe all this can be made "right" now.
>>>>>
>>>>> I think that if I can see any success at all with this in the next few
>>>>> days, we could include a new UnfiiedChannelizer that would replace the
>>>>> existing ones. I think we'd keep the old one in place by default and keep
>>>>> my copy/paste work for SessionOpProcessor that added bytecode there, with
>>>>> the idea that the UnifiedChannelizer will become the future default as we
>>>>> get it tested further along the 3.5.x line.
>>>>>
>>>>> I imagine I'll have more details on this task tomorrow and will post
>>>>> back here when I do.
>>>>>
>>>>>
>>>>> On Thu, Mar 18, 2021 at 12:37 PM Divij Vaidya <di...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> >
>>>>>> > I suppose the
>>>>>> > question is how would we ensure that each request for a session
>>>>>> ends up
>>>>>> > being executed on the same thread from the previous request if jobs
>>>>>> are
>>>>>> > randomly submitted to a worker pool?
>>>>>>
>>>>>>
>>>>>> I haven't thought through the details, but on top of my head, we
>>>>>> would have
>>>>>> to maintain some request<->thread mapping on the server. This mapping
>>>>>> is
>>>>>> also a building block for a request cancellation feature in future
>>>>>> where a
>>>>>> client would be able to send a cancellation request to the server, the
>>>>>> server will map the request to a thread executing that request and
>>>>>> then set
>>>>>> an interrupt on that thread to signify cancellation.
>>>>>>
>>>>>> Divij Vaidya
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 18, 2021 at 5:00 PM Stephen Mallette <
>>>>>> spmallette@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> > I started a fresh thread for this topic Divij brought up, with more
>>>>>> > context:
>>>>>> >
>>>>>> > > In a scenario where we have both
>>>>>> > > session-less and sesion-ful requests being made to the server, the
>>>>>> > resource
>>>>>> > > allocation will not be fair and may lead to starvation for some
>>>>>> requests.
>>>>>> > > This problem exists even today, hence not totally correlated to
>>>>>> what you
>>>>>> > > are proposing but I am afraid a wider adoption of explicit
>>>>>> > > transactions will bring this problem to the spotlight. Hence, we
>>>>>> should
>>>>>> > fix
>>>>>> > > this problem first. A solution would entail converging the worker
>>>>>> pool
>>>>>> > for
>>>>>> > > both session vs session-less requests.
>>>>>> >
>>>>>> > I'm not sure we can get that done in 3.5.0, but maybe? I suppose the
>>>>>> > question is how would we ensure that each request for a session
>>>>>> ends up
>>>>>> > being executed on the same thread from the previous request if jobs
>>>>>> are
>>>>>> > randomly submitted to a worker pool?
>>>>>> >
>>>>>> >
>>>>>> > On Thu, Mar 18, 2021 at 11:52 AM Divij Vaidya <
>>>>>> divijvaidya13@gmail.com>
>>>>>> > wrote:
>>>>>> >
>>>>>> > > Great idea Stephen. I agree with standardizing the explicit
>>>>>> transaction
>>>>>> > > semantics in cases of remote server (implicit aka sessionless is
>>>>>> already
>>>>>> > > self explanatory) and unifying the syntax with embedded graph
>>>>>> usage.
>>>>>> > >
>>>>>> > > Couple of notes:
>>>>>> > >
>>>>>> > > 1. I would favor the begin() instead of create() as it's closer to
>>>>>> > > universally prominent SQL transaction syntax. This would reduce
>>>>>> the
>>>>>> > > learning curve for users of TP.
>>>>>> > >
>>>>>> > > 2. From an implementation standpoint, sessionless requests are
>>>>>> queued on
>>>>>> > > the server side and are serviced by the worker thread pool. But a
>>>>>> > > session-ful aka explicit transaction aka managed transaction
>>>>>> starts a new
>>>>>> > > single worked thread pool every time. In a scenario where we have
>>>>>> both
>>>>>> > > session-less and sesion-ful requests being made to the server, the
>>>>>> > resource
>>>>>> > > allocation will not be fair and may lead to starvation for some
>>>>>> requests.
>>>>>> > > This problem exists even today, hence not totally correlated to
>>>>>> what you
>>>>>> > > are proposing but I am afraid a wider adoption of explicit
>>>>>> > > transactions will bring this problem to the spotlight. Hence, we
>>>>>> should
>>>>>> > fix
>>>>>> > > this problem first. A solution would entail converging the worker
>>>>>> pool
>>>>>> > for
>>>>>> > > both session vs session-less requests.
>>>>>> > >
>>>>>> > > 3. You are proposing the idea of having a transaction scoped
>>>>>> traversal
>>>>>> > > object. I agree with it but we need more clarification in
>>>>>> behavior wrt
>>>>>> > the
>>>>>> > > following scenarios:
>>>>>> > >
>>>>>> > > Q. What happens when g.tx().commit() is called on a transaction
>>>>>> scoped
>>>>>> > > traversal object? Does the traversal get closed?
>>>>>> > > Q. Currently, the same traversal object could be used to execute
>>>>>> multiple
>>>>>> > > session-less requests simultaneously in a thread safe manner.
>>>>>> Does the
>>>>>> > same
>>>>>> > > behaviour apply to transaction scoped traversal? If not, then how
>>>>>> do I
>>>>>> > send
>>>>>> > > multiple requests in parallel from the client all scoped to the
>>>>>> same
>>>>>> > > transaction on the server? Note that this is different from case
>>>>>> of multi
>>>>>> > > threaded transactions because on the server all requests are
>>>>>> scoped to
>>>>>> > > single transaction i.e. single thread but on the client they may
>>>>>> be
>>>>>> > > submitted via multiple threads.
>>>>>> > >
>>>>>> > > Regards,
>>>>>> > > Divij Vaidya
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > > On Thu, Mar 18, 2021 at 4:05 PM Stephen Mallette <
>>>>>> spmallette@gmail.com>
>>>>>> > > wrote:
>>>>>> > >
>>>>>> > > > The Transaction object is a bit important with remote cases
>>>>>> because it
>>>>>> > > > holds the reference to the session. With embedded use cases we
>>>>>> > generally
>>>>>> > > > adhere to ThreadLocal transactions so the Transaction
>>>>>> implementation in
>>>>>> > > > that case is more of a controller for the current thread but
>>>>>> for remote
>>>>>> > > > cases the Transaction implementation holds a bit of state that
>>>>>> can
>>>>>> > cross
>>>>>> > > > over threads. In some ways, that makes remote cases feel like a
>>>>>> > "threaded
>>>>>> > > > transaction" so that may be familiar to users?? Here's some
>>>>>> example
>>>>>> > > > syntax I currently have working in a test case:
>>>>>> > > >
>>>>>> > > > g = traversal().withRemote(conn)
>>>>>> > > > gtx = g.tx().create()
>>>>>> > > > assert gtx.isOpen() == true
>>>>>> > > > gtx.addV('person').iterate()
>>>>>> > > > gtx.addV('software').iterate()
>>>>>> > > > gtx.commit() // alternatively you could explicitly rollback()
>>>>>> > > > assert gtx.isOpen() == false
>>>>>> > > >
>>>>>> > > > I hope that documentation changes are enough to unify the
>>>>>> syntax and
>>>>>> > > remove
>>>>>> > > > confusion despite there still being ThreadLocal transactions as
>>>>>> a
>>>>>> > default
>>>>>> > > > for embedded cases and something else for remote. At least they
>>>>>> will
>>>>>> > look
>>>>>> > > > the same, even if you technically don't need to do a
>>>>>> g.tx().create()
>>>>>> > for
>>>>>> > > > embedded transactions and can just have the control you always
>>>>>> had with
>>>>>> > > > g.tx().commit() directly.
>>>>>> > > >
>>>>>> > > > Note that the remote Transaction object can support
>>>>>> configuration via
>>>>>> > > > onClose(CLOSE_BEHAVIOR) and you could therefore switch from the
>>>>>> default
>>>>>> > > of
>>>>>> > > > "commit on close" to rollback or manual (or i suppose something
>>>>>> > custom).
>>>>>> > > > It's nice that this piece works. I don't see a point in
>>>>>> supporting
>>>>>> > > > onReadWrite(READ_WRITE_BEHAVIOR) as it just doesn't make sense
>>>>>> in
>>>>>> > remote
>>>>>> > > > contexts.
>>>>>> > > >
>>>>>> > > > Another point is that I've added what I termed a "Graph
>>>>>> Operation"
>>>>>> > > bytecode
>>>>>> > > > instances which is bytecode that isn't related to traversal
>>>>>> > > construction. I
>>>>>> > > > hope this isn't one of those things where we end up wanting to
>>>>>> > deprecate
>>>>>> > > an
>>>>>> > > > idea as fast as we added it but we needed some way to pass
>>>>>> > > commit/rollback
>>>>>> > > > commands and they aren't really part of traversal construction.
>>>>>> They
>>>>>> > are
>>>>>> > > > operations that execute on the graph itself and I couldn't
>>>>>> think of how
>>>>>> > > to
>>>>>> > > > treat them as traversals nicely, so they sorta just exist as a
>>>>>> one off.
>>>>>> > > > Perhaps they will grow in number?? Folks have always asked if
>>>>>> bytecode
>>>>>> > > > requests could get "GraphFeature" information - I suppose this
>>>>>> could
>>>>>> > be a
>>>>>> > > > way to do that??
>>>>>> > > >
>>>>>> > > > Anyway, I will keep going down this general path as it's
>>>>>> appearing
>>>>>> > > > relatively fruitful.
>>>>>> > > >
>>>>>> > > >
>>>>>> > > >
>>>>>> > > >
>>>>>> > > > On Thu, Mar 18, 2021 at 6:34 AM Stephen Mallette <
>>>>>> spmallette@gmail.com
>>>>>> > >
>>>>>> > > > wrote:
>>>>>> > > >
>>>>>> > > > > > We should just communicate clearly that a simple remote
>>>>>> traversal
>>>>>> > > > > already uses a transaction by default,
>>>>>> > > > >
>>>>>> > > > > That's a good point because even with this change we still
>>>>>> have a
>>>>>> > > > > situation where a single iterated remote traversal will
>>>>>> generally
>>>>>> > mean
>>>>>> > > a
>>>>>> > > > > server managed commit/rollback, but in embedded mode, we have
>>>>>> an open
>>>>>> > > > > transaction (for transaction enabled graphs - TinkerGraph
>>>>>> will behave
>>>>>> > > > more
>>>>>> > > > > like a remote traversal, so more confusion there i guess).
>>>>>> I'm not
>>>>>> > sure
>>>>>> > > > how
>>>>>> > > > > to rectify that at this time except by way of documentation.
>>>>>> > > > >
>>>>>> > > > > The only thing I can think of is that the default for
>>>>>> embedded would
>>>>>> > > have
>>>>>> > > > > to be auto-commit per traversal. In that way it would work
>>>>>> like
>>>>>> > remote
>>>>>> > > > > traversals and for graphs that don't support transactions like
>>>>>> > > > TinkerGraph.
>>>>>> > > > > Of course, that's the kind of change that will break a lot of
>>>>>> code.
>>>>>> > > Maybe
>>>>>> > > > > we just keep that idea for another version.
>>>>>> > > > >
>>>>>> > > > > On Thu, Mar 18, 2021 at 4:21 AM <fh...@florian-hockmann.de>
>>>>>> wrote:
>>>>>> > > > >
>>>>>> > > > >> I like the idea of adding transactions for remote traversals
>>>>>> as they
>>>>>> > > > >> close a gap in functionality that we currently have for
>>>>>> remote
>>>>>> > > > traversals.
>>>>>> > > > >> We should just communicate clearly that a simple remote
>>>>>> traversal
>>>>>> > > > already
>>>>>> > > > >> uses a transaction by default, meaning that the server will
>>>>>> roll
>>>>>> > back
>>>>>> > > > any
>>>>>> > > > >> changes if any exception occurs. Users often ask about how
>>>>>> to do
>>>>>> > > > >> transactions with a remote traversal because they don't know
>>>>>> about
>>>>>> > > this
>>>>>> > > > and
>>>>>> > > > >> I'm afraid that we might add even more confusion if we now
>>>>>> add
>>>>>> > > > transactions
>>>>>> > > > >> for remote traversals. Hence why I think that we should
>>>>>> communicate
>>>>>> > > this
>>>>>> > > > >> clearly when we add remote transactions.
>>>>>> > > > >>
>>>>>> > > > >> Getting this to work in the GLVs should be possible but will
>>>>>> require
>>>>>> > > > some
>>>>>> > > > >> effort. I think we would have to introduce some
>>>>>> > > > >> "DriverRemoteTransactionTraversal" that doesn't submit the
>>>>>> traversal
>>>>>> > > on
>>>>>> > > > a
>>>>>> > > > >> terminal step but saves it and then submits all saved
>>>>>> traversals
>>>>>> > > > together
>>>>>> > > > >> on close().
>>>>>> > > > >> This also means that we should directly add an async version
>>>>>> of
>>>>>> > > close(),
>>>>>> > > > >> maybe closeAsync()? (Same for commit() and rollback())
>>>>>> > > > >>
>>>>>> > > > >> I also like create() better than traversal() because it would
>>>>>> > confuse
>>>>>> > > me
>>>>>> > > > >> to first start a traversal with traversal() and then also
>>>>>> start a
>>>>>> > > > >> transaction with the same method.
>>>>>> > > > >>
>>>>>> > > > >> -----Ursprüngliche Nachricht-----
>>>>>> > > > >> Von: Stephen Mallette <sp...@gmail.com>
>>>>>> > > > >> Gesendet: Donnerstag, 18. März 2021 00:01
>>>>>> > > > >> An: dev@tinkerpop.apache.org
>>>>>> > > > >> Betreff: Re: [DSISCUSS] Remote Transactions
>>>>>> > > > >>
>>>>>> > > > >> One thing I should have noted about tx().create(). The
>>>>>> create() very
>>>>>> > > > well
>>>>>> > > > >> could have been named traversal(), thus the previous example
>>>>>> reading
>>>>>> > > as:
>>>>>> > > > >>
>>>>>> > > > >> g = traversal().withEmbedded(graph)
>>>>>> > > > >> // or
>>>>>> > > > >> g = traversal().withRemote(conn)
>>>>>> > > > >> gtx = g.tx().traversal()
>>>>>> > > > >> gtx.addV('person').iterate()
>>>>>> > > > >> gtx.addV('software').iterate()
>>>>>> > > > >> gtx.close() // alternatively you could explicitly commit() or
>>>>>> > > rollback()
>>>>>> > > > >>
>>>>>> > > > >> You're basically spawning a new GraphTraversalSource from the
>>>>>> > > > Transaction
>>>>>> > > > >> object rather than from a Graph or AnonymousTraversal. I
>>>>>> chose
>>>>>> > > create()
>>>>>> > > > >> because it felt like this looked weird:
>>>>>> > > > >>
>>>>>> > > > >> g = traversal().withRemote(conn).tx().traversal()
>>>>>> > > > >>
>>>>>> > > > >> which would be a weird thing to do I guess, but just seeing
>>>>>> > > > "traversal()"
>>>>>> > > > >> over and over seemed odd looking and I wanted to
>>>>>> differentiate with
>>>>>> > > the
>>>>>> > > > >> Transaction object even though the methods are identical in
>>>>>> what
>>>>>> > they
>>>>>> > > > do. I
>>>>>> > > > >> suppose I also drew inspiration from:
>>>>>> > > > >>
>>>>>> > > > >> Transaction.createdThreadedTx()
>>>>>> > > > >>
>>>>>> > > > >> which I think we might consider deprecating. JanusGraph
>>>>>> would simply
>>>>>> > > > >> expose their own Transaction object in the future that has
>>>>>> that
>>>>>> > method
>>>>>> > > > as I
>>>>>> > > > >> imagine folks still use that feature. As far as I know, no
>>>>>> other
>>>>>> > graph
>>>>>> > > > >> implements that functionality and my guess is that no one
>>>>>> will
>>>>>> > likely
>>>>>> > > > do so
>>>>>> > > > >> in the future.
>>>>>> > > > >>
>>>>>> > > > >> anyway, if you like traversal() better than create() or the
>>>>>> other
>>>>>> > way
>>>>>> > > > >> around, please let me know
>>>>>> > > > >>
>>>>>> > > > >> On Wed, Mar 17, 2021 at 5:33 PM David Bechberger <
>>>>>> > dave@bechberger.com
>>>>>> > > >
>>>>>> > > > >> wrote:
>>>>>> > > > >>
>>>>>> > > > >> > I am in favor of this change as I any idea that unifies the
>>>>>> > multiple
>>>>>> > > > >> > different ways things work in TP will only make it easier
>>>>>> to learn
>>>>>> > > and
>>>>>> > > > >> > adopt.
>>>>>> > > > >> >
>>>>>> > > > >> > I don't know that I have enough knowledge of the inner
>>>>>> workings of
>>>>>> > > > >> > transactions to know what/if this will cause problems.
>>>>>> > > > >> >
>>>>>> > > > >> > Dave
>>>>>> > > > >> >
>>>>>> > > > >> > On Wed, Mar 17, 2021 at 12:57 PM Stephen Mallette
>>>>>> > > > >> > <sp...@gmail.com>
>>>>>> > > > >> > wrote:
>>>>>> > > > >> >
>>>>>> > > > >> > > We haven't touched "transactions" since TP3 was
>>>>>> originally
>>>>>> > > designed.
>>>>>> > > > >> > > They have remained a feature for embedded use cases even
>>>>>> in the
>>>>>> > > face
>>>>>> > > > >> > > of the
>>>>>> > > > >> > rise
>>>>>> > > > >> > > of remote graph use cases and result in major
>>>>>> inconsistencies
>>>>>> > that
>>>>>> > > > >> > > really bother users.
>>>>>> > > > >> > >
>>>>>> > > > >> > > As we close on 3.5.0, I figured that perhaps there was a
>>>>>> chance
>>>>>> > to
>>>>>> > > > >> > > do something radical about transactions. Basically, I'd
>>>>>> like
>>>>>> > > > >> > > transactions to work irrespective of remote or embedded
>>>>>> usage
>>>>>> > and
>>>>>> > > > >> > > for them to have the
>>>>>> > > > >> > same
>>>>>> > > > >> > > API when doing so. In mulling it over for the last day
>>>>>> or so I
>>>>>> > > had a
>>>>>> > > > >> > > realization that makes me believe that the following is
>>>>>> > possible:
>>>>>> > > > >> > >
>>>>>> > > > >> > > g = traversal().withEmbedded(graph)
>>>>>> > > > >> > > // or
>>>>>> > > > >> > > g = traversal().withRemote(conn)
>>>>>> > > > >> > > gtx = g.tx().create()
>>>>>> > > > >> > > gtx.addV('person').iterate()
>>>>>> > > > >> > > gtx.addV('software').iterate()
>>>>>> > > > >> > > gtx.close() // alternatively you could explicitly
>>>>>> commit() or
>>>>>> > > > >> > > rollback()
>>>>>> > > > >> > >
>>>>>> > > > >> > > // you could still use g for sessionless, but gtx is
>>>>>> "done"
>>>>>> > after
>>>>>> > > //
>>>>>> > > > >> > > you close so you will need to create() a new gtx
>>>>>> instance for a
>>>>>> > //
>>>>>> > > > >> > > fresh transaction assert 2 == g.V().count().next()
>>>>>> > > > >> > >
>>>>>> > > > >> > > Note that the create() method on tx() is the new bit of
>>>>>> > necessary
>>>>>> > > > >> syntax.
>>>>>> > > > >> > > Technically, it is a do-nothing for embedded mode (and
>>>>>> you could
>>>>>> > > > >> > > skip it for thread-local auto-transactions) but all the
>>>>>> > > > >> > > documentation can be shifted so that the API is
>>>>>> identical to
>>>>>> > > remote.
>>>>>> > > > >> > > The change would be non-breaking as the embedded
>>>>>> transaction
>>>>>> > > > >> > > approach would remain as it is, but would no longer be
>>>>>> > documented
>>>>>> > > as
>>>>>> > > > >> > > the preferred approach. Perhaps we could one day
>>>>>> disallow it but
>>>>>> > > > >> > > it's a bit of a tangle of ThreadLocal that I'm not sure
>>>>>> we want
>>>>>> > to
>>>>>> > > > >> touch in TP3.
>>>>>> > > > >> > >
>>>>>> > > > >> > > What happens behind the scenes of g.tx().create() is
>>>>>> that in
>>>>>> > > remote
>>>>>> > > > >> > > cases the RemoteConnection constructs a remote
>>>>>> Transaction
>>>>>> > > > >> > > implementation which basically is used to spawn new
>>>>>> traversal
>>>>>> > > source
>>>>>> > > > >> > > instances with a session based connections. The remote
>>>>>> > Transaction
>>>>>> > > > >> > > object can't support
>>>>>> > > > >> > transaction
>>>>>> > > > >> > > listeners or special read-write/close configurations,
>>>>>> but I
>>>>>> > don't
>>>>>> > > > >> > > think that's a problem as those things don't make a lot
>>>>>> of sense
>>>>>> > > in
>>>>>> > > > >> > > remote use cases.
>>>>>> > > > >> > >
>>>>>> > > > >> > > From the server perspective, this change would also mean
>>>>>> that
>>>>>> > > > >> > > sessions would have to accept bytecode and not just
>>>>>> scripts,
>>>>>> > which
>>>>>> > > > >> > > technically shouldn't be a problem.
>>>>>> > > > >> > >
>>>>>> > > > >> > > One downside at the moment is that i'm thinking mostly
>>>>>> in Java
>>>>>> > at
>>>>>> > > > >> > > the moment. I'm not sure how this all works in other
>>>>>> variants
>>>>>> > just
>>>>>> > > > >> > > yet, but
>>>>>> > > > >> > I'd
>>>>>> > > > >> > > hope to keep similar syntax.
>>>>>> > > > >> > >
>>>>>> > > > >> > > I will keep experimenting tomorrow. If you have any
>>>>>> thoughts on
>>>>>> > > the
>>>>>> > > > >> > matter,
>>>>>> > > > >> > > I'd be happy to hear them. Thanks!
>>>>>> > > > >> > >
>>>>>> > > > >> >
>>>>>> > > > >>
>>>>>> > > > >>
>>>>>> > > >
>>>>>> > >
>>>>>> >
>>>>>>
>>>>>

Re: [DISCUSS] Converging the worker pool was: [DISCUSS] Remote Transactions

Posted by Stephen Mallette <sp...@gmail.com>.

I've been fussing with Travis for a while now haven't had much luck
figuring out what is wrong. Decided to start profiling the server to see if
that provided any hints (and it was something I intended to do in any
case). Two issues came up:

1. shouldEventuallySucceedOnSameServerWithDefault() had some bad assertion
logic and it fails for both UnifiedChannelizer and for the old OpProcessor
stuff when the assertions are corrected. I've @Ignore'd that test for now.
No idea if it ever worked or if it's busted with some more recent changes
to the driver. I don't think that this was the cause of the problems with
travis but I can't be sure.
2. AliasClusteredClient did not proxy calls to its underlying Client
instance. This was purposeful, but has bad implications for
DriverRemoteConnection. I'd now consider it a bug in 3.5.0 and I've changed
the behavior. Ultimately, the whole Client aliasing probably needs to be
changed, but it's too big a thing to try to get into now. Not completely
sure yet if this fixes travis either, but I sense it has something to do
with it.

Travis is a bit backed up today so it's slow getting results. Hopefully (2)
solved the problem or at least leads to an error i can see.

The nice side-effect of this profiling is that I've come to see how much
faster UnifiedChannelizer is as compared to the OpProcessor. Using the new
remote transaction approach to load 100k vertices and 50k edges in 50k
separate transactions (2 vertices/1 edge per tx), took under 3 minutes with
UnifiedChannelizer and the server stayed remarkably stable in terms of
resources/gc. The same workload on OpProcessor.........I gave up waiting
for it to finish after 1500 transactions, which took 9 minutes. Looking at
the jfr, there was a really weird gc pattern and memory usage under
OpProcessor which perhaps had some connection to our quickly
starting/killing threads and GremlinExecutor instances.

I plan to keep testing/profiling for a bit longer before issuing the PR
(and hopefully getting travis fixed in the process)...more updates to come
i imagine.

Divij, I'm glad that you brought this issue up early. I imagine I would
have noticed the problem in profiling, but it was smart to call it out
early. Nice job.



On Mon, Mar 29, 2021 at 6:55 AM Stephen Mallette <sp...@gmail.com>
wrote:

> I can't seem to get Travis to pass the build for this branch with the
> UnifiedChannelizer enabled.
>
> https://travis-ci.com/github/apache/tinkerpop/builds/221427334
>
> It just hangs without much information. I can't seem to get it to behave
> otherwise and it doesn't fail for me locally (of course). If anyone has
> some free CPU cycles could you please do:
>
> mvn clean install -DskipTests && mvn verify -pl :gremlin-server
> -DskipTests -DskipIntegrationTests=false -DincludeNeo4j -DtestUnified=true
>
> i'm curious if anyone else can recreate the issue.....Thanks
>
>
>
> On Thu, Mar 25, 2021 at 3:33 PM Stephen Mallette <sp...@gmail.com>
> wrote:
>
>> It took me a long time to get to the final steps of this change.
>> Uncovered some unexpected bugs in the WsAndHttpChannelizer and that was
>> gnarly to sort out. Also found some problems with authorization through the
>> SaslAndHttpBasicAuthenticationHandler. I can't say I really like how these
>> combined channelizers really work. Seems like the new tangle of stuff in
>> Gremlin Server. Not sure how much of that I can sort that out for 3.5.0 at
>> this point. I might make some attempts on this body of work since it's
>> fresh in my mind but we'll see.
>>
>> After getting through these little nits I've managed to get all of the
>> Gremlin Server integration test cases passing with the new
>> UnifiedChannelizer. I've added a -DtestUnified property to run the
>> UnifiedChannelizer rather than the old stuff. This way we can add a new job
>> to travis to cover that by itself in parallel to the old. You can run it
>> like this:
>>
>> mvn verify -pl gremlin-server -DskipIntegrationTests=false
>> -DtestUnified=true
>>
>> Having all the tests pass on the new channelizer makes me feel pretty
>> confident that what I have at least works (and without much change in
>> behavior) and therefore have pushed a branch for early review:
>>
>> https://github.com/apache/tinkerpop/tree/TINKERPOP-2245
>>
>> I still need to write a few more tests and after that it would be time to
>> do a bit of performance testing to see how it compares to the old method.
>> I'm also missing "metrics". If all that is good then I'll need to do a gang
>> of documentation work. I imagine I will continue to keep this work separate
>> from the bytecode transaction work that spawned it and merge the
>> transaction stuff first followed by this. In that way, expect this branch
>> to be a bit volatile with rebasing.
>>
>>
>>
>>
>>
>> On Tue, Mar 23, 2021 at 6:51 PM Stephen Mallette <sp...@gmail.com>
>> wrote:
>>
>>> More reasonable progress with the UnifiedChannelizer. The short version
>>> is that I have a pretty major quantity of the server integration tests
>>> passing under this new Channelizer. The ones that are failing are typically
>>> ones that assert some specific log messages or exception message which may
>>> not necessarily apply anymore. I'm still mulling those over.
>>>
>>> The basic structure of the UnifiedChannelizer is that it sorta copies
>>> the WsAndHttpChannelizer and builds from there, so when this
>>> UnfiedChannelizer is configured you'll get HTTP and Websockets. The
>>> UnifiedChannelizer gets rid of the entire OpProcessor infrastructure. I
>>> think that was a good body of code years ago, but we've now learned quite
>>> explicitly what we want Gremlin Server to do and what role it plays.
>>> OpProcessor just feels like an unnecessary abstraction now. Removing it
>>> simplifies the code of the UnifiedChannelizer, which adds a single
>>> UnifiedHandler to the netty pipeline. That UnifiedHandler streamlines all
>>> those OpProcessor implementations into one so all the duplicate code is
>>> effectively removed by running RequestMessage instances through a single
>>> common flow. That means all the request validation, error handling,
>>> iterator processing, result building, transaction management, etc. will be
>>> in one place.
>>>
>>> The UnifiedHandler also kills out the GremlinExecutor. That means all
>>> the Lifecycle callbacks and random lambdas that generated so much
>>> indirection in the name of extensibility are no longer confusing the code
>>> base and the wicked hierarchy of timesouts and exception throws are all
>>> flattened.
>>>
>>> When the UnifiedHandler gets a RequestMessage it validates it, builds a
>>> Context and creates a Rexster instance (thought i'd bring back the name
>>> rather than go with the generic "Worker" interface) that has the Context
>>> added to it as a task. The Rexster instance is then submitted to the
>>> Gremlin thread pool to do work when a thread is available. Rexster will
>>> hang on to that thread awaiting tasks that are assigned to it. For
>>> sessionless requests, that means Rexster handles one request and exits. For
>>> a sessionful request it means the Rexster will wait for more tasks (Context
>>> objects from new requests) to be added to it. The UnifiedHandler holds
>>> those Rexster instances in a Map aligning session identifiers to Rexster
>>> instances. As Divij alluded to, I suppose this means we are putting down
>>> foundation for query cancellation and better insight into the queue of
>>> things that are running.
>>>
>>> Now that the gremlinPool server setting covers both types of requests we
>>> no longer have a situation where the number of threads that can be running
>>> is unbounded and a "session" will now look a bit like a "long run job" in
>>> that it will consume a thread from the pool for as long as the session is
>>> open. So if the gremlinPool is 16, you could have 16 sessions active.
>>> Sessionless requests would begin to queue as would requests for new
>>> sessions and they would be serviced in the order received.
>>>
>>> I'm still not quite ready for a pull request. Despite having the bulk of
>>> the integration tests working, the code isn't quite the way I think it will
>>> be finally organized (I used a lot of inner classes and things which
>>> probably should change). I also need to figure out how to run the tests. I
>>> think I need to run them all per channelizer (old and new) but I think that
>>> will push the build time over maximum on Travis, so I may need to break
>>> things up somehow. I'm reasonably sure I'll have a branch ready to push
>>> before the end of the week.
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Mar 22, 2021 at 5:22 PM Stephen Mallette <sp...@gmail.com>
>>> wrote:
>>>
>>>> I've started to play with some ideas for pushing all requests
>>>> scripts/bytecode sessions/sessionless down one unified pipeline with one
>>>> shared thread pool. The most interesting thing I've seen so far is how much
>>>> the code could potentially simplify under this approach. The
>>>> GremlinExecutor was the wrong abstraction in a sense and it forced way too
>>>> many callbacks as arguments and made error handling kinda sloppy which made
>>>> the server code harder to reuse, extend and read. Of course, all that was
>>>> built before the first remote graph was built and we were just wildly
>>>> guessing at how DS Graph was going to work with all this. Now, I think the
>>>> remote model is much more clear, so maybe all this can be made "right" now.
>>>>
>>>> I think that if I can see any success at all with this in the next few
>>>> days, we could include a new UnfiiedChannelizer that would replace the
>>>> existing ones. I think we'd keep the old one in place by default and keep
>>>> my copy/paste work for SessionOpProcessor that added bytecode there, with
>>>> the idea that the UnifiedChannelizer will become the future default as we
>>>> get it tested further along the 3.5.x line.
>>>>
>>>> I imagine I'll have more details on this task tomorrow and will post
>>>> back here when I do.
>>>>
>>>>
>>>> On Thu, Mar 18, 2021 at 12:37 PM Divij Vaidya <di...@gmail.com>
>>>> wrote:
>>>>
>>>>> >
>>>>> > I suppose the
>>>>> > question is how would we ensure that each request for a session ends
>>>>> up
>>>>> > being executed on the same thread from the previous request if jobs
>>>>> are
>>>>> > randomly submitted to a worker pool?
>>>>>
>>>>>
>>>>> I haven't thought through the details, but on top of my head, we would
>>>>> have
>>>>> to maintain some request<->thread mapping on the server. This mapping
>>>>> is
>>>>> also a building block for a request cancellation feature in future
>>>>> where a
>>>>> client would be able to send a cancellation request to the server, the
>>>>> server will map the request to a thread executing that request and
>>>>> then set
>>>>> an interrupt on that thread to signify cancellation.
>>>>>
>>>>> Divij Vaidya
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Mar 18, 2021 at 5:00 PM Stephen Mallette <spmallette@gmail.com
>>>>> >
>>>>> wrote:
>>>>>
>>>>> > I started a fresh thread for this topic Divij brought up, with more
>>>>> > context:
>>>>> >
>>>>> > > In a scenario where we have both
>>>>> > > session-less and sesion-ful requests being made to the server, the
>>>>> > resource
>>>>> > > allocation will not be fair and may lead to starvation for some
>>>>> requests.
>>>>> > > This problem exists even today, hence not totally correlated to
>>>>> what you
>>>>> > > are proposing but I am afraid a wider adoption of explicit
>>>>> > > transactions will bring this problem to the spotlight. Hence, we
>>>>> should
>>>>> > fix
>>>>> > > this problem first. A solution would entail converging the worker
>>>>> pool
>>>>> > for
>>>>> > > both session vs session-less requests.
>>>>> >
>>>>> > I'm not sure we can get that done in 3.5.0, but maybe? I suppose the
>>>>> > question is how would we ensure that each request for a session ends
>>>>> up
>>>>> > being executed on the same thread from the previous request if jobs
>>>>> are
>>>>> > randomly submitted to a worker pool?
>>>>> >
>>>>> >
>>>>> > On Thu, Mar 18, 2021 at 11:52 AM Divij Vaidya <
>>>>> divijvaidya13@gmail.com>
>>>>> > wrote:
>>>>> >
>>>>> > > Great idea Stephen. I agree with standardizing the explicit
>>>>> transaction
>>>>> > > semantics in cases of remote server (implicit aka sessionless is
>>>>> already
>>>>> > > self explanatory) and unifying the syntax with embedded graph
>>>>> usage.
>>>>> > >
>>>>> > > Couple of notes:
>>>>> > >
>>>>> > > 1. I would favor the begin() instead of create() as it's closer to
>>>>> > > universally prominent SQL transaction syntax. This would reduce the
>>>>> > > learning curve for users of TP.
>>>>> > >
>>>>> > > 2. From an implementation standpoint, sessionless requests are
>>>>> queued on
>>>>> > > the server side and are serviced by the worker thread pool. But a
>>>>> > > session-ful aka explicit transaction aka managed transaction
>>>>> starts a new
>>>>> > > single worked thread pool every time. In a scenario where we have
>>>>> both
>>>>> > > session-less and sesion-ful requests being made to the server, the
>>>>> > resource
>>>>> > > allocation will not be fair and may lead to starvation for some
>>>>> requests.
>>>>> > > This problem exists even today, hence not totally correlated to
>>>>> what you
>>>>> > > are proposing but I am afraid a wider adoption of explicit
>>>>> > > transactions will bring this problem to the spotlight. Hence, we
>>>>> should
>>>>> > fix
>>>>> > > this problem first. A solution would entail converging the worker
>>>>> pool
>>>>> > for
>>>>> > > both session vs session-less requests.
>>>>> > >
>>>>> > > 3. You are proposing the idea of having a transaction scoped
>>>>> traversal
>>>>> > > object. I agree with it but we need more clarification in behavior
>>>>> wrt
>>>>> > the
>>>>> > > following scenarios:
>>>>> > >
>>>>> > > Q. What happens when g.tx().commit() is called on a transaction
>>>>> scoped
>>>>> > > traversal object? Does the traversal get closed?
>>>>> > > Q. Currently, the same traversal object could be used to execute
>>>>> multiple
>>>>> > > session-less requests simultaneously in a thread safe manner. Does
>>>>> the
>>>>> > same
>>>>> > > behaviour apply to transaction scoped traversal? If not, then how
>>>>> do I
>>>>> > send
>>>>> > > multiple requests in parallel from the client all scoped to the
>>>>> same
>>>>> > > transaction on the server? Note that this is different from case
>>>>> of multi
>>>>> > > threaded transactions because on the server all requests are
>>>>> scoped to
>>>>> > > single transaction i.e. single thread but on the client they may be
>>>>> > > submitted via multiple threads.
>>>>> > >
>>>>> > > Regards,
>>>>> > > Divij Vaidya
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > > On Thu, Mar 18, 2021 at 4:05 PM Stephen Mallette <
>>>>> spmallette@gmail.com>
>>>>> > > wrote:
>>>>> > >
>>>>> > > > The Transaction object is a bit important with remote cases
>>>>> because it
>>>>> > > > holds the reference to the session. With embedded use cases we
>>>>> > generally
>>>>> > > > adhere to ThreadLocal transactions so the Transaction
>>>>> implementation in
>>>>> > > > that case is more of a controller for the current thread but for
>>>>> remote
>>>>> > > > cases the Transaction implementation holds a bit of state that
>>>>> can
>>>>> > cross
>>>>> > > > over threads. In some ways, that makes remote cases feel like a
>>>>> > "threaded
>>>>> > > > transaction" so that may be familiar to users?? Here's some
>>>>> example
>>>>> > > > syntax I currently have working in a test case:
>>>>> > > >
>>>>> > > > g = traversal().withRemote(conn)
>>>>> > > > gtx = g.tx().create()
>>>>> > > > assert gtx.isOpen() == true
>>>>> > > > gtx.addV('person').iterate()
>>>>> > > > gtx.addV('software').iterate()
>>>>> > > > gtx.commit() // alternatively you could explicitly rollback()
>>>>> > > > assert gtx.isOpen() == false
>>>>> > > >
>>>>> > > > I hope that documentation changes are enough to unify the syntax
>>>>> and
>>>>> > > remove
>>>>> > > > confusion despite there still being ThreadLocal transactions as a
>>>>> > default
>>>>> > > > for embedded cases and something else for remote. At least they
>>>>> will
>>>>> > look
>>>>> > > > the same, even if you technically don't need to do a
>>>>> g.tx().create()
>>>>> > for
>>>>> > > > embedded transactions and can just have the control you always
>>>>> had with
>>>>> > > > g.tx().commit() directly.
>>>>> > > >
>>>>> > > > Note that the remote Transaction object can support
>>>>> configuration via
>>>>> > > > onClose(CLOSE_BEHAVIOR) and you could therefore switch from the
>>>>> default
>>>>> > > of
>>>>> > > > "commit on close" to rollback or manual (or i suppose something
>>>>> > custom).
>>>>> > > > It's nice that this piece works. I don't see a point in
>>>>> supporting
>>>>> > > > onReadWrite(READ_WRITE_BEHAVIOR) as it just doesn't make sense in
>>>>> > remote
>>>>> > > > contexts.
>>>>> > > >
>>>>> > > > Another point is that I've added what I termed a "Graph
>>>>> Operation"
>>>>> > > bytecode
>>>>> > > > instances which is bytecode that isn't related to traversal
>>>>> > > construction. I
>>>>> > > > hope this isn't one of those things where we end up wanting to
>>>>> > deprecate
>>>>> > > an
>>>>> > > > idea as fast as we added it but we needed some way to pass
>>>>> > > commit/rollback
>>>>> > > > commands and they aren't really part of traversal construction.
>>>>> They
>>>>> > are
>>>>> > > > operations that execute on the graph itself and I couldn't think
>>>>> of how
>>>>> > > to
>>>>> > > > treat them as traversals nicely, so they sorta just exist as a
>>>>> one off.
>>>>> > > > Perhaps they will grow in number?? Folks have always asked if
>>>>> bytecode
>>>>> > > > requests could get "GraphFeature" information - I suppose this
>>>>> could
>>>>> > be a
>>>>> > > > way to do that??
>>>>> > > >
>>>>> > > > Anyway, I will keep going down this general path as it's
>>>>> appearing
>>>>> > > > relatively fruitful.
>>>>> > > >
>>>>> > > >
>>>>> > > >
>>>>> > > >
>>>>> > > > On Thu, Mar 18, 2021 at 6:34 AM Stephen Mallette <
>>>>> spmallette@gmail.com
>>>>> > >
>>>>> > > > wrote:
>>>>> > > >
>>>>> > > > > > We should just communicate clearly that a simple remote
>>>>> traversal
>>>>> > > > > already uses a transaction by default,
>>>>> > > > >
>>>>> > > > > That's a good point because even with this change we still
>>>>> have a
>>>>> > > > > situation where a single iterated remote traversal will
>>>>> generally
>>>>> > mean
>>>>> > > a
>>>>> > > > > server managed commit/rollback, but in embedded mode, we have
>>>>> an open
>>>>> > > > > transaction (for transaction enabled graphs - TinkerGraph will
>>>>> behave
>>>>> > > > more
>>>>> > > > > like a remote traversal, so more confusion there i guess). I'm
>>>>> not
>>>>> > sure
>>>>> > > > how
>>>>> > > > > to rectify that at this time except by way of documentation.
>>>>> > > > >
>>>>> > > > > The only thing I can think of is that the default for embedded
>>>>> would
>>>>> > > have
>>>>> > > > > to be auto-commit per traversal. In that way it would work like
>>>>> > remote
>>>>> > > > > traversals and for graphs that don't support transactions like
>>>>> > > > TinkerGraph.
>>>>> > > > > Of course, that's the kind of change that will break a lot of
>>>>> code.
>>>>> > > Maybe
>>>>> > > > > we just keep that idea for another version.
>>>>> > > > >
>>>>> > > > > On Thu, Mar 18, 2021 at 4:21 AM <fh...@florian-hockmann.de>
>>>>> wrote:
>>>>> > > > >
>>>>> > > > >> I like the idea of adding transactions for remote traversals
>>>>> as they
>>>>> > > > >> close a gap in functionality that we currently have for remote
>>>>> > > > traversals.
>>>>> > > > >> We should just communicate clearly that a simple remote
>>>>> traversal
>>>>> > > > already
>>>>> > > > >> uses a transaction by default, meaning that the server will
>>>>> roll
>>>>> > back
>>>>> > > > any
>>>>> > > > >> changes if any exception occurs. Users often ask about how to
>>>>> do
>>>>> > > > >> transactions with a remote traversal because they don't know
>>>>> about
>>>>> > > this
>>>>> > > > and
>>>>> > > > >> I'm afraid that we might add even more confusion if we now add
>>>>> > > > transactions
>>>>> > > > >> for remote traversals. Hence why I think that we should
>>>>> communicate
>>>>> > > this
>>>>> > > > >> clearly when we add remote transactions.
>>>>> > > > >>
>>>>> > > > >> Getting this to work in the GLVs should be possible but will
>>>>> require
>>>>> > > > some
>>>>> > > > >> effort. I think we would have to introduce some
>>>>> > > > >> "DriverRemoteTransactionTraversal" that doesn't submit the
>>>>> traversal
>>>>> > > on
>>>>> > > > a
>>>>> > > > >> terminal step but saves it and then submits all saved
>>>>> traversals
>>>>> > > > together
>>>>> > > > >> on close().
>>>>> > > > >> This also means that we should directly add an async version
>>>>> of
>>>>> > > close(),
>>>>> > > > >> maybe closeAsync()? (Same for commit() and rollback())
>>>>> > > > >>
>>>>> > > > >> I also like create() better than traversal() because it would
>>>>> > confuse
>>>>> > > me
>>>>> > > > >> to first start a traversal with traversal() and then also
>>>>> start a
>>>>> > > > >> transaction with the same method.
>>>>> > > > >>
>>>>> > > > >> -----Ursprüngliche Nachricht-----
>>>>> > > > >> Von: Stephen Mallette <sp...@gmail.com>
>>>>> > > > >> Gesendet: Donnerstag, 18. März 2021 00:01
>>>>> > > > >> An: dev@tinkerpop.apache.org
>>>>> > > > >> Betreff: Re: [DSISCUSS] Remote Transactions
>>>>> > > > >>
>>>>> > > > >> One thing I should have noted about tx().create(). The
>>>>> create() very
>>>>> > > > well
>>>>> > > > >> could have been named traversal(), thus the previous example
>>>>> reading
>>>>> > > as:
>>>>> > > > >>
>>>>> > > > >> g = traversal().withEmbedded(graph)
>>>>> > > > >> // or
>>>>> > > > >> g = traversal().withRemote(conn)
>>>>> > > > >> gtx = g.tx().traversal()
>>>>> > > > >> gtx.addV('person').iterate()
>>>>> > > > >> gtx.addV('software').iterate()
>>>>> > > > >> gtx.close() // alternatively you could explicitly commit() or
>>>>> > > rollback()
>>>>> > > > >>
>>>>> > > > >> You're basically spawning a new GraphTraversalSource from the
>>>>> > > > Transaction
>>>>> > > > >> object rather than from a Graph or AnonymousTraversal. I chose
>>>>> > > create()
>>>>> > > > >> because it felt like this looked weird:
>>>>> > > > >>
>>>>> > > > >> g = traversal().withRemote(conn).tx().traversal()
>>>>> > > > >>
>>>>> > > > >> which would be a weird thing to do I guess, but just seeing
>>>>> > > > "traversal()"
>>>>> > > > >> over and over seemed odd looking and I wanted to
>>>>> differentiate with
>>>>> > > the
>>>>> > > > >> Transaction object even though the methods are identical in
>>>>> what
>>>>> > they
>>>>> > > > do. I
>>>>> > > > >> suppose I also drew inspiration from:
>>>>> > > > >>
>>>>> > > > >> Transaction.createdThreadedTx()
>>>>> > > > >>
>>>>> > > > >> which I think we might consider deprecating. JanusGraph would
>>>>> simply
>>>>> > > > >> expose their own Transaction object in the future that has
>>>>> that
>>>>> > method
>>>>> > > > as I
>>>>> > > > >> imagine folks still use that feature. As far as I know, no
>>>>> other
>>>>> > graph
>>>>> > > > >> implements that functionality and my guess is that no one will
>>>>> > likely
>>>>> > > > do so
>>>>> > > > >> in the future.
>>>>> > > > >>
>>>>> > > > >> anyway, if you like traversal() better than create() or the
>>>>> other
>>>>> > way
>>>>> > > > >> around, please let me know
>>>>> > > > >>
>>>>> > > > >> On Wed, Mar 17, 2021 at 5:33 PM David Bechberger <
>>>>> > dave@bechberger.com
>>>>> > > >
>>>>> > > > >> wrote:
>>>>> > > > >>
>>>>> > > > >> > I am in favor of this change as I any idea that unifies the
>>>>> > multiple
>>>>> > > > >> > different ways things work in TP will only make it easier
>>>>> to learn
>>>>> > > and
>>>>> > > > >> > adopt.
>>>>> > > > >> >
>>>>> > > > >> > I don't know that I have enough knowledge of the inner
>>>>> workings of
>>>>> > > > >> > transactions to know what/if this will cause problems.
>>>>> > > > >> >
>>>>> > > > >> > Dave
>>>>> > > > >> >
>>>>> > > > >> > On Wed, Mar 17, 2021 at 12:57 PM Stephen Mallette
>>>>> > > > >> > <sp...@gmail.com>
>>>>> > > > >> > wrote:
>>>>> > > > >> >
>>>>> > > > >> > > We haven't touched "transactions" since TP3 was originally
>>>>> > > designed.
>>>>> > > > >> > > They have remained a feature for embedded use cases even
>>>>> in the
>>>>> > > face
>>>>> > > > >> > > of the
>>>>> > > > >> > rise
>>>>> > > > >> > > of remote graph use cases and result in major
>>>>> inconsistencies
>>>>> > that
>>>>> > > > >> > > really bother users.
>>>>> > > > >> > >
>>>>> > > > >> > > As we close on 3.5.0, I figured that perhaps there was a
>>>>> chance
>>>>> > to
>>>>> > > > >> > > do something radical about transactions. Basically, I'd
>>>>> like
>>>>> > > > >> > > transactions to work irrespective of remote or embedded
>>>>> usage
>>>>> > and
>>>>> > > > >> > > for them to have the
>>>>> > > > >> > same
>>>>> > > > >> > > API when doing so. In mulling it over for the last day or
>>>>> so I
>>>>> > > had a
>>>>> > > > >> > > realization that makes me believe that the following is
>>>>> > possible:
>>>>> > > > >> > >
>>>>> > > > >> > > g = traversal().withEmbedded(graph)
>>>>> > > > >> > > // or
>>>>> > > > >> > > g = traversal().withRemote(conn)
>>>>> > > > >> > > gtx = g.tx().create()
>>>>> > > > >> > > gtx.addV('person').iterate()
>>>>> > > > >> > > gtx.addV('software').iterate()
>>>>> > > > >> > > gtx.close() // alternatively you could explicitly
>>>>> commit() or
>>>>> > > > >> > > rollback()
>>>>> > > > >> > >
>>>>> > > > >> > > // you could still use g for sessionless, but gtx is
>>>>> "done"
>>>>> > after
>>>>> > > //
>>>>> > > > >> > > you close so you will need to create() a new gtx instance
>>>>> for a
>>>>> > //
>>>>> > > > >> > > fresh transaction assert 2 == g.V().count().next()
>>>>> > > > >> > >
>>>>> > > > >> > > Note that the create() method on tx() is the new bit of
>>>>> > necessary
>>>>> > > > >> syntax.
>>>>> > > > >> > > Technically, it is a do-nothing for embedded mode (and
>>>>> you could
>>>>> > > > >> > > skip it for thread-local auto-transactions) but all the
>>>>> > > > >> > > documentation can be shifted so that the API is identical
>>>>> to
>>>>> > > remote.
>>>>> > > > >> > > The change would be non-breaking as the embedded
>>>>> transaction
>>>>> > > > >> > > approach would remain as it is, but would no longer be
>>>>> > documented
>>>>> > > as
>>>>> > > > >> > > the preferred approach. Perhaps we could one day disallow
>>>>> it but
>>>>> > > > >> > > it's a bit of a tangle of ThreadLocal that I'm not sure
>>>>> we want
>>>>> > to
>>>>> > > > >> touch in TP3.
>>>>> > > > >> > >
>>>>> > > > >> > > What happens behind the scenes of g.tx().create() is that
>>>>> in
>>>>> > > remote
>>>>> > > > >> > > cases the RemoteConnection constructs a remote Transaction
>>>>> > > > >> > > implementation which basically is used to spawn new
>>>>> traversal
>>>>> > > source
>>>>> > > > >> > > instances with a session based connections. The remote
>>>>> > Transaction
>>>>> > > > >> > > object can't support
>>>>> > > > >> > transaction
>>>>> > > > >> > > listeners or special read-write/close configurations, but
>>>>> I
>>>>> > don't
>>>>> > > > >> > > think that's a problem as those things don't make a lot
>>>>> of sense
>>>>> > > in
>>>>> > > > >> > > remote use cases.
>>>>> > > > >> > >
>>>>> > > > >> > > From the server perspective, this change would also mean
>>>>> that
>>>>> > > > >> > > sessions would have to accept bytecode and not just
>>>>> scripts,
>>>>> > which
>>>>> > > > >> > > technically shouldn't be a problem.
>>>>> > > > >> > >
>>>>> > > > >> > > One downside at the moment is that i'm thinking mostly in
>>>>> Java
>>>>> > at
>>>>> > > > >> > > the moment. I'm not sure how this all works in other
>>>>> variants
>>>>> > just
>>>>> > > > >> > > yet, but
>>>>> > > > >> > I'd
>>>>> > > > >> > > hope to keep similar syntax.
>>>>> > > > >> > >
>>>>> > > > >> > > I will keep experimenting tomorrow. If you have any
>>>>> thoughts on
>>>>> > > the
>>>>> > > > >> > matter,
>>>>> > > > >> > > I'd be happy to hear them. Thanks!
>>>>> > > > >> > >
>>>>> > > > >> >
>>>>> > > > >>
>>>>> > > > >>
>>>>> > > >
>>>>> > >
>>>>> >
>>>>>
>>>>

Re: [DISCUSS] Converging the worker pool was: [DISCUSS] Remote Transactions

Posted by Stephen Mallette <sp...@gmail.com>.

I can't seem to get Travis to pass the build for this branch with the
UnifiedChannelizer enabled.

https://travis-ci.com/github/apache/tinkerpop/builds/221427334

It just hangs without much information. I can't seem to get it to behave
otherwise and it doesn't fail for me locally (of course). If anyone has
some free CPU cycles could you please do:

mvn clean install -DskipTests && mvn verify -pl :gremlin-server -DskipTests
-DskipIntegrationTests=false -DincludeNeo4j -DtestUnified=true

i'm curious if anyone else can recreate the issue.....Thanks



On Thu, Mar 25, 2021 at 3:33 PM Stephen Mallette <sp...@gmail.com>
wrote:

> It took me a long time to get to the final steps of this change. Uncovered
> some unexpected bugs in the WsAndHttpChannelizer and that was gnarly to
> sort out. Also found some problems with authorization through the
> SaslAndHttpBasicAuthenticationHandler. I can't say I really like how these
> combined channelizers really work. Seems like the new tangle of stuff in
> Gremlin Server. Not sure how much of that I can sort that out for 3.5.0 at
> this point. I might make some attempts on this body of work since it's
> fresh in my mind but we'll see.
>
> After getting through these little nits I've managed to get all of the
> Gremlin Server integration test cases passing with the new
> UnifiedChannelizer. I've added a -DtestUnified property to run the
> UnifiedChannelizer rather than the old stuff. This way we can add a new job
> to travis to cover that by itself in parallel to the old. You can run it
> like this:
>
> mvn verify -pl gremlin-server -DskipIntegrationTests=false
> -DtestUnified=true
>
> Having all the tests pass on the new channelizer makes me feel pretty
> confident that what I have at least works (and without much change in
> behavior) and therefore have pushed a branch for early review:
>
> https://github.com/apache/tinkerpop/tree/TINKERPOP-2245
>
> I still need to write a few more tests and after that it would be time to
> do a bit of performance testing to see how it compares to the old method.
> I'm also missing "metrics". If all that is good then I'll need to do a gang
> of documentation work. I imagine I will continue to keep this work separate
> from the bytecode transaction work that spawned it and merge the
> transaction stuff first followed by this. In that way, expect this branch
> to be a bit volatile with rebasing.
>
>
>
>
>
> On Tue, Mar 23, 2021 at 6:51 PM Stephen Mallette <sp...@gmail.com>
> wrote:
>
>> More reasonable progress with the UnifiedChannelizer. The short version
>> is that I have a pretty major quantity of the server integration tests
>> passing under this new Channelizer. The ones that are failing are typically
>> ones that assert some specific log messages or exception message which may
>> not necessarily apply anymore. I'm still mulling those over.
>>
>> The basic structure of the UnifiedChannelizer is that it sorta copies the
>> WsAndHttpChannelizer and builds from there, so when this UnfiedChannelizer
>> is configured you'll get HTTP and Websockets. The UnifiedChannelizer gets
>> rid of the entire OpProcessor infrastructure. I think that was a good body
>> of code years ago, but we've now learned quite explicitly what we want
>> Gremlin Server to do and what role it plays. OpProcessor just feels like an
>> unnecessary abstraction now. Removing it simplifies the code of the
>> UnifiedChannelizer, which adds a single UnifiedHandler to the netty
>> pipeline. That UnifiedHandler streamlines all those OpProcessor
>> implementations into one so all the duplicate code is effectively removed
>> by running RequestMessage instances through a single common flow. That
>> means all the request validation, error handling, iterator processing,
>> result building, transaction management, etc. will be in one place.
>>
>> The UnifiedHandler also kills out the GremlinExecutor. That means all the
>> Lifecycle callbacks and random lambdas that generated so much indirection
>> in the name of extensibility are no longer confusing the code base and the
>> wicked hierarchy of timesouts and exception throws are all flattened.
>>
>> When the UnifiedHandler gets a RequestMessage it validates it, builds a
>> Context and creates a Rexster instance (thought i'd bring back the name
>> rather than go with the generic "Worker" interface) that has the Context
>> added to it as a task. The Rexster instance is then submitted to the
>> Gremlin thread pool to do work when a thread is available. Rexster will
>> hang on to that thread awaiting tasks that are assigned to it. For
>> sessionless requests, that means Rexster handles one request and exits. For
>> a sessionful request it means the Rexster will wait for more tasks (Context
>> objects from new requests) to be added to it. The UnifiedHandler holds
>> those Rexster instances in a Map aligning session identifiers to Rexster
>> instances. As Divij alluded to, I suppose this means we are putting down
>> foundation for query cancellation and better insight into the queue of
>> things that are running.
>>
>> Now that the gremlinPool server setting covers both types of requests we
>> no longer have a situation where the number of threads that can be running
>> is unbounded and a "session" will now look a bit like a "long run job" in
>> that it will consume a thread from the pool for as long as the session is
>> open. So if the gremlinPool is 16, you could have 16 sessions active.
>> Sessionless requests would begin to queue as would requests for new
>> sessions and they would be serviced in the order received.
>>
>> I'm still not quite ready for a pull request. Despite having the bulk of
>> the integration tests working, the code isn't quite the way I think it will
>> be finally organized (I used a lot of inner classes and things which
>> probably should change). I also need to figure out how to run the tests. I
>> think I need to run them all per channelizer (old and new) but I think that
>> will push the build time over maximum on Travis, so I may need to break
>> things up somehow. I'm reasonably sure I'll have a branch ready to push
>> before the end of the week.
>>
>>
>>
>>
>>
>> On Mon, Mar 22, 2021 at 5:22 PM Stephen Mallette <sp...@gmail.com>
>> wrote:
>>
>>> I've started to play with some ideas for pushing all requests
>>> scripts/bytecode sessions/sessionless down one unified pipeline with one
>>> shared thread pool. The most interesting thing I've seen so far is how much
>>> the code could potentially simplify under this approach. The
>>> GremlinExecutor was the wrong abstraction in a sense and it forced way too
>>> many callbacks as arguments and made error handling kinda sloppy which made
>>> the server code harder to reuse, extend and read. Of course, all that was
>>> built before the first remote graph was built and we were just wildly
>>> guessing at how DS Graph was going to work with all this. Now, I think the
>>> remote model is much more clear, so maybe all this can be made "right" now.
>>>
>>> I think that if I can see any success at all with this in the next few
>>> days, we could include a new UnfiiedChannelizer that would replace the
>>> existing ones. I think we'd keep the old one in place by default and keep
>>> my copy/paste work for SessionOpProcessor that added bytecode there, with
>>> the idea that the UnifiedChannelizer will become the future default as we
>>> get it tested further along the 3.5.x line.
>>>
>>> I imagine I'll have more details on this task tomorrow and will post
>>> back here when I do.
>>>
>>>
>>> On Thu, Mar 18, 2021 at 12:37 PM Divij Vaidya <di...@gmail.com>
>>> wrote:
>>>
>>>> >
>>>> > I suppose the
>>>> > question is how would we ensure that each request for a session ends
>>>> up
>>>> > being executed on the same thread from the previous request if jobs
>>>> are
>>>> > randomly submitted to a worker pool?
>>>>
>>>>
>>>> I haven't thought through the details, but on top of my head, we would
>>>> have
>>>> to maintain some request<->thread mapping on the server. This mapping is
>>>> also a building block for a request cancellation feature in future
>>>> where a
>>>> client would be able to send a cancellation request to the server, the
>>>> server will map the request to a thread executing that request and then
>>>> set
>>>> an interrupt on that thread to signify cancellation.
>>>>
>>>> Divij Vaidya
>>>>
>>>>
>>>>
>>>> On Thu, Mar 18, 2021 at 5:00 PM Stephen Mallette <sp...@gmail.com>
>>>> wrote:
>>>>
>>>> > I started a fresh thread for this topic Divij brought up, with more
>>>> > context:
>>>> >
>>>> > > In a scenario where we have both
>>>> > > session-less and sesion-ful requests being made to the server, the
>>>> > resource
>>>> > > allocation will not be fair and may lead to starvation for some
>>>> requests.
>>>> > > This problem exists even today, hence not totally correlated to
>>>> what you
>>>> > > are proposing but I am afraid a wider adoption of explicit
>>>> > > transactions will bring this problem to the spotlight. Hence, we
>>>> should
>>>> > fix
>>>> > > this problem first. A solution would entail converging the worker
>>>> pool
>>>> > for
>>>> > > both session vs session-less requests.
>>>> >
>>>> > I'm not sure we can get that done in 3.5.0, but maybe? I suppose the
>>>> > question is how would we ensure that each request for a session ends
>>>> up
>>>> > being executed on the same thread from the previous request if jobs
>>>> are
>>>> > randomly submitted to a worker pool?
>>>> >
>>>> >
>>>> > On Thu, Mar 18, 2021 at 11:52 AM Divij Vaidya <
>>>> divijvaidya13@gmail.com>
>>>> > wrote:
>>>> >
>>>> > > Great idea Stephen. I agree with standardizing the explicit
>>>> transaction
>>>> > > semantics in cases of remote server (implicit aka sessionless is
>>>> already
>>>> > > self explanatory) and unifying the syntax with embedded graph usage.
>>>> > >
>>>> > > Couple of notes:
>>>> > >
>>>> > > 1. I would favor the begin() instead of create() as it's closer to
>>>> > > universally prominent SQL transaction syntax. This would reduce the
>>>> > > learning curve for users of TP.
>>>> > >
>>>> > > 2. From an implementation standpoint, sessionless requests are
>>>> queued on
>>>> > > the server side and are serviced by the worker thread pool. But a
>>>> > > session-ful aka explicit transaction aka managed transaction starts
>>>> a new
>>>> > > single worked thread pool every time. In a scenario where we have
>>>> both
>>>> > > session-less and sesion-ful requests being made to the server, the
>>>> > resource
>>>> > > allocation will not be fair and may lead to starvation for some
>>>> requests.
>>>> > > This problem exists even today, hence not totally correlated to
>>>> what you
>>>> > > are proposing but I am afraid a wider adoption of explicit
>>>> > > transactions will bring this problem to the spotlight. Hence, we
>>>> should
>>>> > fix
>>>> > > this problem first. A solution would entail converging the worker
>>>> pool
>>>> > for
>>>> > > both session vs session-less requests.
>>>> > >
>>>> > > 3. You are proposing the idea of having a transaction scoped
>>>> traversal
>>>> > > object. I agree with it but we need more clarification in behavior
>>>> wrt
>>>> > the
>>>> > > following scenarios:
>>>> > >
>>>> > > Q. What happens when g.tx().commit() is called on a transaction
>>>> scoped
>>>> > > traversal object? Does the traversal get closed?
>>>> > > Q. Currently, the same traversal object could be used to execute
>>>> multiple
>>>> > > session-less requests simultaneously in a thread safe manner. Does
>>>> the
>>>> > same
>>>> > > behaviour apply to transaction scoped traversal? If not, then how
>>>> do I
>>>> > send
>>>> > > multiple requests in parallel from the client all scoped to the same
>>>> > > transaction on the server? Note that this is different from case of
>>>> multi
>>>> > > threaded transactions because on the server all requests are scoped
>>>> to
>>>> > > single transaction i.e. single thread but on the client they may be
>>>> > > submitted via multiple threads.
>>>> > >
>>>> > > Regards,
>>>> > > Divij Vaidya
>>>> > >
>>>> > >
>>>> > >
>>>> > > On Thu, Mar 18, 2021 at 4:05 PM Stephen Mallette <
>>>> spmallette@gmail.com>
>>>> > > wrote:
>>>> > >
>>>> > > > The Transaction object is a bit important with remote cases
>>>> because it
>>>> > > > holds the reference to the session. With embedded use cases we
>>>> > generally
>>>> > > > adhere to ThreadLocal transactions so the Transaction
>>>> implementation in
>>>> > > > that case is more of a controller for the current thread but for
>>>> remote
>>>> > > > cases the Transaction implementation holds a bit of state that can
>>>> > cross
>>>> > > > over threads. In some ways, that makes remote cases feel like a
>>>> > "threaded
>>>> > > > transaction" so that may be familiar to users?? Here's some
>>>> example
>>>> > > > syntax I currently have working in a test case:
>>>> > > >
>>>> > > > g = traversal().withRemote(conn)
>>>> > > > gtx = g.tx().create()
>>>> > > > assert gtx.isOpen() == true
>>>> > > > gtx.addV('person').iterate()
>>>> > > > gtx.addV('software').iterate()
>>>> > > > gtx.commit() // alternatively you could explicitly rollback()
>>>> > > > assert gtx.isOpen() == false
>>>> > > >
>>>> > > > I hope that documentation changes are enough to unify the syntax
>>>> and
>>>> > > remove
>>>> > > > confusion despite there still being ThreadLocal transactions as a
>>>> > default
>>>> > > > for embedded cases and something else for remote. At least they
>>>> will
>>>> > look
>>>> > > > the same, even if you technically don't need to do a
>>>> g.tx().create()
>>>> > for
>>>> > > > embedded transactions and can just have the control you always
>>>> had with
>>>> > > > g.tx().commit() directly.
>>>> > > >
>>>> > > > Note that the remote Transaction object can support configuration
>>>> via
>>>> > > > onClose(CLOSE_BEHAVIOR) and you could therefore switch from the
>>>> default
>>>> > > of
>>>> > > > "commit on close" to rollback or manual (or i suppose something
>>>> > custom).
>>>> > > > It's nice that this piece works. I don't see a point in supporting
>>>> > > > onReadWrite(READ_WRITE_BEHAVIOR) as it just doesn't make sense in
>>>> > remote
>>>> > > > contexts.
>>>> > > >
>>>> > > > Another point is that I've added what I termed a "Graph Operation"
>>>> > > bytecode
>>>> > > > instances which is bytecode that isn't related to traversal
>>>> > > construction. I
>>>> > > > hope this isn't one of those things where we end up wanting to
>>>> > deprecate
>>>> > > an
>>>> > > > idea as fast as we added it but we needed some way to pass
>>>> > > commit/rollback
>>>> > > > commands and they aren't really part of traversal construction.
>>>> They
>>>> > are
>>>> > > > operations that execute on the graph itself and I couldn't think
>>>> of how
>>>> > > to
>>>> > > > treat them as traversals nicely, so they sorta just exist as a
>>>> one off.
>>>> > > > Perhaps they will grow in number?? Folks have always asked if
>>>> bytecode
>>>> > > > requests could get "GraphFeature" information - I suppose this
>>>> could
>>>> > be a
>>>> > > > way to do that??
>>>> > > >
>>>> > > > Anyway, I will keep going down this general path as it's appearing
>>>> > > > relatively fruitful.
>>>> > > >
>>>> > > >
>>>> > > >
>>>> > > >
>>>> > > > On Thu, Mar 18, 2021 at 6:34 AM Stephen Mallette <
>>>> spmallette@gmail.com
>>>> > >
>>>> > > > wrote:
>>>> > > >
>>>> > > > > > We should just communicate clearly that a simple remote
>>>> traversal
>>>> > > > > already uses a transaction by default,
>>>> > > > >
>>>> > > > > That's a good point because even with this change we still have
>>>> a
>>>> > > > > situation where a single iterated remote traversal will
>>>> generally
>>>> > mean
>>>> > > a
>>>> > > > > server managed commit/rollback, but in embedded mode, we have
>>>> an open
>>>> > > > > transaction (for transaction enabled graphs - TinkerGraph will
>>>> behave
>>>> > > > more
>>>> > > > > like a remote traversal, so more confusion there i guess). I'm
>>>> not
>>>> > sure
>>>> > > > how
>>>> > > > > to rectify that at this time except by way of documentation.
>>>> > > > >
>>>> > > > > The only thing I can think of is that the default for embedded
>>>> would
>>>> > > have
>>>> > > > > to be auto-commit per traversal. In that way it would work like
>>>> > remote
>>>> > > > > traversals and for graphs that don't support transactions like
>>>> > > > TinkerGraph.
>>>> > > > > Of course, that's the kind of change that will break a lot of
>>>> code.
>>>> > > Maybe
>>>> > > > > we just keep that idea for another version.
>>>> > > > >
>>>> > > > > On Thu, Mar 18, 2021 at 4:21 AM <fh...@florian-hockmann.de> wrote:
>>>> > > > >
>>>> > > > >> I like the idea of adding transactions for remote traversals
>>>> as they
>>>> > > > >> close a gap in functionality that we currently have for remote
>>>> > > > traversals.
>>>> > > > >> We should just communicate clearly that a simple remote
>>>> traversal
>>>> > > > already
>>>> > > > >> uses a transaction by default, meaning that the server will
>>>> roll
>>>> > back
>>>> > > > any
>>>> > > > >> changes if any exception occurs. Users often ask about how to
>>>> do
>>>> > > > >> transactions with a remote traversal because they don't know
>>>> about
>>>> > > this
>>>> > > > and
>>>> > > > >> I'm afraid that we might add even more confusion if we now add
>>>> > > > transactions
>>>> > > > >> for remote traversals. Hence why I think that we should
>>>> communicate
>>>> > > this
>>>> > > > >> clearly when we add remote transactions.
>>>> > > > >>
>>>> > > > >> Getting this to work in the GLVs should be possible but will
>>>> require
>>>> > > > some
>>>> > > > >> effort. I think we would have to introduce some
>>>> > > > >> "DriverRemoteTransactionTraversal" that doesn't submit the
>>>> traversal
>>>> > > on
>>>> > > > a
>>>> > > > >> terminal step but saves it and then submits all saved
>>>> traversals
>>>> > > > together
>>>> > > > >> on close().
>>>> > > > >> This also means that we should directly add an async version of
>>>> > > close(),
>>>> > > > >> maybe closeAsync()? (Same for commit() and rollback())
>>>> > > > >>
>>>> > > > >> I also like create() better than traversal() because it would
>>>> > confuse
>>>> > > me
>>>> > > > >> to first start a traversal with traversal() and then also
>>>> start a
>>>> > > > >> transaction with the same method.
>>>> > > > >>
>>>> > > > >> -----Ursprüngliche Nachricht-----
>>>> > > > >> Von: Stephen Mallette <sp...@gmail.com>
>>>> > > > >> Gesendet: Donnerstag, 18. März 2021 00:01
>>>> > > > >> An: dev@tinkerpop.apache.org
>>>> > > > >> Betreff: Re: [DSISCUSS] Remote Transactions
>>>> > > > >>
>>>> > > > >> One thing I should have noted about tx().create(). The
>>>> create() very
>>>> > > > well
>>>> > > > >> could have been named traversal(), thus the previous example
>>>> reading
>>>> > > as:
>>>> > > > >>
>>>> > > > >> g = traversal().withEmbedded(graph)
>>>> > > > >> // or
>>>> > > > >> g = traversal().withRemote(conn)
>>>> > > > >> gtx = g.tx().traversal()
>>>> > > > >> gtx.addV('person').iterate()
>>>> > > > >> gtx.addV('software').iterate()
>>>> > > > >> gtx.close() // alternatively you could explicitly commit() or
>>>> > > rollback()
>>>> > > > >>
>>>> > > > >> You're basically spawning a new GraphTraversalSource from the
>>>> > > > Transaction
>>>> > > > >> object rather than from a Graph or AnonymousTraversal. I chose
>>>> > > create()
>>>> > > > >> because it felt like this looked weird:
>>>> > > > >>
>>>> > > > >> g = traversal().withRemote(conn).tx().traversal()
>>>> > > > >>
>>>> > > > >> which would be a weird thing to do I guess, but just seeing
>>>> > > > "traversal()"
>>>> > > > >> over and over seemed odd looking and I wanted to differentiate
>>>> with
>>>> > > the
>>>> > > > >> Transaction object even though the methods are identical in
>>>> what
>>>> > they
>>>> > > > do. I
>>>> > > > >> suppose I also drew inspiration from:
>>>> > > > >>
>>>> > > > >> Transaction.createdThreadedTx()
>>>> > > > >>
>>>> > > > >> which I think we might consider deprecating. JanusGraph would
>>>> simply
>>>> > > > >> expose their own Transaction object in the future that has that
>>>> > method
>>>> > > > as I
>>>> > > > >> imagine folks still use that feature. As far as I know, no
>>>> other
>>>> > graph
>>>> > > > >> implements that functionality and my guess is that no one will
>>>> > likely
>>>> > > > do so
>>>> > > > >> in the future.
>>>> > > > >>
>>>> > > > >> anyway, if you like traversal() better than create() or the
>>>> other
>>>> > way
>>>> > > > >> around, please let me know
>>>> > > > >>
>>>> > > > >> On Wed, Mar 17, 2021 at 5:33 PM David Bechberger <
>>>> > dave@bechberger.com
>>>> > > >
>>>> > > > >> wrote:
>>>> > > > >>
>>>> > > > >> > I am in favor of this change as I any idea that unifies the
>>>> > multiple
>>>> > > > >> > different ways things work in TP will only make it easier to
>>>> learn
>>>> > > and
>>>> > > > >> > adopt.
>>>> > > > >> >
>>>> > > > >> > I don't know that I have enough knowledge of the inner
>>>> workings of
>>>> > > > >> > transactions to know what/if this will cause problems.
>>>> > > > >> >
>>>> > > > >> > Dave
>>>> > > > >> >
>>>> > > > >> > On Wed, Mar 17, 2021 at 12:57 PM Stephen Mallette
>>>> > > > >> > <sp...@gmail.com>
>>>> > > > >> > wrote:
>>>> > > > >> >
>>>> > > > >> > > We haven't touched "transactions" since TP3 was originally
>>>> > > designed.
>>>> > > > >> > > They have remained a feature for embedded use cases even
>>>> in the
>>>> > > face
>>>> > > > >> > > of the
>>>> > > > >> > rise
>>>> > > > >> > > of remote graph use cases and result in major
>>>> inconsistencies
>>>> > that
>>>> > > > >> > > really bother users.
>>>> > > > >> > >
>>>> > > > >> > > As we close on 3.5.0, I figured that perhaps there was a
>>>> chance
>>>> > to
>>>> > > > >> > > do something radical about transactions. Basically, I'd
>>>> like
>>>> > > > >> > > transactions to work irrespective of remote or embedded
>>>> usage
>>>> > and
>>>> > > > >> > > for them to have the
>>>> > > > >> > same
>>>> > > > >> > > API when doing so. In mulling it over for the last day or
>>>> so I
>>>> > > had a
>>>> > > > >> > > realization that makes me believe that the following is
>>>> > possible:
>>>> > > > >> > >
>>>> > > > >> > > g = traversal().withEmbedded(graph)
>>>> > > > >> > > // or
>>>> > > > >> > > g = traversal().withRemote(conn)
>>>> > > > >> > > gtx = g.tx().create()
>>>> > > > >> > > gtx.addV('person').iterate()
>>>> > > > >> > > gtx.addV('software').iterate()
>>>> > > > >> > > gtx.close() // alternatively you could explicitly commit()
>>>> or
>>>> > > > >> > > rollback()
>>>> > > > >> > >
>>>> > > > >> > > // you could still use g for sessionless, but gtx is "done"
>>>> > after
>>>> > > //
>>>> > > > >> > > you close so you will need to create() a new gtx instance
>>>> for a
>>>> > //
>>>> > > > >> > > fresh transaction assert 2 == g.V().count().next()
>>>> > > > >> > >
>>>> > > > >> > > Note that the create() method on tx() is the new bit of
>>>> > necessary
>>>> > > > >> syntax.
>>>> > > > >> > > Technically, it is a do-nothing for embedded mode (and you
>>>> could
>>>> > > > >> > > skip it for thread-local auto-transactions) but all the
>>>> > > > >> > > documentation can be shifted so that the API is identical
>>>> to
>>>> > > remote.
>>>> > > > >> > > The change would be non-breaking as the embedded
>>>> transaction
>>>> > > > >> > > approach would remain as it is, but would no longer be
>>>> > documented
>>>> > > as
>>>> > > > >> > > the preferred approach. Perhaps we could one day disallow
>>>> it but
>>>> > > > >> > > it's a bit of a tangle of ThreadLocal that I'm not sure we
>>>> want
>>>> > to
>>>> > > > >> touch in TP3.
>>>> > > > >> > >
>>>> > > > >> > > What happens behind the scenes of g.tx().create() is that
>>>> in
>>>> > > remote
>>>> > > > >> > > cases the RemoteConnection constructs a remote Transaction
>>>> > > > >> > > implementation which basically is used to spawn new
>>>> traversal
>>>> > > source
>>>> > > > >> > > instances with a session based connections. The remote
>>>> > Transaction
>>>> > > > >> > > object can't support
>>>> > > > >> > transaction
>>>> > > > >> > > listeners or special read-write/close configurations, but I
>>>> > don't
>>>> > > > >> > > think that's a problem as those things don't make a lot of
>>>> sense
>>>> > > in
>>>> > > > >> > > remote use cases.
>>>> > > > >> > >
>>>> > > > >> > > From the server perspective, this change would also mean
>>>> that
>>>> > > > >> > > sessions would have to accept bytecode and not just
>>>> scripts,
>>>> > which
>>>> > > > >> > > technically shouldn't be a problem.
>>>> > > > >> > >
>>>> > > > >> > > One downside at the moment is that i'm thinking mostly in
>>>> Java
>>>> > at
>>>> > > > >> > > the moment. I'm not sure how this all works in other
>>>> variants
>>>> > just
>>>> > > > >> > > yet, but
>>>> > > > >> > I'd
>>>> > > > >> > > hope to keep similar syntax.
>>>> > > > >> > >
>>>> > > > >> > > I will keep experimenting tomorrow. If you have any
>>>> thoughts on
>>>> > > the
>>>> > > > >> > matter,
>>>> > > > >> > > I'd be happy to hear them. Thanks!
>>>> > > > >> > >
>>>> > > > >> >
>>>> > > > >>
>>>> > > > >>
>>>> > > >
>>>> > >
>>>> >
>>>>
>>>

Re: [DISCUSS] Converging the worker pool was: [DISCUSS] Remote Transactions

Posted by Stephen Mallette <sp...@gmail.com>.

It took me a long time to get to the final steps of this change. Uncovered
some unexpected bugs in the WsAndHttpChannelizer and that was gnarly to
sort out. Also found some problems with authorization through the
SaslAndHttpBasicAuthenticationHandler. I can't say I really like how these
combined channelizers really work. Seems like the new tangle of stuff in
Gremlin Server. Not sure how much of that I can sort that out for 3.5.0 at
this point. I might make some attempts on this body of work since it's
fresh in my mind but we'll see.

After getting through these little nits I've managed to get all of the
Gremlin Server integration test cases passing with the new
UnifiedChannelizer. I've added a -DtestUnified property to run the
UnifiedChannelizer rather than the old stuff. This way we can add a new job
to travis to cover that by itself in parallel to the old. You can run it
like this:

mvn verify -pl gremlin-server -DskipIntegrationTests=false
-DtestUnified=true

Having all the tests pass on the new channelizer makes me feel pretty
confident that what I have at least works (and without much change in
behavior) and therefore have pushed a branch for early review:

https://github.com/apache/tinkerpop/tree/TINKERPOP-2245

I still need to write a few more tests and after that it would be time to
do a bit of performance testing to see how it compares to the old method.
I'm also missing "metrics". If all that is good then I'll need to do a gang
of documentation work. I imagine I will continue to keep this work separate
from the bytecode transaction work that spawned it and merge the
transaction stuff first followed by this. In that way, expect this branch
to be a bit volatile with rebasing.





On Tue, Mar 23, 2021 at 6:51 PM Stephen Mallette <sp...@gmail.com>
wrote:

> More reasonable progress with the UnifiedChannelizer. The short version is
> that I have a pretty major quantity of the server integration tests passing
> under this new Channelizer. The ones that are failing are typically ones
> that assert some specific log messages or exception message which may not
> necessarily apply anymore. I'm still mulling those over.
>
> The basic structure of the UnifiedChannelizer is that it sorta copies the
> WsAndHttpChannelizer and builds from there, so when this UnfiedChannelizer
> is configured you'll get HTTP and Websockets. The UnifiedChannelizer gets
> rid of the entire OpProcessor infrastructure. I think that was a good body
> of code years ago, but we've now learned quite explicitly what we want
> Gremlin Server to do and what role it plays. OpProcessor just feels like an
> unnecessary abstraction now. Removing it simplifies the code of the
> UnifiedChannelizer, which adds a single UnifiedHandler to the netty
> pipeline. That UnifiedHandler streamlines all those OpProcessor
> implementations into one so all the duplicate code is effectively removed
> by running RequestMessage instances through a single common flow. That
> means all the request validation, error handling, iterator processing,
> result building, transaction management, etc. will be in one place.
>
> The UnifiedHandler also kills out the GremlinExecutor. That means all the
> Lifecycle callbacks and random lambdas that generated so much indirection
> in the name of extensibility are no longer confusing the code base and the
> wicked hierarchy of timesouts and exception throws are all flattened.
>
> When the UnifiedHandler gets a RequestMessage it validates it, builds a
> Context and creates a Rexster instance (thought i'd bring back the name
> rather than go with the generic "Worker" interface) that has the Context
> added to it as a task. The Rexster instance is then submitted to the
> Gremlin thread pool to do work when a thread is available. Rexster will
> hang on to that thread awaiting tasks that are assigned to it. For
> sessionless requests, that means Rexster handles one request and exits. For
> a sessionful request it means the Rexster will wait for more tasks (Context
> objects from new requests) to be added to it. The UnifiedHandler holds
> those Rexster instances in a Map aligning session identifiers to Rexster
> instances. As Divij alluded to, I suppose this means we are putting down
> foundation for query cancellation and better insight into the queue of
> things that are running.
>
> Now that the gremlinPool server setting covers both types of requests we
> no longer have a situation where the number of threads that can be running
> is unbounded and a "session" will now look a bit like a "long run job" in
> that it will consume a thread from the pool for as long as the session is
> open. So if the gremlinPool is 16, you could have 16 sessions active.
> Sessionless requests would begin to queue as would requests for new
> sessions and they would be serviced in the order received.
>
> I'm still not quite ready for a pull request. Despite having the bulk of
> the integration tests working, the code isn't quite the way I think it will
> be finally organized (I used a lot of inner classes and things which
> probably should change). I also need to figure out how to run the tests. I
> think I need to run them all per channelizer (old and new) but I think that
> will push the build time over maximum on Travis, so I may need to break
> things up somehow. I'm reasonably sure I'll have a branch ready to push
> before the end of the week.
>
>
>
>
>
> On Mon, Mar 22, 2021 at 5:22 PM Stephen Mallette <sp...@gmail.com>
> wrote:
>
>> I've started to play with some ideas for pushing all requests
>> scripts/bytecode sessions/sessionless down one unified pipeline with one
>> shared thread pool. The most interesting thing I've seen so far is how much
>> the code could potentially simplify under this approach. The
>> GremlinExecutor was the wrong abstraction in a sense and it forced way too
>> many callbacks as arguments and made error handling kinda sloppy which made
>> the server code harder to reuse, extend and read. Of course, all that was
>> built before the first remote graph was built and we were just wildly
>> guessing at how DS Graph was going to work with all this. Now, I think the
>> remote model is much more clear, so maybe all this can be made "right" now.
>>
>> I think that if I can see any success at all with this in the next few
>> days, we could include a new UnfiiedChannelizer that would replace the
>> existing ones. I think we'd keep the old one in place by default and keep
>> my copy/paste work for SessionOpProcessor that added bytecode there, with
>> the idea that the UnifiedChannelizer will become the future default as we
>> get it tested further along the 3.5.x line.
>>
>> I imagine I'll have more details on this task tomorrow and will post back
>> here when I do.
>>
>>
>> On Thu, Mar 18, 2021 at 12:37 PM Divij Vaidya <di...@gmail.com>
>> wrote:
>>
>>> >
>>> > I suppose the
>>> > question is how would we ensure that each request for a session ends up
>>> > being executed on the same thread from the previous request if jobs are
>>> > randomly submitted to a worker pool?
>>>
>>>
>>> I haven't thought through the details, but on top of my head, we would
>>> have
>>> to maintain some request<->thread mapping on the server. This mapping is
>>> also a building block for a request cancellation feature in future where
>>> a
>>> client would be able to send a cancellation request to the server, the
>>> server will map the request to a thread executing that request and then
>>> set
>>> an interrupt on that thread to signify cancellation.
>>>
>>> Divij Vaidya
>>>
>>>
>>>
>>> On Thu, Mar 18, 2021 at 5:00 PM Stephen Mallette <sp...@gmail.com>
>>> wrote:
>>>
>>> > I started a fresh thread for this topic Divij brought up, with more
>>> > context:
>>> >
>>> > > In a scenario where we have both
>>> > > session-less and sesion-ful requests being made to the server, the
>>> > resource
>>> > > allocation will not be fair and may lead to starvation for some
>>> requests.
>>> > > This problem exists even today, hence not totally correlated to what
>>> you
>>> > > are proposing but I am afraid a wider adoption of explicit
>>> > > transactions will bring this problem to the spotlight. Hence, we
>>> should
>>> > fix
>>> > > this problem first. A solution would entail converging the worker
>>> pool
>>> > for
>>> > > both session vs session-less requests.
>>> >
>>> > I'm not sure we can get that done in 3.5.0, but maybe? I suppose the
>>> > question is how would we ensure that each request for a session ends up
>>> > being executed on the same thread from the previous request if jobs are
>>> > randomly submitted to a worker pool?
>>> >
>>> >
>>> > On Thu, Mar 18, 2021 at 11:52 AM Divij Vaidya <divijvaidya13@gmail.com
>>> >
>>> > wrote:
>>> >
>>> > > Great idea Stephen. I agree with standardizing the explicit
>>> transaction
>>> > > semantics in cases of remote server (implicit aka sessionless is
>>> already
>>> > > self explanatory) and unifying the syntax with embedded graph usage.
>>> > >
>>> > > Couple of notes:
>>> > >
>>> > > 1. I would favor the begin() instead of create() as it's closer to
>>> > > universally prominent SQL transaction syntax. This would reduce the
>>> > > learning curve for users of TP.
>>> > >
>>> > > 2. From an implementation standpoint, sessionless requests are
>>> queued on
>>> > > the server side and are serviced by the worker thread pool. But a
>>> > > session-ful aka explicit transaction aka managed transaction starts
>>> a new
>>> > > single worked thread pool every time. In a scenario where we have
>>> both
>>> > > session-less and sesion-ful requests being made to the server, the
>>> > resource
>>> > > allocation will not be fair and may lead to starvation for some
>>> requests.
>>> > > This problem exists even today, hence not totally correlated to what
>>> you
>>> > > are proposing but I am afraid a wider adoption of explicit
>>> > > transactions will bring this problem to the spotlight. Hence, we
>>> should
>>> > fix
>>> > > this problem first. A solution would entail converging the worker
>>> pool
>>> > for
>>> > > both session vs session-less requests.
>>> > >
>>> > > 3. You are proposing the idea of having a transaction scoped
>>> traversal
>>> > > object. I agree with it but we need more clarification in behavior
>>> wrt
>>> > the
>>> > > following scenarios:
>>> > >
>>> > > Q. What happens when g.tx().commit() is called on a transaction
>>> scoped
>>> > > traversal object? Does the traversal get closed?
>>> > > Q. Currently, the same traversal object could be used to execute
>>> multiple
>>> > > session-less requests simultaneously in a thread safe manner. Does
>>> the
>>> > same
>>> > > behaviour apply to transaction scoped traversal? If not, then how do
>>> I
>>> > send
>>> > > multiple requests in parallel from the client all scoped to the same
>>> > > transaction on the server? Note that this is different from case of
>>> multi
>>> > > threaded transactions because on the server all requests are scoped
>>> to
>>> > > single transaction i.e. single thread but on the client they may be
>>> > > submitted via multiple threads.
>>> > >
>>> > > Regards,
>>> > > Divij Vaidya
>>> > >
>>> > >
>>> > >
>>> > > On Thu, Mar 18, 2021 at 4:05 PM Stephen Mallette <
>>> spmallette@gmail.com>
>>> > > wrote:
>>> > >
>>> > > > The Transaction object is a bit important with remote cases
>>> because it
>>> > > > holds the reference to the session. With embedded use cases we
>>> > generally
>>> > > > adhere to ThreadLocal transactions so the Transaction
>>> implementation in
>>> > > > that case is more of a controller for the current thread but for
>>> remote
>>> > > > cases the Transaction implementation holds a bit of state that can
>>> > cross
>>> > > > over threads. In some ways, that makes remote cases feel like a
>>> > "threaded
>>> > > > transaction" so that may be familiar to users?? Here's some example
>>> > > > syntax I currently have working in a test case:
>>> > > >
>>> > > > g = traversal().withRemote(conn)
>>> > > > gtx = g.tx().create()
>>> > > > assert gtx.isOpen() == true
>>> > > > gtx.addV('person').iterate()
>>> > > > gtx.addV('software').iterate()
>>> > > > gtx.commit() // alternatively you could explicitly rollback()
>>> > > > assert gtx.isOpen() == false
>>> > > >
>>> > > > I hope that documentation changes are enough to unify the syntax
>>> and
>>> > > remove
>>> > > > confusion despite there still being ThreadLocal transactions as a
>>> > default
>>> > > > for embedded cases and something else for remote. At least they
>>> will
>>> > look
>>> > > > the same, even if you technically don't need to do a
>>> g.tx().create()
>>> > for
>>> > > > embedded transactions and can just have the control you always had
>>> with
>>> > > > g.tx().commit() directly.
>>> > > >
>>> > > > Note that the remote Transaction object can support configuration
>>> via
>>> > > > onClose(CLOSE_BEHAVIOR) and you could therefore switch from the
>>> default
>>> > > of
>>> > > > "commit on close" to rollback or manual (or i suppose something
>>> > custom).
>>> > > > It's nice that this piece works. I don't see a point in supporting
>>> > > > onReadWrite(READ_WRITE_BEHAVIOR) as it just doesn't make sense in
>>> > remote
>>> > > > contexts.
>>> > > >
>>> > > > Another point is that I've added what I termed a "Graph Operation"
>>> > > bytecode
>>> > > > instances which is bytecode that isn't related to traversal
>>> > > construction. I
>>> > > > hope this isn't one of those things where we end up wanting to
>>> > deprecate
>>> > > an
>>> > > > idea as fast as we added it but we needed some way to pass
>>> > > commit/rollback
>>> > > > commands and they aren't really part of traversal construction.
>>> They
>>> > are
>>> > > > operations that execute on the graph itself and I couldn't think
>>> of how
>>> > > to
>>> > > > treat them as traversals nicely, so they sorta just exist as a one
>>> off.
>>> > > > Perhaps they will grow in number?? Folks have always asked if
>>> bytecode
>>> > > > requests could get "GraphFeature" information - I suppose this
>>> could
>>> > be a
>>> > > > way to do that??
>>> > > >
>>> > > > Anyway, I will keep going down this general path as it's appearing
>>> > > > relatively fruitful.
>>> > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > > On Thu, Mar 18, 2021 at 6:34 AM Stephen Mallette <
>>> spmallette@gmail.com
>>> > >
>>> > > > wrote:
>>> > > >
>>> > > > > > We should just communicate clearly that a simple remote
>>> traversal
>>> > > > > already uses a transaction by default,
>>> > > > >
>>> > > > > That's a good point because even with this change we still have a
>>> > > > > situation where a single iterated remote traversal will generally
>>> > mean
>>> > > a
>>> > > > > server managed commit/rollback, but in embedded mode, we have an
>>> open
>>> > > > > transaction (for transaction enabled graphs - TinkerGraph will
>>> behave
>>> > > > more
>>> > > > > like a remote traversal, so more confusion there i guess). I'm
>>> not
>>> > sure
>>> > > > how
>>> > > > > to rectify that at this time except by way of documentation.
>>> > > > >
>>> > > > > The only thing I can think of is that the default for embedded
>>> would
>>> > > have
>>> > > > > to be auto-commit per traversal. In that way it would work like
>>> > remote
>>> > > > > traversals and for graphs that don't support transactions like
>>> > > > TinkerGraph.
>>> > > > > Of course, that's the kind of change that will break a lot of
>>> code.
>>> > > Maybe
>>> > > > > we just keep that idea for another version.
>>> > > > >
>>> > > > > On Thu, Mar 18, 2021 at 4:21 AM <fh...@florian-hockmann.de> wrote:
>>> > > > >
>>> > > > >> I like the idea of adding transactions for remote traversals as
>>> they
>>> > > > >> close a gap in functionality that we currently have for remote
>>> > > > traversals.
>>> > > > >> We should just communicate clearly that a simple remote
>>> traversal
>>> > > > already
>>> > > > >> uses a transaction by default, meaning that the server will roll
>>> > back
>>> > > > any
>>> > > > >> changes if any exception occurs. Users often ask about how to do
>>> > > > >> transactions with a remote traversal because they don't know
>>> about
>>> > > this
>>> > > > and
>>> > > > >> I'm afraid that we might add even more confusion if we now add
>>> > > > transactions
>>> > > > >> for remote traversals. Hence why I think that we should
>>> communicate
>>> > > this
>>> > > > >> clearly when we add remote transactions.
>>> > > > >>
>>> > > > >> Getting this to work in the GLVs should be possible but will
>>> require
>>> > > > some
>>> > > > >> effort. I think we would have to introduce some
>>> > > > >> "DriverRemoteTransactionTraversal" that doesn't submit the
>>> traversal
>>> > > on
>>> > > > a
>>> > > > >> terminal step but saves it and then submits all saved traversals
>>> > > > together
>>> > > > >> on close().
>>> > > > >> This also means that we should directly add an async version of
>>> > > close(),
>>> > > > >> maybe closeAsync()? (Same for commit() and rollback())
>>> > > > >>
>>> > > > >> I also like create() better than traversal() because it would
>>> > confuse
>>> > > me
>>> > > > >> to first start a traversal with traversal() and then also start
>>> a
>>> > > > >> transaction with the same method.
>>> > > > >>
>>> > > > >> -----Ursprüngliche Nachricht-----
>>> > > > >> Von: Stephen Mallette <sp...@gmail.com>
>>> > > > >> Gesendet: Donnerstag, 18. März 2021 00:01
>>> > > > >> An: dev@tinkerpop.apache.org
>>> > > > >> Betreff: Re: [DSISCUSS] Remote Transactions
>>> > > > >>
>>> > > > >> One thing I should have noted about tx().create(). The create()
>>> very
>>> > > > well
>>> > > > >> could have been named traversal(), thus the previous example
>>> reading
>>> > > as:
>>> > > > >>
>>> > > > >> g = traversal().withEmbedded(graph)
>>> > > > >> // or
>>> > > > >> g = traversal().withRemote(conn)
>>> > > > >> gtx = g.tx().traversal()
>>> > > > >> gtx.addV('person').iterate()
>>> > > > >> gtx.addV('software').iterate()
>>> > > > >> gtx.close() // alternatively you could explicitly commit() or
>>> > > rollback()
>>> > > > >>
>>> > > > >> You're basically spawning a new GraphTraversalSource from the
>>> > > > Transaction
>>> > > > >> object rather than from a Graph or AnonymousTraversal. I chose
>>> > > create()
>>> > > > >> because it felt like this looked weird:
>>> > > > >>
>>> > > > >> g = traversal().withRemote(conn).tx().traversal()
>>> > > > >>
>>> > > > >> which would be a weird thing to do I guess, but just seeing
>>> > > > "traversal()"
>>> > > > >> over and over seemed odd looking and I wanted to differentiate
>>> with
>>> > > the
>>> > > > >> Transaction object even though the methods are identical in what
>>> > they
>>> > > > do. I
>>> > > > >> suppose I also drew inspiration from:
>>> > > > >>
>>> > > > >> Transaction.createdThreadedTx()
>>> > > > >>
>>> > > > >> which I think we might consider deprecating. JanusGraph would
>>> simply
>>> > > > >> expose their own Transaction object in the future that has that
>>> > method
>>> > > > as I
>>> > > > >> imagine folks still use that feature. As far as I know, no other
>>> > graph
>>> > > > >> implements that functionality and my guess is that no one will
>>> > likely
>>> > > > do so
>>> > > > >> in the future.
>>> > > > >>
>>> > > > >> anyway, if you like traversal() better than create() or the
>>> other
>>> > way
>>> > > > >> around, please let me know
>>> > > > >>
>>> > > > >> On Wed, Mar 17, 2021 at 5:33 PM David Bechberger <
>>> > dave@bechberger.com
>>> > > >
>>> > > > >> wrote:
>>> > > > >>
>>> > > > >> > I am in favor of this change as I any idea that unifies the
>>> > multiple
>>> > > > >> > different ways things work in TP will only make it easier to
>>> learn
>>> > > and
>>> > > > >> > adopt.
>>> > > > >> >
>>> > > > >> > I don't know that I have enough knowledge of the inner
>>> workings of
>>> > > > >> > transactions to know what/if this will cause problems.
>>> > > > >> >
>>> > > > >> > Dave
>>> > > > >> >
>>> > > > >> > On Wed, Mar 17, 2021 at 12:57 PM Stephen Mallette
>>> > > > >> > <sp...@gmail.com>
>>> > > > >> > wrote:
>>> > > > >> >
>>> > > > >> > > We haven't touched "transactions" since TP3 was originally
>>> > > designed.
>>> > > > >> > > They have remained a feature for embedded use cases even in
>>> the
>>> > > face
>>> > > > >> > > of the
>>> > > > >> > rise
>>> > > > >> > > of remote graph use cases and result in major
>>> inconsistencies
>>> > that
>>> > > > >> > > really bother users.
>>> > > > >> > >
>>> > > > >> > > As we close on 3.5.0, I figured that perhaps there was a
>>> chance
>>> > to
>>> > > > >> > > do something radical about transactions. Basically, I'd like
>>> > > > >> > > transactions to work irrespective of remote or embedded
>>> usage
>>> > and
>>> > > > >> > > for them to have the
>>> > > > >> > same
>>> > > > >> > > API when doing so. In mulling it over for the last day or
>>> so I
>>> > > had a
>>> > > > >> > > realization that makes me believe that the following is
>>> > possible:
>>> > > > >> > >
>>> > > > >> > > g = traversal().withEmbedded(graph)
>>> > > > >> > > // or
>>> > > > >> > > g = traversal().withRemote(conn)
>>> > > > >> > > gtx = g.tx().create()
>>> > > > >> > > gtx.addV('person').iterate()
>>> > > > >> > > gtx.addV('software').iterate()
>>> > > > >> > > gtx.close() // alternatively you could explicitly commit()
>>> or
>>> > > > >> > > rollback()
>>> > > > >> > >
>>> > > > >> > > // you could still use g for sessionless, but gtx is "done"
>>> > after
>>> > > //
>>> > > > >> > > you close so you will need to create() a new gtx instance
>>> for a
>>> > //
>>> > > > >> > > fresh transaction assert 2 == g.V().count().next()
>>> > > > >> > >
>>> > > > >> > > Note that the create() method on tx() is the new bit of
>>> > necessary
>>> > > > >> syntax.
>>> > > > >> > > Technically, it is a do-nothing for embedded mode (and you
>>> could
>>> > > > >> > > skip it for thread-local auto-transactions) but all the
>>> > > > >> > > documentation can be shifted so that the API is identical to
>>> > > remote.
>>> > > > >> > > The change would be non-breaking as the embedded transaction
>>> > > > >> > > approach would remain as it is, but would no longer be
>>> > documented
>>> > > as
>>> > > > >> > > the preferred approach. Perhaps we could one day disallow
>>> it but
>>> > > > >> > > it's a bit of a tangle of ThreadLocal that I'm not sure we
>>> want
>>> > to
>>> > > > >> touch in TP3.
>>> > > > >> > >
>>> > > > >> > > What happens behind the scenes of g.tx().create() is that in
>>> > > remote
>>> > > > >> > > cases the RemoteConnection constructs a remote Transaction
>>> > > > >> > > implementation which basically is used to spawn new
>>> traversal
>>> > > source
>>> > > > >> > > instances with a session based connections. The remote
>>> > Transaction
>>> > > > >> > > object can't support
>>> > > > >> > transaction
>>> > > > >> > > listeners or special read-write/close configurations, but I
>>> > don't
>>> > > > >> > > think that's a problem as those things don't make a lot of
>>> sense
>>> > > in
>>> > > > >> > > remote use cases.
>>> > > > >> > >
>>> > > > >> > > From the server perspective, this change would also mean
>>> that
>>> > > > >> > > sessions would have to accept bytecode and not just scripts,
>>> > which
>>> > > > >> > > technically shouldn't be a problem.
>>> > > > >> > >
>>> > > > >> > > One downside at the moment is that i'm thinking mostly in
>>> Java
>>> > at
>>> > > > >> > > the moment. I'm not sure how this all works in other
>>> variants
>>> > just
>>> > > > >> > > yet, but
>>> > > > >> > I'd
>>> > > > >> > > hope to keep similar syntax.
>>> > > > >> > >
>>> > > > >> > > I will keep experimenting tomorrow. If you have any
>>> thoughts on
>>> > > the
>>> > > > >> > matter,
>>> > > > >> > > I'd be happy to hear them. Thanks!
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>>
>>

Re: [DISCUSS] Converging the worker pool was: [DISCUSS] Remote Transactions

Posted by Stephen Mallette <sp...@gmail.com>.

More reasonable progress with the UnifiedChannelizer. The short version is
that I have a pretty major quantity of the server integration tests passing
under this new Channelizer. The ones that are failing are typically ones
that assert some specific log messages or exception message which may not
necessarily apply anymore. I'm still mulling those over.

The basic structure of the UnifiedChannelizer is that it sorta copies the
WsAndHttpChannelizer and builds from there, so when this UnfiedChannelizer
is configured you'll get HTTP and Websockets. The UnifiedChannelizer gets
rid of the entire OpProcessor infrastructure. I think that was a good body
of code years ago, but we've now learned quite explicitly what we want
Gremlin Server to do and what role it plays. OpProcessor just feels like an
unnecessary abstraction now. Removing it simplifies the code of the
UnifiedChannelizer, which adds a single UnifiedHandler to the netty
pipeline. That UnifiedHandler streamlines all those OpProcessor
implementations into one so all the duplicate code is effectively removed
by running RequestMessage instances through a single common flow. That
means all the request validation, error handling, iterator processing,
result building, transaction management, etc. will be in one place.

The UnifiedHandler also kills out the GremlinExecutor. That means all the
Lifecycle callbacks and random lambdas that generated so much indirection
in the name of extensibility are no longer confusing the code base and the
wicked hierarchy of timesouts and exception throws are all flattened.

When the UnifiedHandler gets a RequestMessage it validates it, builds a
Context and creates a Rexster instance (thought i'd bring back the name
rather than go with the generic "Worker" interface) that has the Context
added to it as a task. The Rexster instance is then submitted to the
Gremlin thread pool to do work when a thread is available. Rexster will
hang on to that thread awaiting tasks that are assigned to it. For
sessionless requests, that means Rexster handles one request and exits. For
a sessionful request it means the Rexster will wait for more tasks (Context
objects from new requests) to be added to it. The UnifiedHandler holds
those Rexster instances in a Map aligning session identifiers to Rexster
instances. As Divij alluded to, I suppose this means we are putting down
foundation for query cancellation and better insight into the queue of
things that are running.

Now that the gremlinPool server setting covers both types of requests we no
longer have a situation where the number of threads that can be running is
unbounded and a "session" will now look a bit like a "long run job" in that
it will consume a thread from the pool for as long as the session is open.
So if the gremlinPool is 16, you could have 16 sessions active. Sessionless
requests would begin to queue as would requests for new sessions and they
would be serviced in the order received.

I'm still not quite ready for a pull request. Despite having the bulk of
the integration tests working, the code isn't quite the way I think it will
be finally organized (I used a lot of inner classes and things which
probably should change). I also need to figure out how to run the tests. I
think I need to run them all per channelizer (old and new) but I think that
will push the build time over maximum on Travis, so I may need to break
things up somehow. I'm reasonably sure I'll have a branch ready to push
before the end of the week.





On Mon, Mar 22, 2021 at 5:22 PM Stephen Mallette <sp...@gmail.com>
wrote:

> I've started to play with some ideas for pushing all requests
> scripts/bytecode sessions/sessionless down one unified pipeline with one
> shared thread pool. The most interesting thing I've seen so far is how much
> the code could potentially simplify under this approach. The
> GremlinExecutor was the wrong abstraction in a sense and it forced way too
> many callbacks as arguments and made error handling kinda sloppy which made
> the server code harder to reuse, extend and read. Of course, all that was
> built before the first remote graph was built and we were just wildly
> guessing at how DS Graph was going to work with all this. Now, I think the
> remote model is much more clear, so maybe all this can be made "right" now.
>
> I think that if I can see any success at all with this in the next few
> days, we could include a new UnfiiedChannelizer that would replace the
> existing ones. I think we'd keep the old one in place by default and keep
> my copy/paste work for SessionOpProcessor that added bytecode there, with
> the idea that the UnifiedChannelizer will become the future default as we
> get it tested further along the 3.5.x line.
>
> I imagine I'll have more details on this task tomorrow and will post back
> here when I do.
>
>
> On Thu, Mar 18, 2021 at 12:37 PM Divij Vaidya <di...@gmail.com>
> wrote:
>
>> >
>> > I suppose the
>> > question is how would we ensure that each request for a session ends up
>> > being executed on the same thread from the previous request if jobs are
>> > randomly submitted to a worker pool?
>>
>>
>> I haven't thought through the details, but on top of my head, we would
>> have
>> to maintain some request<->thread mapping on the server. This mapping is
>> also a building block for a request cancellation feature in future where a
>> client would be able to send a cancellation request to the server, the
>> server will map the request to a thread executing that request and then
>> set
>> an interrupt on that thread to signify cancellation.
>>
>> Divij Vaidya
>>
>>
>>
>> On Thu, Mar 18, 2021 at 5:00 PM Stephen Mallette <sp...@gmail.com>
>> wrote:
>>
>> > I started a fresh thread for this topic Divij brought up, with more
>> > context:
>> >
>> > > In a scenario where we have both
>> > > session-less and sesion-ful requests being made to the server, the
>> > resource
>> > > allocation will not be fair and may lead to starvation for some
>> requests.
>> > > This problem exists even today, hence not totally correlated to what
>> you
>> > > are proposing but I am afraid a wider adoption of explicit
>> > > transactions will bring this problem to the spotlight. Hence, we
>> should
>> > fix
>> > > this problem first. A solution would entail converging the worker pool
>> > for
>> > > both session vs session-less requests.
>> >
>> > I'm not sure we can get that done in 3.5.0, but maybe? I suppose the
>> > question is how would we ensure that each request for a session ends up
>> > being executed on the same thread from the previous request if jobs are
>> > randomly submitted to a worker pool?
>> >
>> >
>> > On Thu, Mar 18, 2021 at 11:52 AM Divij Vaidya <di...@gmail.com>
>> > wrote:
>> >
>> > > Great idea Stephen. I agree with standardizing the explicit
>> transaction
>> > > semantics in cases of remote server (implicit aka sessionless is
>> already
>> > > self explanatory) and unifying the syntax with embedded graph usage.
>> > >
>> > > Couple of notes:
>> > >
>> > > 1. I would favor the begin() instead of create() as it's closer to
>> > > universally prominent SQL transaction syntax. This would reduce the
>> > > learning curve for users of TP.
>> > >
>> > > 2. From an implementation standpoint, sessionless requests are queued
>> on
>> > > the server side and are serviced by the worker thread pool. But a
>> > > session-ful aka explicit transaction aka managed transaction starts a
>> new
>> > > single worked thread pool every time. In a scenario where we have both
>> > > session-less and sesion-ful requests being made to the server, the
>> > resource
>> > > allocation will not be fair and may lead to starvation for some
>> requests.
>> > > This problem exists even today, hence not totally correlated to what
>> you
>> > > are proposing but I am afraid a wider adoption of explicit
>> > > transactions will bring this problem to the spotlight. Hence, we
>> should
>> > fix
>> > > this problem first. A solution would entail converging the worker pool
>> > for
>> > > both session vs session-less requests.
>> > >
>> > > 3. You are proposing the idea of having a transaction scoped traversal
>> > > object. I agree with it but we need more clarification in behavior wrt
>> > the
>> > > following scenarios:
>> > >
>> > > Q. What happens when g.tx().commit() is called on a transaction scoped
>> > > traversal object? Does the traversal get closed?
>> > > Q. Currently, the same traversal object could be used to execute
>> multiple
>> > > session-less requests simultaneously in a thread safe manner. Does the
>> > same
>> > > behaviour apply to transaction scoped traversal? If not, then how do I
>> > send
>> > > multiple requests in parallel from the client all scoped to the same
>> > > transaction on the server? Note that this is different from case of
>> multi
>> > > threaded transactions because on the server all requests are scoped to
>> > > single transaction i.e. single thread but on the client they may be
>> > > submitted via multiple threads.
>> > >
>> > > Regards,
>> > > Divij Vaidya
>> > >
>> > >
>> > >
>> > > On Thu, Mar 18, 2021 at 4:05 PM Stephen Mallette <
>> spmallette@gmail.com>
>> > > wrote:
>> > >
>> > > > The Transaction object is a bit important with remote cases because
>> it
>> > > > holds the reference to the session. With embedded use cases we
>> > generally
>> > > > adhere to ThreadLocal transactions so the Transaction
>> implementation in
>> > > > that case is more of a controller for the current thread but for
>> remote
>> > > > cases the Transaction implementation holds a bit of state that can
>> > cross
>> > > > over threads. In some ways, that makes remote cases feel like a
>> > "threaded
>> > > > transaction" so that may be familiar to users?? Here's some example
>> > > > syntax I currently have working in a test case:
>> > > >
>> > > > g = traversal().withRemote(conn)
>> > > > gtx = g.tx().create()
>> > > > assert gtx.isOpen() == true
>> > > > gtx.addV('person').iterate()
>> > > > gtx.addV('software').iterate()
>> > > > gtx.commit() // alternatively you could explicitly rollback()
>> > > > assert gtx.isOpen() == false
>> > > >
>> > > > I hope that documentation changes are enough to unify the syntax and
>> > > remove
>> > > > confusion despite there still being ThreadLocal transactions as a
>> > default
>> > > > for embedded cases and something else for remote. At least they will
>> > look
>> > > > the same, even if you technically don't need to do a g.tx().create()
>> > for
>> > > > embedded transactions and can just have the control you always had
>> with
>> > > > g.tx().commit() directly.
>> > > >
>> > > > Note that the remote Transaction object can support configuration
>> via
>> > > > onClose(CLOSE_BEHAVIOR) and you could therefore switch from the
>> default
>> > > of
>> > > > "commit on close" to rollback or manual (or i suppose something
>> > custom).
>> > > > It's nice that this piece works. I don't see a point in supporting
>> > > > onReadWrite(READ_WRITE_BEHAVIOR) as it just doesn't make sense in
>> > remote
>> > > > contexts.
>> > > >
>> > > > Another point is that I've added what I termed a "Graph Operation"
>> > > bytecode
>> > > > instances which is bytecode that isn't related to traversal
>> > > construction. I
>> > > > hope this isn't one of those things where we end up wanting to
>> > deprecate
>> > > an
>> > > > idea as fast as we added it but we needed some way to pass
>> > > commit/rollback
>> > > > commands and they aren't really part of traversal construction. They
>> > are
>> > > > operations that execute on the graph itself and I couldn't think of
>> how
>> > > to
>> > > > treat them as traversals nicely, so they sorta just exist as a one
>> off.
>> > > > Perhaps they will grow in number?? Folks have always asked if
>> bytecode
>> > > > requests could get "GraphFeature" information - I suppose this could
>> > be a
>> > > > way to do that??
>> > > >
>> > > > Anyway, I will keep going down this general path as it's appearing
>> > > > relatively fruitful.
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > On Thu, Mar 18, 2021 at 6:34 AM Stephen Mallette <
>> spmallette@gmail.com
>> > >
>> > > > wrote:
>> > > >
>> > > > > > We should just communicate clearly that a simple remote
>> traversal
>> > > > > already uses a transaction by default,
>> > > > >
>> > > > > That's a good point because even with this change we still have a
>> > > > > situation where a single iterated remote traversal will generally
>> > mean
>> > > a
>> > > > > server managed commit/rollback, but in embedded mode, we have an
>> open
>> > > > > transaction (for transaction enabled graphs - TinkerGraph will
>> behave
>> > > > more
>> > > > > like a remote traversal, so more confusion there i guess). I'm not
>> > sure
>> > > > how
>> > > > > to rectify that at this time except by way of documentation.
>> > > > >
>> > > > > The only thing I can think of is that the default for embedded
>> would
>> > > have
>> > > > > to be auto-commit per traversal. In that way it would work like
>> > remote
>> > > > > traversals and for graphs that don't support transactions like
>> > > > TinkerGraph.
>> > > > > Of course, that's the kind of change that will break a lot of
>> code.
>> > > Maybe
>> > > > > we just keep that idea for another version.
>> > > > >
>> > > > > On Thu, Mar 18, 2021 at 4:21 AM <fh...@florian-hockmann.de> wrote:
>> > > > >
>> > > > >> I like the idea of adding transactions for remote traversals as
>> they
>> > > > >> close a gap in functionality that we currently have for remote
>> > > > traversals.
>> > > > >> We should just communicate clearly that a simple remote traversal
>> > > > already
>> > > > >> uses a transaction by default, meaning that the server will roll
>> > back
>> > > > any
>> > > > >> changes if any exception occurs. Users often ask about how to do
>> > > > >> transactions with a remote traversal because they don't know
>> about
>> > > this
>> > > > and
>> > > > >> I'm afraid that we might add even more confusion if we now add
>> > > > transactions
>> > > > >> for remote traversals. Hence why I think that we should
>> communicate
>> > > this
>> > > > >> clearly when we add remote transactions.
>> > > > >>
>> > > > >> Getting this to work in the GLVs should be possible but will
>> require
>> > > > some
>> > > > >> effort. I think we would have to introduce some
>> > > > >> "DriverRemoteTransactionTraversal" that doesn't submit the
>> traversal
>> > > on
>> > > > a
>> > > > >> terminal step but saves it and then submits all saved traversals
>> > > > together
>> > > > >> on close().
>> > > > >> This also means that we should directly add an async version of
>> > > close(),
>> > > > >> maybe closeAsync()? (Same for commit() and rollback())
>> > > > >>
>> > > > >> I also like create() better than traversal() because it would
>> > confuse
>> > > me
>> > > > >> to first start a traversal with traversal() and then also start a
>> > > > >> transaction with the same method.
>> > > > >>
>> > > > >> -----Ursprüngliche Nachricht-----
>> > > > >> Von: Stephen Mallette <sp...@gmail.com>
>> > > > >> Gesendet: Donnerstag, 18. März 2021 00:01
>> > > > >> An: dev@tinkerpop.apache.org
>> > > > >> Betreff: Re: [DSISCUSS] Remote Transactions
>> > > > >>
>> > > > >> One thing I should have noted about tx().create(). The create()
>> very
>> > > > well
>> > > > >> could have been named traversal(), thus the previous example
>> reading
>> > > as:
>> > > > >>
>> > > > >> g = traversal().withEmbedded(graph)
>> > > > >> // or
>> > > > >> g = traversal().withRemote(conn)
>> > > > >> gtx = g.tx().traversal()
>> > > > >> gtx.addV('person').iterate()
>> > > > >> gtx.addV('software').iterate()
>> > > > >> gtx.close() // alternatively you could explicitly commit() or
>> > > rollback()
>> > > > >>
>> > > > >> You're basically spawning a new GraphTraversalSource from the
>> > > > Transaction
>> > > > >> object rather than from a Graph or AnonymousTraversal. I chose
>> > > create()
>> > > > >> because it felt like this looked weird:
>> > > > >>
>> > > > >> g = traversal().withRemote(conn).tx().traversal()
>> > > > >>
>> > > > >> which would be a weird thing to do I guess, but just seeing
>> > > > "traversal()"
>> > > > >> over and over seemed odd looking and I wanted to differentiate
>> with
>> > > the
>> > > > >> Transaction object even though the methods are identical in what
>> > they
>> > > > do. I
>> > > > >> suppose I also drew inspiration from:
>> > > > >>
>> > > > >> Transaction.createdThreadedTx()
>> > > > >>
>> > > > >> which I think we might consider deprecating. JanusGraph would
>> simply
>> > > > >> expose their own Transaction object in the future that has that
>> > method
>> > > > as I
>> > > > >> imagine folks still use that feature. As far as I know, no other
>> > graph
>> > > > >> implements that functionality and my guess is that no one will
>> > likely
>> > > > do so
>> > > > >> in the future.
>> > > > >>
>> > > > >> anyway, if you like traversal() better than create() or the other
>> > way
>> > > > >> around, please let me know
>> > > > >>
>> > > > >> On Wed, Mar 17, 2021 at 5:33 PM David Bechberger <
>> > dave@bechberger.com
>> > > >
>> > > > >> wrote:
>> > > > >>
>> > > > >> > I am in favor of this change as I any idea that unifies the
>> > multiple
>> > > > >> > different ways things work in TP will only make it easier to
>> learn
>> > > and
>> > > > >> > adopt.
>> > > > >> >
>> > > > >> > I don't know that I have enough knowledge of the inner
>> workings of
>> > > > >> > transactions to know what/if this will cause problems.
>> > > > >> >
>> > > > >> > Dave
>> > > > >> >
>> > > > >> > On Wed, Mar 17, 2021 at 12:57 PM Stephen Mallette
>> > > > >> > <sp...@gmail.com>
>> > > > >> > wrote:
>> > > > >> >
>> > > > >> > > We haven't touched "transactions" since TP3 was originally
>> > > designed.
>> > > > >> > > They have remained a feature for embedded use cases even in
>> the
>> > > face
>> > > > >> > > of the
>> > > > >> > rise
>> > > > >> > > of remote graph use cases and result in major inconsistencies
>> > that
>> > > > >> > > really bother users.
>> > > > >> > >
>> > > > >> > > As we close on 3.5.0, I figured that perhaps there was a
>> chance
>> > to
>> > > > >> > > do something radical about transactions. Basically, I'd like
>> > > > >> > > transactions to work irrespective of remote or embedded usage
>> > and
>> > > > >> > > for them to have the
>> > > > >> > same
>> > > > >> > > API when doing so. In mulling it over for the last day or so
>> I
>> > > had a
>> > > > >> > > realization that makes me believe that the following is
>> > possible:
>> > > > >> > >
>> > > > >> > > g = traversal().withEmbedded(graph)
>> > > > >> > > // or
>> > > > >> > > g = traversal().withRemote(conn)
>> > > > >> > > gtx = g.tx().create()
>> > > > >> > > gtx.addV('person').iterate()
>> > > > >> > > gtx.addV('software').iterate()
>> > > > >> > > gtx.close() // alternatively you could explicitly commit() or
>> > > > >> > > rollback()
>> > > > >> > >
>> > > > >> > > // you could still use g for sessionless, but gtx is "done"
>> > after
>> > > //
>> > > > >> > > you close so you will need to create() a new gtx instance
>> for a
>> > //
>> > > > >> > > fresh transaction assert 2 == g.V().count().next()
>> > > > >> > >
>> > > > >> > > Note that the create() method on tx() is the new bit of
>> > necessary
>> > > > >> syntax.
>> > > > >> > > Technically, it is a do-nothing for embedded mode (and you
>> could
>> > > > >> > > skip it for thread-local auto-transactions) but all the
>> > > > >> > > documentation can be shifted so that the API is identical to
>> > > remote.
>> > > > >> > > The change would be non-breaking as the embedded transaction
>> > > > >> > > approach would remain as it is, but would no longer be
>> > documented
>> > > as
>> > > > >> > > the preferred approach. Perhaps we could one day disallow it
>> but
>> > > > >> > > it's a bit of a tangle of ThreadLocal that I'm not sure we
>> want
>> > to
>> > > > >> touch in TP3.
>> > > > >> > >
>> > > > >> > > What happens behind the scenes of g.tx().create() is that in
>> > > remote
>> > > > >> > > cases the RemoteConnection constructs a remote Transaction
>> > > > >> > > implementation which basically is used to spawn new traversal
>> > > source
>> > > > >> > > instances with a session based connections. The remote
>> > Transaction
>> > > > >> > > object can't support
>> > > > >> > transaction
>> > > > >> > > listeners or special read-write/close configurations, but I
>> > don't
>> > > > >> > > think that's a problem as those things don't make a lot of
>> sense
>> > > in
>> > > > >> > > remote use cases.
>> > > > >> > >
>> > > > >> > > From the server perspective, this change would also mean that
>> > > > >> > > sessions would have to accept bytecode and not just scripts,
>> > which
>> > > > >> > > technically shouldn't be a problem.
>> > > > >> > >
>> > > > >> > > One downside at the moment is that i'm thinking mostly in
>> Java
>> > at
>> > > > >> > > the moment. I'm not sure how this all works in other variants
>> > just
>> > > > >> > > yet, but
>> > > > >> > I'd
>> > > > >> > > hope to keep similar syntax.
>> > > > >> > >
>> > > > >> > > I will keep experimenting tomorrow. If you have any thoughts
>> on
>> > > the
>> > > > >> > matter,
>> > > > >> > > I'd be happy to hear them. Thanks!
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > > >>
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS] Converging the worker pool was: [DISCUSS] Remote Transactions

Posted by Stephen Mallette <sp...@gmail.com>.

I've started to play with some ideas for pushing all requests
scripts/bytecode sessions/sessionless down one unified pipeline with one
shared thread pool. The most interesting thing I've seen so far is how much
the code could potentially simplify under this approach. The
GremlinExecutor was the wrong abstraction in a sense and it forced way too
many callbacks as arguments and made error handling kinda sloppy which made
the server code harder to reuse, extend and read. Of course, all that was
built before the first remote graph was built and we were just wildly
guessing at how DS Graph was going to work with all this. Now, I think the
remote model is much more clear, so maybe all this can be made "right" now.

I think that if I can see any success at all with this in the next few
days, we could include a new UnfiiedChannelizer that would replace the
existing ones. I think we'd keep the old one in place by default and keep
my copy/paste work for SessionOpProcessor that added bytecode there, with
the idea that the UnifiedChannelizer will become the future default as we
get it tested further along the 3.5.x line.

I imagine I'll have more details on this task tomorrow and will post back
here when I do.


On Thu, Mar 18, 2021 at 12:37 PM Divij Vaidya <di...@gmail.com>
wrote:

> >
> > I suppose the
> > question is how would we ensure that each request for a session ends up
> > being executed on the same thread from the previous request if jobs are
> > randomly submitted to a worker pool?
>
>
> I haven't thought through the details, but on top of my head, we would have
> to maintain some request<->thread mapping on the server. This mapping is
> also a building block for a request cancellation feature in future where a
> client would be able to send a cancellation request to the server, the
> server will map the request to a thread executing that request and then set
> an interrupt on that thread to signify cancellation.
>
> Divij Vaidya
>
>
>
> On Thu, Mar 18, 2021 at 5:00 PM Stephen Mallette <sp...@gmail.com>
> wrote:
>
> > I started a fresh thread for this topic Divij brought up, with more
> > context:
> >
> > > In a scenario where we have both
> > > session-less and sesion-ful requests being made to the server, the
> > resource
> > > allocation will not be fair and may lead to starvation for some
> requests.
> > > This problem exists even today, hence not totally correlated to what
> you
> > > are proposing but I am afraid a wider adoption of explicit
> > > transactions will bring this problem to the spotlight. Hence, we should
> > fix
> > > this problem first. A solution would entail converging the worker pool
> > for
> > > both session vs session-less requests.
> >
> > I'm not sure we can get that done in 3.5.0, but maybe? I suppose the
> > question is how would we ensure that each request for a session ends up
> > being executed on the same thread from the previous request if jobs are
> > randomly submitted to a worker pool?
> >
> >
> > On Thu, Mar 18, 2021 at 11:52 AM Divij Vaidya <di...@gmail.com>
> > wrote:
> >
> > > Great idea Stephen. I agree with standardizing the explicit transaction
> > > semantics in cases of remote server (implicit aka sessionless is
> already
> > > self explanatory) and unifying the syntax with embedded graph usage.
> > >
> > > Couple of notes:
> > >
> > > 1. I would favor the begin() instead of create() as it's closer to
> > > universally prominent SQL transaction syntax. This would reduce the
> > > learning curve for users of TP.
> > >
> > > 2. From an implementation standpoint, sessionless requests are queued
> on
> > > the server side and are serviced by the worker thread pool. But a
> > > session-ful aka explicit transaction aka managed transaction starts a
> new
> > > single worked thread pool every time. In a scenario where we have both
> > > session-less and sesion-ful requests being made to the server, the
> > resource
> > > allocation will not be fair and may lead to starvation for some
> requests.
> > > This problem exists even today, hence not totally correlated to what
> you
> > > are proposing but I am afraid a wider adoption of explicit
> > > transactions will bring this problem to the spotlight. Hence, we should
> > fix
> > > this problem first. A solution would entail converging the worker pool
> > for
> > > both session vs session-less requests.
> > >
> > > 3. You are proposing the idea of having a transaction scoped traversal
> > > object. I agree with it but we need more clarification in behavior wrt
> > the
> > > following scenarios:
> > >
> > > Q. What happens when g.tx().commit() is called on a transaction scoped
> > > traversal object? Does the traversal get closed?
> > > Q. Currently, the same traversal object could be used to execute
> multiple
> > > session-less requests simultaneously in a thread safe manner. Does the
> > same
> > > behaviour apply to transaction scoped traversal? If not, then how do I
> > send
> > > multiple requests in parallel from the client all scoped to the same
> > > transaction on the server? Note that this is different from case of
> multi
> > > threaded transactions because on the server all requests are scoped to
> > > single transaction i.e. single thread but on the client they may be
> > > submitted via multiple threads.
> > >
> > > Regards,
> > > Divij Vaidya
> > >
> > >
> > >
> > > On Thu, Mar 18, 2021 at 4:05 PM Stephen Mallette <spmallette@gmail.com
> >
> > > wrote:
> > >
> > > > The Transaction object is a bit important with remote cases because
> it
> > > > holds the reference to the session. With embedded use cases we
> > generally
> > > > adhere to ThreadLocal transactions so the Transaction implementation
> in
> > > > that case is more of a controller for the current thread but for
> remote
> > > > cases the Transaction implementation holds a bit of state that can
> > cross
> > > > over threads. In some ways, that makes remote cases feel like a
> > "threaded
> > > > transaction" so that may be familiar to users?? Here's some example
> > > > syntax I currently have working in a test case:
> > > >
> > > > g = traversal().withRemote(conn)
> > > > gtx = g.tx().create()
> > > > assert gtx.isOpen() == true
> > > > gtx.addV('person').iterate()
> > > > gtx.addV('software').iterate()
> > > > gtx.commit() // alternatively you could explicitly rollback()
> > > > assert gtx.isOpen() == false
> > > >
> > > > I hope that documentation changes are enough to unify the syntax and
> > > remove
> > > > confusion despite there still being ThreadLocal transactions as a
> > default
> > > > for embedded cases and something else for remote. At least they will
> > look
> > > > the same, even if you technically don't need to do a g.tx().create()
> > for
> > > > embedded transactions and can just have the control you always had
> with
> > > > g.tx().commit() directly.
> > > >
> > > > Note that the remote Transaction object can support configuration via
> > > > onClose(CLOSE_BEHAVIOR) and you could therefore switch from the
> default
> > > of
> > > > "commit on close" to rollback or manual (or i suppose something
> > custom).
> > > > It's nice that this piece works. I don't see a point in supporting
> > > > onReadWrite(READ_WRITE_BEHAVIOR) as it just doesn't make sense in
> > remote
> > > > contexts.
> > > >
> > > > Another point is that I've added what I termed a "Graph Operation"
> > > bytecode
> > > > instances which is bytecode that isn't related to traversal
> > > construction. I
> > > > hope this isn't one of those things where we end up wanting to
> > deprecate
> > > an
> > > > idea as fast as we added it but we needed some way to pass
> > > commit/rollback
> > > > commands and they aren't really part of traversal construction. They
> > are
> > > > operations that execute on the graph itself and I couldn't think of
> how
> > > to
> > > > treat them as traversals nicely, so they sorta just exist as a one
> off.
> > > > Perhaps they will grow in number?? Folks have always asked if
> bytecode
> > > > requests could get "GraphFeature" information - I suppose this could
> > be a
> > > > way to do that??
> > > >
> > > > Anyway, I will keep going down this general path as it's appearing
> > > > relatively fruitful.
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Mar 18, 2021 at 6:34 AM Stephen Mallette <
> spmallette@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > > We should just communicate clearly that a simple remote traversal
> > > > > already uses a transaction by default,
> > > > >
> > > > > That's a good point because even with this change we still have a
> > > > > situation where a single iterated remote traversal will generally
> > mean
> > > a
> > > > > server managed commit/rollback, but in embedded mode, we have an
> open
> > > > > transaction (for transaction enabled graphs - TinkerGraph will
> behave
> > > > more
> > > > > like a remote traversal, so more confusion there i guess). I'm not
> > sure
> > > > how
> > > > > to rectify that at this time except by way of documentation.
> > > > >
> > > > > The only thing I can think of is that the default for embedded
> would
> > > have
> > > > > to be auto-commit per traversal. In that way it would work like
> > remote
> > > > > traversals and for graphs that don't support transactions like
> > > > TinkerGraph.
> > > > > Of course, that's the kind of change that will break a lot of code.
> > > Maybe
> > > > > we just keep that idea for another version.
> > > > >
> > > > > On Thu, Mar 18, 2021 at 4:21 AM <fh...@florian-hockmann.de> wrote:
> > > > >
> > > > >> I like the idea of adding transactions for remote traversals as
> they
> > > > >> close a gap in functionality that we currently have for remote
> > > > traversals.
> > > > >> We should just communicate clearly that a simple remote traversal
> > > > already
> > > > >> uses a transaction by default, meaning that the server will roll
> > back
> > > > any
> > > > >> changes if any exception occurs. Users often ask about how to do
> > > > >> transactions with a remote traversal because they don't know about
> > > this
> > > > and
> > > > >> I'm afraid that we might add even more confusion if we now add
> > > > transactions
> > > > >> for remote traversals. Hence why I think that we should
> communicate
> > > this
> > > > >> clearly when we add remote transactions.
> > > > >>
> > > > >> Getting this to work in the GLVs should be possible but will
> require
> > > > some
> > > > >> effort. I think we would have to introduce some
> > > > >> "DriverRemoteTransactionTraversal" that doesn't submit the
> traversal
> > > on
> > > > a
> > > > >> terminal step but saves it and then submits all saved traversals
> > > > together
> > > > >> on close().
> > > > >> This also means that we should directly add an async version of
> > > close(),
> > > > >> maybe closeAsync()? (Same for commit() and rollback())
> > > > >>
> > > > >> I also like create() better than traversal() because it would
> > confuse
> > > me
> > > > >> to first start a traversal with traversal() and then also start a
> > > > >> transaction with the same method.
> > > > >>
> > > > >> -----Ursprüngliche Nachricht-----
> > > > >> Von: Stephen Mallette <sp...@gmail.com>
> > > > >> Gesendet: Donnerstag, 18. März 2021 00:01
> > > > >> An: dev@tinkerpop.apache.org
> > > > >> Betreff: Re: [DSISCUSS] Remote Transactions
> > > > >>
> > > > >> One thing I should have noted about tx().create(). The create()
> very
> > > > well
> > > > >> could have been named traversal(), thus the previous example
> reading
> > > as:
> > > > >>
> > > > >> g = traversal().withEmbedded(graph)
> > > > >> // or
> > > > >> g = traversal().withRemote(conn)
> > > > >> gtx = g.tx().traversal()
> > > > >> gtx.addV('person').iterate()
> > > > >> gtx.addV('software').iterate()
> > > > >> gtx.close() // alternatively you could explicitly commit() or
> > > rollback()
> > > > >>
> > > > >> You're basically spawning a new GraphTraversalSource from the
> > > > Transaction
> > > > >> object rather than from a Graph or AnonymousTraversal. I chose
> > > create()
> > > > >> because it felt like this looked weird:
> > > > >>
> > > > >> g = traversal().withRemote(conn).tx().traversal()
> > > > >>
> > > > >> which would be a weird thing to do I guess, but just seeing
> > > > "traversal()"
> > > > >> over and over seemed odd looking and I wanted to differentiate
> with
> > > the
> > > > >> Transaction object even though the methods are identical in what
> > they
> > > > do. I
> > > > >> suppose I also drew inspiration from:
> > > > >>
> > > > >> Transaction.createdThreadedTx()
> > > > >>
> > > > >> which I think we might consider deprecating. JanusGraph would
> simply
> > > > >> expose their own Transaction object in the future that has that
> > method
> > > > as I
> > > > >> imagine folks still use that feature. As far as I know, no other
> > graph
> > > > >> implements that functionality and my guess is that no one will
> > likely
> > > > do so
> > > > >> in the future.
> > > > >>
> > > > >> anyway, if you like traversal() better than create() or the other
> > way
> > > > >> around, please let me know
> > > > >>
> > > > >> On Wed, Mar 17, 2021 at 5:33 PM David Bechberger <
> > dave@bechberger.com
> > > >
> > > > >> wrote:
> > > > >>
> > > > >> > I am in favor of this change as I any idea that unifies the
> > multiple
> > > > >> > different ways things work in TP will only make it easier to
> learn
> > > and
> > > > >> > adopt.
> > > > >> >
> > > > >> > I don't know that I have enough knowledge of the inner workings
> of
> > > > >> > transactions to know what/if this will cause problems.
> > > > >> >
> > > > >> > Dave
> > > > >> >
> > > > >> > On Wed, Mar 17, 2021 at 12:57 PM Stephen Mallette
> > > > >> > <sp...@gmail.com>
> > > > >> > wrote:
> > > > >> >
> > > > >> > > We haven't touched "transactions" since TP3 was originally
> > > designed.
> > > > >> > > They have remained a feature for embedded use cases even in
> the
> > > face
> > > > >> > > of the
> > > > >> > rise
> > > > >> > > of remote graph use cases and result in major inconsistencies
> > that
> > > > >> > > really bother users.
> > > > >> > >
> > > > >> > > As we close on 3.5.0, I figured that perhaps there was a
> chance
> > to
> > > > >> > > do something radical about transactions. Basically, I'd like
> > > > >> > > transactions to work irrespective of remote or embedded usage
> > and
> > > > >> > > for them to have the
> > > > >> > same
> > > > >> > > API when doing so. In mulling it over for the last day or so I
> > > had a
> > > > >> > > realization that makes me believe that the following is
> > possible:
> > > > >> > >
> > > > >> > > g = traversal().withEmbedded(graph)
> > > > >> > > // or
> > > > >> > > g = traversal().withRemote(conn)
> > > > >> > > gtx = g.tx().create()
> > > > >> > > gtx.addV('person').iterate()
> > > > >> > > gtx.addV('software').iterate()
> > > > >> > > gtx.close() // alternatively you could explicitly commit() or
> > > > >> > > rollback()
> > > > >> > >
> > > > >> > > // you could still use g for sessionless, but gtx is "done"
> > after
> > > //
> > > > >> > > you close so you will need to create() a new gtx instance for
> a
> > //
> > > > >> > > fresh transaction assert 2 == g.V().count().next()
> > > > >> > >
> > > > >> > > Note that the create() method on tx() is the new bit of
> > necessary
> > > > >> syntax.
> > > > >> > > Technically, it is a do-nothing for embedded mode (and you
> could
> > > > >> > > skip it for thread-local auto-transactions) but all the
> > > > >> > > documentation can be shifted so that the API is identical to
> > > remote.
> > > > >> > > The change would be non-breaking as the embedded transaction
> > > > >> > > approach would remain as it is, but would no longer be
> > documented
> > > as
> > > > >> > > the preferred approach. Perhaps we could one day disallow it
> but
> > > > >> > > it's a bit of a tangle of ThreadLocal that I'm not sure we
> want
> > to
> > > > >> touch in TP3.
> > > > >> > >
> > > > >> > > What happens behind the scenes of g.tx().create() is that in
> > > remote
> > > > >> > > cases the RemoteConnection constructs a remote Transaction
> > > > >> > > implementation which basically is used to spawn new traversal
> > > source
> > > > >> > > instances with a session based connections. The remote
> > Transaction
> > > > >> > > object can't support
> > > > >> > transaction
> > > > >> > > listeners or special read-write/close configurations, but I
> > don't
> > > > >> > > think that's a problem as those things don't make a lot of
> sense
> > > in
> > > > >> > > remote use cases.
> > > > >> > >
> > > > >> > > From the server perspective, this change would also mean that
> > > > >> > > sessions would have to accept bytecode and not just scripts,
> > which
> > > > >> > > technically shouldn't be a problem.
> > > > >> > >
> > > > >> > > One downside at the moment is that i'm thinking mostly in Java
> > at
> > > > >> > > the moment. I'm not sure how this all works in other variants
> > just
> > > > >> > > yet, but
> > > > >> > I'd
> > > > >> > > hope to keep similar syntax.
> > > > >> > >
> > > > >> > > I will keep experimenting tomorrow. If you have any thoughts
> on
> > > the
> > > > >> > matter,
> > > > >> > > I'd be happy to hear them. Thanks!
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >>
> > > >
> > >
> >
>

Re: [DISCUSS] Converging the worker pool was: [DISCUSS] Remote Transactions

Posted by Divij Vaidya <di...@gmail.com>.

>
> I suppose the
> question is how would we ensure that each request for a session ends up
> being executed on the same thread from the previous request if jobs are
> randomly submitted to a worker pool?


I haven't thought through the details, but on top of my head, we would have
to maintain some request<->thread mapping on the server. This mapping is
also a building block for a request cancellation feature in future where a
client would be able to send a cancellation request to the server, the
server will map the request to a thread executing that request and then set
an interrupt on that thread to signify cancellation.

Divij Vaidya



On Thu, Mar 18, 2021 at 5:00 PM Stephen Mallette <sp...@gmail.com>
wrote:

> I started a fresh thread for this topic Divij brought up, with more
> context:
>
> > In a scenario where we have both
> > session-less and sesion-ful requests being made to the server, the
> resource
> > allocation will not be fair and may lead to starvation for some requests.
> > This problem exists even today, hence not totally correlated to what you
> > are proposing but I am afraid a wider adoption of explicit
> > transactions will bring this problem to the spotlight. Hence, we should
> fix
> > this problem first. A solution would entail converging the worker pool
> for
> > both session vs session-less requests.
>
> I'm not sure we can get that done in 3.5.0, but maybe? I suppose the
> question is how would we ensure that each request for a session ends up
> being executed on the same thread from the previous request if jobs are
> randomly submitted to a worker pool?
>
>
> On Thu, Mar 18, 2021 at 11:52 AM Divij Vaidya <di...@gmail.com>
> wrote:
>
> > Great idea Stephen. I agree with standardizing the explicit transaction
> > semantics in cases of remote server (implicit aka sessionless is already
> > self explanatory) and unifying the syntax with embedded graph usage.
> >
> > Couple of notes:
> >
> > 1. I would favor the begin() instead of create() as it's closer to
> > universally prominent SQL transaction syntax. This would reduce the
> > learning curve for users of TP.
> >
> > 2. From an implementation standpoint, sessionless requests are queued on
> > the server side and are serviced by the worker thread pool. But a
> > session-ful aka explicit transaction aka managed transaction starts a new
> > single worked thread pool every time. In a scenario where we have both
> > session-less and sesion-ful requests being made to the server, the
> resource
> > allocation will not be fair and may lead to starvation for some requests.
> > This problem exists even today, hence not totally correlated to what you
> > are proposing but I am afraid a wider adoption of explicit
> > transactions will bring this problem to the spotlight. Hence, we should
> fix
> > this problem first. A solution would entail converging the worker pool
> for
> > both session vs session-less requests.
> >
> > 3. You are proposing the idea of having a transaction scoped traversal
> > object. I agree with it but we need more clarification in behavior wrt
> the
> > following scenarios:
> >
> > Q. What happens when g.tx().commit() is called on a transaction scoped
> > traversal object? Does the traversal get closed?
> > Q. Currently, the same traversal object could be used to execute multiple
> > session-less requests simultaneously in a thread safe manner. Does the
> same
> > behaviour apply to transaction scoped traversal? If not, then how do I
> send
> > multiple requests in parallel from the client all scoped to the same
> > transaction on the server? Note that this is different from case of multi
> > threaded transactions because on the server all requests are scoped to
> > single transaction i.e. single thread but on the client they may be
> > submitted via multiple threads.
> >
> > Regards,
> > Divij Vaidya
> >
> >
> >
> > On Thu, Mar 18, 2021 at 4:05 PM Stephen Mallette <sp...@gmail.com>
> > wrote:
> >
> > > The Transaction object is a bit important with remote cases because it
> > > holds the reference to the session. With embedded use cases we
> generally
> > > adhere to ThreadLocal transactions so the Transaction implementation in
> > > that case is more of a controller for the current thread but for remote
> > > cases the Transaction implementation holds a bit of state that can
> cross
> > > over threads. In some ways, that makes remote cases feel like a
> "threaded
> > > transaction" so that may be familiar to users?? Here's some example
> > > syntax I currently have working in a test case:
> > >
> > > g = traversal().withRemote(conn)
> > > gtx = g.tx().create()
> > > assert gtx.isOpen() == true
> > > gtx.addV('person').iterate()
> > > gtx.addV('software').iterate()
> > > gtx.commit() // alternatively you could explicitly rollback()
> > > assert gtx.isOpen() == false
> > >
> > > I hope that documentation changes are enough to unify the syntax and
> > remove
> > > confusion despite there still being ThreadLocal transactions as a
> default
> > > for embedded cases and something else for remote. At least they will
> look
> > > the same, even if you technically don't need to do a g.tx().create()
> for
> > > embedded transactions and can just have the control you always had with
> > > g.tx().commit() directly.
> > >
> > > Note that the remote Transaction object can support configuration via
> > > onClose(CLOSE_BEHAVIOR) and you could therefore switch from the default
> > of
> > > "commit on close" to rollback or manual (or i suppose something
> custom).
> > > It's nice that this piece works. I don't see a point in supporting
> > > onReadWrite(READ_WRITE_BEHAVIOR) as it just doesn't make sense in
> remote
> > > contexts.
> > >
> > > Another point is that I've added what I termed a "Graph Operation"
> > bytecode
> > > instances which is bytecode that isn't related to traversal
> > construction. I
> > > hope this isn't one of those things where we end up wanting to
> deprecate
> > an
> > > idea as fast as we added it but we needed some way to pass
> > commit/rollback
> > > commands and they aren't really part of traversal construction. They
> are
> > > operations that execute on the graph itself and I couldn't think of how
> > to
> > > treat them as traversals nicely, so they sorta just exist as a one off.
> > > Perhaps they will grow in number?? Folks have always asked if bytecode
> > > requests could get "GraphFeature" information - I suppose this could
> be a
> > > way to do that??
> > >
> > > Anyway, I will keep going down this general path as it's appearing
> > > relatively fruitful.
> > >
> > >
> > >
> > >
> > > On Thu, Mar 18, 2021 at 6:34 AM Stephen Mallette <spmallette@gmail.com
> >
> > > wrote:
> > >
> > > > > We should just communicate clearly that a simple remote traversal
> > > > already uses a transaction by default,
> > > >
> > > > That's a good point because even with this change we still have a
> > > > situation where a single iterated remote traversal will generally
> mean
> > a
> > > > server managed commit/rollback, but in embedded mode, we have an open
> > > > transaction (for transaction enabled graphs - TinkerGraph will behave
> > > more
> > > > like a remote traversal, so more confusion there i guess). I'm not
> sure
> > > how
> > > > to rectify that at this time except by way of documentation.
> > > >
> > > > The only thing I can think of is that the default for embedded would
> > have
> > > > to be auto-commit per traversal. In that way it would work like
> remote
> > > > traversals and for graphs that don't support transactions like
> > > TinkerGraph.
> > > > Of course, that's the kind of change that will break a lot of code.
> > Maybe
> > > > we just keep that idea for another version.
> > > >
> > > > On Thu, Mar 18, 2021 at 4:21 AM <fh...@florian-hockmann.de> wrote:
> > > >
> > > >> I like the idea of adding transactions for remote traversals as they
> > > >> close a gap in functionality that we currently have for remote
> > > traversals.
> > > >> We should just communicate clearly that a simple remote traversal
> > > already
> > > >> uses a transaction by default, meaning that the server will roll
> back
> > > any
> > > >> changes if any exception occurs. Users often ask about how to do
> > > >> transactions with a remote traversal because they don't know about
> > this
> > > and
> > > >> I'm afraid that we might add even more confusion if we now add
> > > transactions
> > > >> for remote traversals. Hence why I think that we should communicate
> > this
> > > >> clearly when we add remote transactions.
> > > >>
> > > >> Getting this to work in the GLVs should be possible but will require
> > > some
> > > >> effort. I think we would have to introduce some
> > > >> "DriverRemoteTransactionTraversal" that doesn't submit the traversal
> > on
> > > a
> > > >> terminal step but saves it and then submits all saved traversals
> > > together
> > > >> on close().
> > > >> This also means that we should directly add an async version of
> > close(),
> > > >> maybe closeAsync()? (Same for commit() and rollback())
> > > >>
> > > >> I also like create() better than traversal() because it would
> confuse
> > me
> > > >> to first start a traversal with traversal() and then also start a
> > > >> transaction with the same method.
> > > >>
> > > >> -----Ursprüngliche Nachricht-----
> > > >> Von: Stephen Mallette <sp...@gmail.com>
> > > >> Gesendet: Donnerstag, 18. März 2021 00:01
> > > >> An: dev@tinkerpop.apache.org
> > > >> Betreff: Re: [DSISCUSS] Remote Transactions
> > > >>
> > > >> One thing I should have noted about tx().create(). The create() very
> > > well
> > > >> could have been named traversal(), thus the previous example reading
> > as:
> > > >>
> > > >> g = traversal().withEmbedded(graph)
> > > >> // or
> > > >> g = traversal().withRemote(conn)
> > > >> gtx = g.tx().traversal()
> > > >> gtx.addV('person').iterate()
> > > >> gtx.addV('software').iterate()
> > > >> gtx.close() // alternatively you could explicitly commit() or
> > rollback()
> > > >>
> > > >> You're basically spawning a new GraphTraversalSource from the
> > > Transaction
> > > >> object rather than from a Graph or AnonymousTraversal. I chose
> > create()
> > > >> because it felt like this looked weird:
> > > >>
> > > >> g = traversal().withRemote(conn).tx().traversal()
> > > >>
> > > >> which would be a weird thing to do I guess, but just seeing
> > > "traversal()"
> > > >> over and over seemed odd looking and I wanted to differentiate with
> > the
> > > >> Transaction object even though the methods are identical in what
> they
> > > do. I
> > > >> suppose I also drew inspiration from:
> > > >>
> > > >> Transaction.createdThreadedTx()
> > > >>
> > > >> which I think we might consider deprecating. JanusGraph would simply
> > > >> expose their own Transaction object in the future that has that
> method
> > > as I
> > > >> imagine folks still use that feature. As far as I know, no other
> graph
> > > >> implements that functionality and my guess is that no one will
> likely
> > > do so
> > > >> in the future.
> > > >>
> > > >> anyway, if you like traversal() better than create() or the other
> way
> > > >> around, please let me know
> > > >>
> > > >> On Wed, Mar 17, 2021 at 5:33 PM David Bechberger <
> dave@bechberger.com
> > >
> > > >> wrote:
> > > >>
> > > >> > I am in favor of this change as I any idea that unifies the
> multiple
> > > >> > different ways things work in TP will only make it easier to learn
> > and
> > > >> > adopt.
> > > >> >
> > > >> > I don't know that I have enough knowledge of the inner workings of
> > > >> > transactions to know what/if this will cause problems.
> > > >> >
> > > >> > Dave
> > > >> >
> > > >> > On Wed, Mar 17, 2021 at 12:57 PM Stephen Mallette
> > > >> > <sp...@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> > > We haven't touched "transactions" since TP3 was originally
> > designed.
> > > >> > > They have remained a feature for embedded use cases even in the
> > face
> > > >> > > of the
> > > >> > rise
> > > >> > > of remote graph use cases and result in major inconsistencies
> that
> > > >> > > really bother users.
> > > >> > >
> > > >> > > As we close on 3.5.0, I figured that perhaps there was a chance
> to
> > > >> > > do something radical about transactions. Basically, I'd like
> > > >> > > transactions to work irrespective of remote or embedded usage
> and
> > > >> > > for them to have the
> > > >> > same
> > > >> > > API when doing so. In mulling it over for the last day or so I
> > had a
> > > >> > > realization that makes me believe that the following is
> possible:
> > > >> > >
> > > >> > > g = traversal().withEmbedded(graph)
> > > >> > > // or
> > > >> > > g = traversal().withRemote(conn)
> > > >> > > gtx = g.tx().create()
> > > >> > > gtx.addV('person').iterate()
> > > >> > > gtx.addV('software').iterate()
> > > >> > > gtx.close() // alternatively you could explicitly commit() or
> > > >> > > rollback()
> > > >> > >
> > > >> > > // you could still use g for sessionless, but gtx is "done"
> after
> > //
> > > >> > > you close so you will need to create() a new gtx instance for a
> //
> > > >> > > fresh transaction assert 2 == g.V().count().next()
> > > >> > >
> > > >> > > Note that the create() method on tx() is the new bit of
> necessary
> > > >> syntax.
> > > >> > > Technically, it is a do-nothing for embedded mode (and you could
> > > >> > > skip it for thread-local auto-transactions) but all the
> > > >> > > documentation can be shifted so that the API is identical to
> > remote.
> > > >> > > The change would be non-breaking as the embedded transaction
> > > >> > > approach would remain as it is, but would no longer be
> documented
> > as
> > > >> > > the preferred approach. Perhaps we could one day disallow it but
> > > >> > > it's a bit of a tangle of ThreadLocal that I'm not sure we want
> to
> > > >> touch in TP3.
> > > >> > >
> > > >> > > What happens behind the scenes of g.tx().create() is that in
> > remote
> > > >> > > cases the RemoteConnection constructs a remote Transaction
> > > >> > > implementation which basically is used to spawn new traversal
> > source
> > > >> > > instances with a session based connections. The remote
> Transaction
> > > >> > > object can't support
> > > >> > transaction
> > > >> > > listeners or special read-write/close configurations, but I
> don't
> > > >> > > think that's a problem as those things don't make a lot of sense
> > in
> > > >> > > remote use cases.
> > > >> > >
> > > >> > > From the server perspective, this change would also mean that
> > > >> > > sessions would have to accept bytecode and not just scripts,
> which
> > > >> > > technically shouldn't be a problem.
> > > >> > >
> > > >> > > One downside at the moment is that i'm thinking mostly in Java
> at
> > > >> > > the moment. I'm not sure how this all works in other variants
> just
> > > >> > > yet, but
> > > >> > I'd
> > > >> > > hope to keep similar syntax.
> > > >> > >
> > > >> > > I will keep experimenting tomorrow. If you have any thoughts on
> > the
> > > >> > matter,
> > > >> > > I'd be happy to hear them. Thanks!
> > > >> > >
> > > >> >
> > > >>
> > > >>
> > >
> >
>