Posted to dev@tinkerpop.apache.org by Stephen Mallette <sp...@gmail.com> on 2017/07/05 20:39:45 UTC

Re: [DISCUSS] Remaining Items for 3.3.0

There have been a number of pull requests lately that have removed
previously deprecated code for 3.3.0. While not deleting tons of code, it
is tidying up a bit which is good. I think we should try to remove just
about all deprecated code short of code deprecated for the Gremlin language
which would mean that old sack(), groupV3d0(), etc methods will still hang
around. I'm also hesitant to kill RemoteGraph as I'm not sure what we want
to do with that. It still feels like it might have utility. I would love to
get rid of our biggest deprecation offender in the TraversalEngine
interface, but I'm not sure what that would entail (I'll consider that a
nice to have if we can get it done). As for the rest, I'll be creating
issues in JIRA tomorrow now that I have a list of everything else that can
go and then the pull requests can start flying...

On Fri, Jun 30, 2017 at 5:20 PM, Stephen Mallette <sp...@gmail.com>
wrote:

> Now that the Gryo 3.0 PR has been issued and is awaiting review, I've
> moved on to GraphSON 3.0 to hopefully resolve the following problems:
>
> https://issues.apache.org/jira/browse/TINKERPOP-1427
> https://issues.apache.org/jira/browse/TINKERPOP-1574
> https://issues.apache.org/jira/browse/TINKERPOP-1414
>
> Those three issues basically mean that GraphSON 3.0 will be the default
> for TinkerPop 3.3.0. There will be no such thing as "untyped" GraphSON
> anymore - that really doesn't make sense given the move toward
> bytecode/GLVs. That is further nice as it reduces a choice for users.
> Finally, this revised version of GraphSON will take care of "collections"
> that do not serialize well for Gremlin needs due to limitations of JSON
> itself. The best example of this deficiency is in maps where keys can't be
> anything but strings. In Gremlin it is a common pattern to do stuff like:
>
> g.V(1).out().groupCount()
>
> As it stands GLVs have no support for that result. Other than that, there
> are no big changes to GraphSON - it works pretty well and has improved
> dramatically in speed for 3.2.5 so it should be a good basis for GLVs to
> build from. That's about it - hoping to get all these items fixed up by end
> of next week and ready for a pull request.
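The map-key limitation described above can be reproduced with any JSON library. The following is a minimal sketch, not the GraphSON 3.0 serializer: it models vertices as (id, label) tuples, and the `g:Map` tag and the flat key/value list layout are assumptions made for illustration. Only the `@type`/`@value` wrapper follows the existing typed GraphSON convention.

```python
import json

# Shape of a result from a traversal such as g.V(1).out().groupCount():
# a map keyed by vertices rather than strings. Vertices are modelled here
# as (id, label) tuples purely for illustration.
counts = {(2, "person"): 1, (3, "software"): 2}


def plain_json_fails(mapping):
    """Return True if the json module rejects the mapping's keys."""
    try:
        json.dumps(mapping)
    except TypeError:
        return True
    return False


def encode_typed_map(mapping):
    """Encode a map as a typed wrapper around a flat [k1, v1, k2, v2, ...] list.

    The "g:Map" tag and the flat layout are assumptions for this sketch.
    """
    flat = []
    for (vid, label), count in mapping.items():
        flat.append({"@type": "g:Vertex", "@value": {"id": vid, "label": label}})
        flat.append(count)
    return json.dumps({"@type": "g:Map", "@value": flat})


def decode_typed_map(text):
    """Rebuild the vertex-keyed map from the typed encoding."""
    doc = json.loads(text)
    if doc["@type"] != "g:Map":
        raise ValueError("not a typed map")
    flat = doc["@value"]
    result = {}
    # Even positions hold keys, odd positions hold the matching values.
    for key, value in zip(flat[0::2], flat[1::2]):
        vertex = key["@value"]
        result[(vertex["id"], vertex["label"])] = value
    return result
```

Because the keys travel as ordinary JSON values inside a list, they can be any typed object, which is what a GLV would need in order to rebuild a groupCount() result.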
>
>
>
> On Wed, Jun 28, 2017 at 8:23 AM, Stephen Mallette <sp...@gmail.com>
> wrote:
>
>> hahaha - i can't win. there was other work done on Gryo 3.0 a pretty long
>> time ago, that made it useful around Request/ResponseMessage serialization
>> in Gremlin Server. I'd long forgotten about that. I guess I will finish up
>> the work - the work just won't be harried by anything that had anything to
>> do with TINKERPOP-1592. It was almost there anyway I think.
>>
>> On Tue, Jun 27, 2017 at 3:02 PM, Stephen Mallette <sp...@gmail.com>
>> wrote:
>>
>>> You had me at the problem with multi-properties. withDetachment()
>>> doesn't seem to address that well, which makes this feel quite a bit more
>>> hacky than when it started.
>>>
>>> I think not doing withDetachment() reduces the need and urgency to do
>>> Gryo 3.0. HaltedTraverserStrategy already works as does all the
>>> serialization that goes with it. The only issue that is messed up is that I
>>> need to "fix" TraversalMetrics serialization in Gryo 1.0, but we've never
>>> been super consistent about versioning Gryo anyway, so i wonder if that
>>> minor break matters on a version where we are allowing for breaks. I think
>>> i'll kill my efforts on Gryo 3.0 for now and save some code.
>>>
>>>
>>>
>>> On Tue, Jun 27, 2017 at 11:17 AM, Marko Rodriguez <ok...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> In this email I will argue why TINKERPOP-1592 is a bad idea.
>>>>
>>>> GremlinServer does “too much.” In the future, the concept for
>>>> GremlinServer for TinkerPop4 should be “network I/O to the Gremlin virtual
>>>> machine.” That is it. So what is GremlinServer doing now that is bad?
>>>> Currently GremlinServer is “detaching” elements from the graph and
>>>> populating those elements with more data than the user requested. What do I
>>>> mean?:
>>>>
>>>> g.V(1)
>>>>
>>>> Currently that returns a DetachedVertex which is vertex 1’s id, label,
>>>> and all of its properties. Now here is the crazy part. What if there are 1M
>>>> properties (e.g. timestamped multi-properties on sensor network vertices).
>>>> Doh! However, what did the user ask for? They asked for v[1] and that is
>>>> all they should get. This is known as ReferenceVertex. If they wanted some
>>>> subset of the timestamped sensor network data, they should have done:
>>>>
>>>> g.V(1).properties('sensor').has('timestamp',gt(2017))
>>>>
>>>> Thus, we should only return the data people explicitly ask for in the
>>>> traversal. The TINKERPOP-1592 ticket is a hack for DetachedVertex by
>>>> allowing users to specify withDetachment(properties:[“not_sensor”]),
>>>> but then it is not expressive enough. Ultimately, for generality, users
>>>> will want to specify full traversals in their withDetachment()
>>>> specification. Now you are talking SubgraphStrategy. Dar! — and guess what,
>>>> GremlinServer doesn’t respect SubgraphStrategy. This is the problem with
>>>> everything NOT being traversal — once you start using the “Blueprints API”
>>>> you start getting inconsistent behavior/functionality. Thus, GremlinServer
>>>> does too much — just execute the traversal and return the result.
>>>>
>>>> Next, DetachedXXX starts to get I-N-S-A-N-E when you start talking
>>>> GLVs. Now we have the Blueprints API implemented in C#, Python, etc.
>>>> Noooooooo! GLVs should only implement ReferenceXXX which is the bare
>>>> minimum specification of a graph object such that it can be re-attached
>>>> (referenced back) to the source graph. That’s it. Don’t start populating it
>>>> with properties — “what about edges?” — “can it get the neighboring
>>>> vertices properties too?” — “what about ...?” — if you want that data, you
>>>> traverse to it yourself!
>>>>
>>>> So, what is the solution to the problem at hand — ReferenceXXX. These
>>>> element classes are the minimal amount of data required to re-attach to the
>>>> source graph. Thus, if you do g.V(1), you get back id/label. However, if
>>>> you want to then get the sensor data, you do g.V(v1).properties(…).
>>>> Moreover, there is a little hidden gem called HaltedTraverserStrategy that
>>>> allows the user to specify their desired element class —
>>>> <https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/decoration/HaltedTraverserStrategy.java>.
>>>>
>>>> If GremlinServer is yielding too much data, simply do this:
>>>>
>>>> g = g.withStrategy(HaltedTraverserStrategy.reference())
>>>> g.V(1) // ahh… fresh and clean.
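The difference between the two result shapes argued over here can be sketched without TinkerPop at all. The example below is an illustration under stated assumptions, not the DetachedVertex or ReferenceVertex implementation: the dictionary layout, the `sensor_node` label, and the helper names `detach`, `reference`, and `sensor_since` are all hypothetical.

```python
import json

# Hypothetical stored vertex carrying many timestamped multi-properties.
stored_vertex = {
    "id": 1,
    "label": "sensor_node",
    "properties": {
        "sensor": [{"value": i, "timestamp": 2000 + i} for i in range(1000)]
    },
}


def detach(vertex):
    """Detached-style result: id, label, and every property."""
    return {
        "id": vertex["id"],
        "label": vertex["label"],
        "properties": vertex["properties"],
    }


def reference(vertex):
    """Reference-style result: only what is needed to re-attach to the graph."""
    return {"id": vertex["id"], "label": vertex["label"]}


def sensor_since(vertex, year):
    """Rough analogue of g.V(1).properties('sensor').has('timestamp', gt(year))."""
    return [p for p in vertex["properties"]["sensor"] if p["timestamp"] > year]


# Serialized payload sizes for the same g.V(1)-style request.
detached_size = len(json.dumps(detach(stored_vertex)))
reference_size = len(json.dumps(reference(stored_vertex)))
```

The reference payload stays the same size however many properties the vertex holds, and the caller who does want sensor data asks for exactly the slice needed, as `sensor_since(stored_vertex, 2017)` does here.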
>>>>
>>>> The trick to software is not to write it. If you are a software
>>>> developer, you are not as good as the guy who runs the deli down the street
>>>> cause guess what, he lives just fine and doesn’t write a lick of code.
>>>>
>>>> Marko.
>>>>
>>>> http://markorodriguez.com
>>>>
>>>>
>>>>
>>>> > On Jun 26, 2017, at 2:21 PM, Stephen Mallette <sp...@gmail.com> wrote:
>>>> >
>>>> > Looking back at this thread, since there were no objections to
>>>> > doing "pre-releases" of GLVs I think we can postpone the test suite
>>>> > changes. So, i'm fine with that being off the table.
>>>> >
>>>> > Looking at my list, I'm also surprised that I didn't include:
>>>> >
>>>> > https://issues.apache.org/jira/browse/TINKERPOP-1592
>>>> >
>>>> > I think that will be important for providing more flexibility to users to
>>>> > shape results returned from traversals. That, of course, is important for
>>>> > GLV remoting so that users can return only data that matters to them.
>>>> > TINKERPOP-1592 funnels into the GraphSON/Gryo 3.0 stuff mentioned
>>>> > previously as we seek to make improvements there in terms of
>>>> > efficiency/performance/usability. Marko will be taking a look at the 1592
>>>> > ticket.
>>>> >
>>>> > I think there is a good body of nice-to-have tickets (after going through
>>>> > them all in the last couple of weeks to do some housekeeping) so we'll see
>>>> > what we can get in there and what we can't after those more crucial bits
>>>> > are done. I believe that we could start thinking about release of 3.3.0 in
>>>> > the next 4 weeks or so.
>>>> >
>>>> > If there are any other thoughts for what's going on with 3.3.0 please let
>>>> > them be known.
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On Thu, Jun 1, 2017 at 2:08 PM, Stephen Mallette <spmallette@gmail.com> wrote:
>>>> >
>>>> >> I was just thinking about what needs to be done for 3.3.0 to get it ready
>>>> >> for release. I thought I'd send this email to collect any ideas:
>>>> >>
>>>> >> + Dynamically load the MetricManager in Gremlin Server (TINKERPOP-1550)
>>>> >> + Clean up IO - both GraphSON 3.0 and Gryo 3.0
>>>> >> + Remove more deprecated code
>>>> >> + Test framework improvements (GLVs and in the structure/process suites)
>>>> >>
>>>> >> I suppose these could shift and change between now and when we think it's
>>>> >> ready to release. I have no idea how much time it will take to get this all
>>>> >> done, but let's see if anyone else has any other important things for 3.3.0.
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>>
>>>>
>>>
>>
>