Posted to commits@tinkerpop.apache.org by GitBox <gi...@apache.org> on 2021/10/22 22:35:14 UTC

[GitHub] [tinkerpop] rdtr opened a new pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

rdtr opened a new pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487


   …osal for TinkerPop


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] mschmidt00 commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
mschmidt00 commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r736970771



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound trivial at first, many interesting questions arise under the surface: equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different contexts (e.g., predicates vs. deduplication), comparability and ordering across different logical types, and the identity of elements (such as vertex and edge properties).
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and as of today no clear semantics is defined and published for the different types of equality and comparison operations. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures in use, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison so that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. By helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This document is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by examples of the current status, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, so as to make the overall semantics more predictable, coherent, and documentable.
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, neither users nor Graph providers know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To find out today, they need to dig into the TinkerPop code base. While type casting rules apply in the above example, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. But the comparison is Java dependent, and we don’t have a clear definition of how the comparison works for types of this kind.
+The same holds for Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+We even have comparisons over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, distinguishing equality from equivalence and comparability from orderability; each pair constitutes two flavors of one concept, tailored to its usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and a discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale (commonly found in query and programming languages) is that two "unknown" numbers, a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than type promotion between Numbers, 2 values of different types are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output or undefined behavior, nor do they ever throw an error. Detailed behavior is described in the technical appendix.
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion does not apply, two numbers of different types are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output or undefined behavior, nor do they ever throw an error.
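
As an illustration that is not taken from the PR, the difference between the two notions can be reproduced with plain JDK types. In the minimal Java sketch below, `EqSketch`, `equal`, and `equivalent` are hypothetical names. Promotion is approximated by widening both operands to `double`, which is an assumption and is lossy for large `long` or `BigDecimal` values.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

final class EqSketch {
    // Equality sketch: numbers are compared after promotion, so NaN never equals NaN.
    // Assumption: promotion is approximated by widening both sides to double.
    static boolean equal(Object a, Object b) {
        if (a instanceof Number && b instanceof Number) {
            return ((Number) a).doubleValue() == ((Number) b).doubleValue();
        }
        return Objects.equals(a, b);
    }

    // Equivalence sketch: no promotion, same runtime type required, NaN groups with NaN.
    static boolean equivalent(Object a, Object b) {
        if (a == null || b == null) {
            return a == b;
        }
        return a.getClass() == b.getClass() && a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(equal(Double.NaN, Double.NaN));      // false
        System.out.println(equivalent(Double.NaN, Double.NaN)); // true
        System.out.println(equal(2, 2.0));                      // true
        System.out.println(equivalent(2, 2.0));                 // false

        // java.util hash sets already behave like the equivalence sketch for boxed numbers.
        Set<Object> deduped = new LinkedHashSet<>();
        deduped.add(Double.NaN);
        deduped.add(Double.NaN);
        deduped.add(2);
        deduped.add(2.0);
        System.out.println(deduped); // [NaN, 2, 2.0]
    }
}
```

Note that under this sketch `Double.NaN` and `Float.NaN` are not equivalent to each other, because their runtime types differ.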
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be conceptually aligned with equality as far as type promotion is concerned, e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
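
The three possible outcomes of a comparison (TRUE, FALSE, NULL) can be sketched in Java as follows. This is only an illustration under stated assumptions: `ComparabilitySketch` and `gt` are hypothetical names, only Number and String operands are handled, and numbers are widened to `double` instead of following the full promotion rules.

```java
final class ComparabilitySketch {
    // Three-valued "greater than": TRUE, FALSE, or null when the operands are incomparable.
    // Assumption: only Number and String are handled; numbers are widened to double.
    static Boolean gt(Object lhs, Object rhs) {
        if (lhs instanceof Number && rhs instanceof Number) {
            // IEEE 754: any ordered comparison involving NaN is false.
            return ((Number) lhs).doubleValue() > ((Number) rhs).doubleValue();
        }
        if (lhs instanceof String && rhs instanceof String) {
            return ((String) lhs).compareTo((String) rhs) > 0;
        }
        return null; // incomparable types; how NULL is handled is provider dependent
    }

    public static void main(String[] args) {
        System.out.println(gt(3L, 2));         // true
        System.out.println(gt(1.0, 2));        // false
        System.out.println(gt(Double.NaN, 0)); // false
        System.out.println(gt(1, "one"));      // null
    }
}
```

A filter step built on such a function could treat NULL like FALSE and drop the traverser, but that choice is left to the provider.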
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, we have a dedicated order as described in the section below.
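
One way to obtain an order that never fails is to rank types first and compare within a type second. The Java sketch below illustrates the idea only. `OrderabilitySketch`, `rank`, and `ORDER` are hypothetical names, and the type ranking used here (null, Boolean, Number, String, everything else) is an assumption made for the example, not the order defined by the proposal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class OrderabilitySketch {
    // Hypothetical type ranking, for illustration only.
    static int rank(Object o) {
        if (o == null) return 0;
        if (o instanceof Boolean) return 1;
        if (o instanceof Number) return 2;
        if (o instanceof String) return 3;
        return 4; // all other types share one position in this sketch
    }

    // Total order: compare type ranks first, then values within the same rank.
    static final Comparator<Object> ORDER = (a, b) -> {
        int ra = rank(a);
        int rb = rank(b);
        if (ra != rb) {
            return Integer.compare(ra, rb);
        }
        switch (ra) {
            case 1:
                return Boolean.compare((Boolean) a, (Boolean) b);
            case 2:
                // Double.compare is total: NaN sorts after every other double value.
                return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
            case 3:
                return ((String) a).compareTo((String) b);
            default:
                return 0;
        }
    };

    public static void main(String[] args) {
        List<Object> values = new ArrayList<>(Arrays.asList("100000", 100, Double.NaN, 5.5));
        values.sort(ORDER);
        System.out.println(values); // [5.5, 100, NaN, 100000]
    }
}
```

Because `List.sort` is stable, values that share a position keep their input order here; the proposal leaves that choice to the provider.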
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change?
+
+=== Semantics
+
+In terms of semantics, TinkerPop currently has no formal semantics that defines the characteristics introduced in this proposal. Therefore, this semantics should be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float Number is stored, it always throws. With the proposed change it would not throw; because NaN is not equal to any number, the query returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer, such as NaN and INF, should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today but, based on the proposal, it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL.
+
+==== Equivalence
+
+TinkerPop today uses the hash value of the original values for grouping, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order values of any type.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type promotion into account for equivalence? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: not de-duplicating them,
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If de-duping, there is another question: which value should we filter out? We need to define a priority over types in Number.
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a general concept such as a key-value tuple?
+* Today we have a Date type, but don’t we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to Graph providers?
+* Which is more reasonable: NULL eq NULL is true, or false?
+  ** If it is true, it may be respected in JOIN operations
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined.
+* Should UUID be a different type to be taken into account?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. It is JVM based, so we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We do type casting, a.k.a. type promotion, for Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. Types are promoted according to the following rules:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to the floating point type of the highest common bit width
+  ** If the other value is Long or Double, we need 64 bits, so convert both to Double
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported depending on the language and Storage, therefore the behavior of type casting for these 2 types can vary depending on a Graph provider. 
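
The promotion rules above can be transcribed literally into a small Java function. This is a sketch for discussion, not TinkerPop code: `PromotionSketch` and `commonType` are hypothetical names, and the function assumes both inputs are one of the eight listed Number types.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class PromotionSketch {
    // Returns the type both operands would be compared as, following the listed rules in order.
    // Assumption: a and b are Byte, Short, Integer, Long, Float, Double, BigInteger or BigDecimal.
    static Class<? extends Number> commonType(Number a, Number b) {
        if (a instanceof BigDecimal || b instanceof BigDecimal) return BigDecimal.class;
        if (a instanceof BigInteger || b instanceof BigInteger) return BigInteger.class;
        if (a instanceof Double || b instanceof Double) return Double.class;
        if (a instanceof Float || b instanceof Float) {
            // A Long operand needs 64 bits, so the pair is widened to Double.
            if (a instanceof Long || b instanceof Long) {
                return Double.class;
            }
            return Float.class;
        }
        if (a instanceof Long || b instanceof Long) return Long.class;
        if (a instanceof Integer || b instanceof Integer) return Integer.class;
        if (a instanceof Short || b instanceof Short) return Short.class;
        return Byte.class;
    }

    public static void main(String[] args) {
        System.out.println(commonType(1, 2L).getSimpleName());               // Long
        System.out.println(commonType(1.0f, 2L).getSimpleName());            // Double
        System.out.println(commonType(1.0f, 2).getSimpleName());             // Float
        System.out.println(commonType((byte) 1, (short) 2).getSimpleName()); // Short
    }
}
```

Read literally, the rule order also means that a BigInteger paired with a Double is compared as BigInteger, which would drop any fractional part; whether that is intended is not stated in the text.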
+
+=== Equality
+
+==== Primitive types
+===== Number
+
+Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
+
+* If one of LHS and RHS is a Number and the other isn't, eq returns FALSE.
+* If both LHS and RHS are Numbers, they are promoted according to the type casting rules described above and then checked for equality.
+* Example for edge cases:
+    ** -0.0 eq 0.0  = TRUE
+    ** +0.0 eq 0.0 = TRUE
+    **  -0.0 eq +0.0 = TRUE
+    ** NaN eq NaN  = FALSE
+    ** +INF eq +INF = TRUE
+    **  -INF eq -INF = TRUE
+    **  -INF eq +INF = FALSE
+* TinkerPop is JVM based so there can be ±INF^^float and ±INF^^double, NaN^^float and NaN^^double. They also adhere to the type promotion.
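
For reference, Java's primitive `==` on `double` already yields the edge-case results listed above, while boxed `equals()` does not. The short sketch below demonstrates this with JDK types only; `NumberEdgeCases` and `eq` are hypothetical names introduced for the example.

```java
final class NumberEdgeCases {
    // Primitive == on doubles follows IEEE 754, which matches the edge cases listed above.
    // A float argument is widened to double by the compiler, a simple form of promotion.
    static boolean eq(double lhs, double rhs) {
        return lhs == rhs;
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        System.out.println(eq(-0.0, +0.0));                   // true
        System.out.println(eq(Double.NaN, Double.NaN));       // false
        System.out.println(eq(inf, inf));                     // true
        System.out.println(eq(-inf, inf));                    // false
        System.out.println(eq(Float.POSITIVE_INFINITY, inf)); // true

        // Boxed equals() compares bit patterns instead, so it disagrees on two cases.
        System.out.println(Double.valueOf(-0.0).equals(0.0));              // false
        System.out.println(Double.valueOf(Double.NaN).equals(Double.NaN)); // true
    }
}
```

This is one reason why results can differ depending on whether a code path compares primitives or relies on `equals()` and `hashCode()` of boxed values.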
+
+===== Boolean
+
+* If one of LHS and RHS is a Boolean and the other isn't, return FALSE
+* True != False, True == True, False == False
+
+===== String
+
+* If one of LHS and RHS is a String and the other isn't, return FALSE
+* We assume the common lexicographical order over Unicode strings.
+* LHS and RHS need to be lexicographically equal for LHS eq RHS == TRUE for String.
+
+===== Date
+
+* If one of LHS and RHS is a Date and the other isn't, return FALSE
+* LHS eq RHS == TRUE when both LHS and RHS values are numerically identical in Unix Epoch time.
+
+===== NULL
+
+* If one of LHS and RHS is null and the other isn't, return FALSE
+* If both LHS and RHS are null, return TRUE 
+
+==== Composite types
+
+For all of them, if LHS and RHS are not of the same data type, equality returns FALSE. The following semantics apply when both LHS and RHS have the same data type.
+
+===== Vertex / Edge / VertexProperty
+
+They are considered the Element family in TinkerPop, and if 2 elements have the same type and the same ID, they are considered equal.

Review comment:
       That's indeed where it gets interesting!
   
   Looking at it with my RDF hat on, I'd say that the problem here is that PGs lack built-in support for globally unique IDs. Without global IDs, the question whether two vertices from different databases are equal can never be answered with certainty (even if they have the same property set) -- and I'd argue the same holds for inequality as well.
   
   * For instance, is _V(id->1, fname->Jane, lname->Doe)_ in DB1 equal to _V(_id->1, fname->Jane, lname->Doe)_ from DB2? Is it equal to _V(_id->1, fname->Jane, lname->Doe)_? Imho, neither the first nor the latter is safe to conclude without some "external contractual agreement" (such as guaranteeing that IDs are shared across the two database instances, or constraints saying that fname+lname are a "globally accepted" key).
   * On the other hand, is _V(id->1, fname->Jane, lname->Doe)_ in DB1 guaranteed different from _V(id->2, fname->Jane, lname->Doe, age -> 18)_? Even if there are conflicting property values, you might be able to unify the vertices when multi-valued properties are supported.
   
   Imho, it would be interesting to explore PG extensions such as global IDs as a future direction.







[GitHub] [tinkerpop] spmallette edited a comment on pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
spmallette edited a comment on pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#issuecomment-951110110


   @rdtr  you're getting a failing build smoke test because your file doesn't have the Apache license header. Just paste the following at the top of your document.
   
   ```text
   ////
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
   
     http://www.apache.org/licenses/LICENSE-2.0
   
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
   ////
   ```





[GitHub] [tinkerpop] joshsh commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
joshsh commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r737662004



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound trivial at first, many interesting questions arise under the surface: equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different contexts (e.g., predicates vs. deduplication), comparability and ordering across different logical types, and the identity of elements (such as vertex and edge properties).
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and as of today no clear semantics is defined and published for the different types of equality and comparison operations. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures in use, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison so that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. By helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This document is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by examples of the current status, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, so as to make the overall semantics more predictable, coherent, and documentable.
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, neither users nor Graph providers know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To find out today, they need to dig into the TinkerPop code base. While type casting rules apply in the above example, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+two numeric values of different types would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. However, the comparison is Java dependent, and we don't have a clear definition of how the comparison works for this kind of type.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+We even have comparisons over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling of `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not currently cover all cases consistently.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, and the comparison tries to promote Infinity to BigDecimal as well;
+// the type cast then fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four: we distinguish equality from equivalence and comparability from orderability, each pair constituting two flavors of the same concept tailored to its usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast into exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different types are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
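The equality rules listed above can be sketched in plain Java. This is only an illustration under stated assumptions, not TinkerPop code: the class `EqualitySketch` and its `eq` method are hypothetical names, `BigDecimal` stands in for the "lowest common super type", "same type" is approximated by "same runtime class", and the NULL handling is a guess because this section does not define it.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper, not part of TinkerPop: one possible reading of the equality rules.
final class EqualitySketch {

    // True only for Float/Double values that are NaN or +/-Infinity.
    private static boolean isNonFinite(Number n) {
        return (n instanceof Double || n instanceof Float) && !Double.isFinite(n.doubleValue());
    }

    // Promote a finite number to BigDecimal, used here as the common super type (an assumption).
    private static BigDecimal toBigDecimal(Number n) {
        if (n instanceof BigDecimal) return (BigDecimal) n;
        if (n instanceof BigInteger) return new BigDecimal((BigInteger) n);
        if (n instanceof Double || n instanceof Float) return new BigDecimal(n.doubleValue());
        return BigDecimal.valueOf(n.longValue()); // Byte, Short, Integer, Long
    }

    // Always returns true or false; never throws for the inputs handled here.
    static boolean eq(Object a, Object b) {
        if (a instanceof Number && b instanceof Number) {
            Number x = (Number) a, y = (Number) b;
            boolean xSpecial = isNonFinite(x), ySpecial = isNonFinite(y);
            if (xSpecial || ySpecial) {
                // IEEE 754 comparison: NaN equals nothing, an infinity equals only the same infinity.
                return xSpecial && ySpecial && x.doubleValue() == y.doubleValue();
            }
            // compareTo ignores scale, so 1.0 and 1.00 compare as the same value.
            return toBigDecimal(x).compareTo(toBigDecimal(y)) == 0;
        }
        if (a == null || b == null) return a == b; // assumption: NULL handling is not defined in this section
        // Simplification: values of different runtime classes are never equal.
        return a.getClass() == b.getClass() && a.equals(b);
    }
}
```

With this sketch, `EqualitySketch.eq(1, 1.0)` holds through promotion, while `EqualitySketch.eq(Double.NaN, Double.NaN)` and `EqualitySketch.eq(1, "1")` do not.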
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number,
+** Because type promotion does not apply, two numbers of different types are never equivalent.
+** NaN is not equal to NaN, but NaN is equivalent to NaN.
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
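As a rough illustration of the equivalence rules above, the following Java sketch leans on the standard `equals` contract of the boxed types, where `Double.equals` and `Float.equals` treat NaN as equal to NaN. The class `EquivalenceSketch` and its methods are hypothetical names, the NULL handling is an assumption, and `BigDecimal.equals` also compares scale, which the proposal does not discuss here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of TinkerPop: equivalence without type promotion.
final class EquivalenceSketch {

    // Always returns true or false.
    static boolean equivalent(Object a, Object b) {
        if (a == null || b == null) return a == b;      // assumption: not defined in this section
        if (a.getClass() != b.getClass()) return false; // no type promotion
        return a.equals(b); // Double.equals and Float.equals treat NaN as equal to NaN
    }

    // Hash-based de-duplication: LinkedHashSet relies on hashCode/equals and keeps first occurrences.
    static List<Object> dedup(List<?> values) {
        return new ArrayList<>(new LinkedHashSet<Object>(values));
    }
}
```

Under these assumptions, de-duplicating the values `1`, `1.0`, `Double.NaN`, `Double.NaN` keeps three entries: the Integer and the Double stay distinct, and the two NaN values collapse into one.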
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
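The three-valued result described above (TRUE, FALSE, or NULL for incomparable types) could be modeled in Java with a nullable `Boolean`. This is a sketch under assumptions rather than the proposed implementation: `ComparabilitySketch.lt` is a hypothetical name, numbers are promoted to `double` for brevity (which is lossy for large `long` or `BigDecimal` values), and "same type" is approximated by "same runtime class".

```java
// Hypothetical helper, not part of TinkerPop: a less-than check with a NULL outcome.
final class ComparabilitySketch {

    // Returns TRUE or FALSE, or null to stand in for the NULL result on incomparable types.
    static Boolean lt(Object a, Object b) {
        if (a instanceof Number && b instanceof Number) {
            // Simplification: promote both sides to double. Any comparison with NaN is false.
            return ((Number) a).doubleValue() < ((Number) b).doubleValue();
        }
        if (a instanceof Comparable && b != null && a.getClass() == b.getClass()) {
            @SuppressWarnings("unchecked")
            Comparable<Object> c = (Comparable<Object>) a;
            return c.compareTo(b) < 0;
        }
        return null; // incomparable, e.g. an integer and a string
    }
}
```

A caller that filters on this result would keep a value only when the result is `Boolean.TRUE`; how a graph provider treats the `null` case is left open by the text above.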
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine whether the 2 values are at the same position.
+* When the position is identical, which value comes first (in other words, whether a stable sort is performed) depends on the graph provider's implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, we have a dedicated order as described in the section below.
+
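One way to picture orderability is a total `Comparator` that first ranks values by type and then orders within a type. The sketch below is illustrative only: `OrderabilitySketch`, its `rank` values, the use of `Double.compare` (which places NaN after all other doubles), and the `toString` fallback are all assumptions, and the actual cross-type order is the one defined in the appendix, which is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of TinkerPop: a total order that never throws or returns NULL.
final class OrderabilitySketch {

    // Hypothetical type ranks; the real cross-type order is defined in the appendix.
    private static int rank(Object o) {
        if (o == null) return 0;
        if (o instanceof Number) return 1;
        if (o instanceof String) return 3;
        return 2; // everything else
    }

    static final Comparator<Object> ORDER = (a, b) -> {
        int ra = rank(a), rb = rank(b);
        if (ra != rb) return Integer.compare(ra, rb); // different types: order by rank
        switch (ra) {
            case 0: return 0; // both null
            case 1: return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
            case 3: return ((String) a).compareTo((String) b);
            default: return a.toString().compareTo(b.toString()); // placeholder for other types
        }
    };

    // Returns a sorted copy and leaves the input untouched.
    static List<Object> sorted(List<?> values) {
        List<Object> out = new ArrayList<>(values);
        out.sort(ORDER);
        return out;
    }
}
```

Sorting the mixed values `"100000"`, `100`, `Double.NaN`, `2.5` with this comparator yields `2.5`, `100`, `NaN`, `"100000"` instead of a `ClassCastException`.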
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change?
+
+=== Semantics
+
+Right now TinkerPop does not have formal semantics that define the characteristics introduced in this proposal. Therefore, these semantics should be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK 11 seems to produce a different error from JDK 8 for BigDecimal comparisons that hit NaN and similar values. JDK 8 seems to produce a NumberFormatException, while on JDK 11 you get messages like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this query always throws today. With the proposed change it would not throw; because NaN is not equal to any number, it would return an empty result.
+
+* BigDecimal +
+Equality checks involving BigDecimal and special values which cannot be parsed as Integer, such as NaN and INF, should not produce exceptions and should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
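The exceptions in the session above are consistent with how `java.math.BigDecimal` behaves: both its `double` and its `String` constructors throw `NumberFormatException` for NaN and infinite values. The following Java sketch shows that behavior and a guard that filters instead of throwing. Whether TinkerPop reaches these constructors on this code path is an assumption, not something verified here, and `BigDecimalPromotionSketch` with its methods is a hypothetical name.

```java
import java.math.BigDecimal;

// Hypothetical helper, not part of TinkerPop: why NaN / Infinity break BigDecimal promotion.
final class BigDecimalPromotionSketch {

    // True if the value is accepted by the BigDecimal(double) constructor.
    static boolean promotableFromDouble(double d) {
        try {
            new BigDecimal(d);
            return true;
        } catch (NumberFormatException e) {
            return false; // thrown for NaN and +/-Infinity
        }
    }

    // True if the text is accepted by the BigDecimal(String) constructor.
    static boolean promotableFromString(String s) {
        try {
            new BigDecimal(s);
            return true;
        } catch (NumberFormatException e) {
            return false; // thrown for text such as "NaN" or "Infinity"
        }
    }

    // Guarded equality: non-finite stored values never match a BigDecimal and never throw.
    static boolean safeEq(double stored, BigDecimal probe) {
        if (!Double.isFinite(stored)) return false;
        return new BigDecimal(stored).compareTo(probe) == 0;
    }
}
```

With the guard in place, a check such as `safeEq(Double.NaN, new BigDecimal("1.0"))` simply evaluates to false, which matches the filtering behavior the proposal asks for.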
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL.
+
+==== Equivalence
+
+TinkerPop today uses the hash value of the original values for grouping, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus ordering fails for non-comparable and mixed-type values. The proposed change is to make values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type promotion into account in terms of equivalence? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: not de-duplicating them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-duplicating them?
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, another question is which value we should filter out. We would need to define a priority over types in Number.
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept, such as a key-value tuple?

Review comment:
       I think that's a good idea, and something developers will want. In Java, it could be as simple as allowing developers to provide a custom `Comparator<Value>` which overrides a default comparator -- but that default comparator is pretty important, IMO, and should be consistent with a well-defined ordering that extends across Gremlin language variants. The query language does need to be opinionated about equality and comparison (anywhere we have lookups or filtering), and there is a lot more flexibility in where and when expressions are evaluated (e.g. client/server) if we have that consistency.







[GitHub] [tinkerpop] rngcntr commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
rngcntr commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r737145083



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in Java and (partially) relies upon the JVM, and as of today there are no clearly defined and published semantics for the different types of equality and comparison operations. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for implementers of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide complete and cohesive semantics for equality / comparison such that all graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This document is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by examples of the current behavior, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose are largely aligned with the semantics implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, so as to make the overall semantics more predictable, coherent, and documentable.
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, neither users nor graph providers know whether this query matches only vertices whose ID is exactly the integer value 19 or, for instance, vertices whose ID is any numerical value that casts to the Integer value 19. To find out today, they need to dig into the TinkerPop code base. While type casting rules apply in the above example, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+two numeric values of different types would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. However, the comparison is Java dependent, and we don't have a clear definition of how the comparison works for this kind of type.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+We even have comparisons over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling of `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not currently cover all cases consistently.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, and the comparison tries to promote Infinity to BigDecimal as well;
+// the type cast then fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four: we distinguish equality from equivalence and comparability from orderability, each pair constituting two flavors of the same concept tailored to its usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast into exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different types are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number,
+** Because type promotion does not apply, two numbers of different types are never equivalent.
+** NaN is not equal to NaN, but NaN is equivalent to NaN.
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine whether the 2 values are at the same position.
+* When the position is identical, which value comes first (in other words, whether a stable sort is performed) depends on the graph provider's implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change?
+
+=== Semantics
+
+TinkerPop currently has no formal semantics defining the characteristics introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+On JDK 11, BigDecimal comparisons that hit NaN and similar special values seem to produce a different error than on JDK 8. JDK 8 seems to produce a NumberFormatException, whereas JDK 11 produces a message such as:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float value is stored, this query always throws today. With the proposed change it would not throw; because NaN is not equal to any number, it returns an empty result.
+
+* BigDecimal +
+Equality checks involving BigDecimal and special values which cannot be parsed as Integer, such as NaN and INF, should not produce exceptions; they should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL. 
+
+==== Equivalence
+
+Today, TinkerPop uses a hash of the original values for grouping; this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop uses comparability for ordering, so ordering fails for non-comparable and mixed-type values. The proposed change is to make values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should equivalence take type promotion into account? +
+[code]
+----
+// Given the two values below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: not de-duplicating them,
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-duplicating them?
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value should be filtered out. We would need to define a priority over the Number types. 

Review comment:
       The question of which value should be filtered out caught my attention. If, for instance, some query uses `dedup().by("name")` and there are multiple vertices with the same name, the server is free to choose which one it returns. This is similar to the [handling of GROUP BY in MySQL](https://docs.oracle.com/cd/E17952_01/mysql-5.6-en/group-by-handling.html).
   A server could, in theory, return different results for each run of the query even if the queried graph does not change. In their default implementation, steps like `dedup()` depend on the order in which traversers enter the step, but as far as I know, a consistent behavior is not enforced.
   
   I am aware that this might not necessarily be an issue. It's just that non-deterministic steps have quite an impact on formal proofs of correctness of optimizations, which is a topic I'm currently working on. 
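
The order dependence described above can be illustrated with a minimal Java sketch. It is not TinkerPop's `dedup()` implementation: the class `DedupOrderSketch`, the helper `dedupBy`, and the `"v1:marko"` string encoding of vertices are all hypothetical, and the sketch simply assumes a "first arrival wins" policy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; this is not TinkerPop's dedup() step.
final class DedupOrderSketch {

    // Keeps the first element seen for each key, in arrival order.
    static <T, K> List<T> dedupBy(List<T> arrivals, Function<T, K> keyOf) {
        Map<K, T> firstSeen = new LinkedHashMap<>();
        for (T item : arrivals) {
            firstSeen.putIfAbsent(keyOf.apply(item), item);
        }
        return new ArrayList<>(firstSeen.values());
    }

    public static void main(String[] args) {
        // Assumed encoding: "<vertex id>:<name>"; the key is the part after ':'.
        Function<String, String> nameOf = v -> v.substring(v.indexOf(':') + 1);

        // Same two vertices, two arrival orders, two different survivors.
        System.out.println(dedupBy(List.of("v1:marko", "v2:marko"), nameOf)); // [v1:marko]
        System.out.println(dedupBy(List.of("v2:marko", "v1:marko"), nameOf)); // [v2:marko]
    }
}
```

Under this assumed policy the result is a function of the arrival order, so the step is only deterministic if the order in which traversers reach it is fixed.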







[GitHub] [tinkerpop] joshsh commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
joshsh commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r738666308



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+== Open Questions
+
+* Should equivalence take type promotion into account? +
+[code]
+----
+// Given the two values below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: not de-duplicating them,
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-duplicating them?
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value should be filtered out. We would need to define a priority over the Number types. 
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN as well as ±Double.INF and ±Float.INF. Not applying type promotion means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept, such as a key-value tuple?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave how TinkerPop deals with BigDecimal to Graph providers?

Review comment:
       As Michael suggested, we probably should give graph providers a way of overriding the default behavior, but again I think we can define reasonable defaults which will work well for most providers. We can actually standardize how most unsupported types are mapped to supported ones. This has already been demonstrated in Hydra for [atomic values](https://github.com/CategoricalData/hydra/blob/main/hydra-haskell/src/test/haskell/Hydra/Prototyping/Adapters/AtomicSpec.hs) and most [complex values](https://github.com/CategoricalData/hydra/blob/main/hydra-haskell/src/test/haskell/Hydra/Prototyping/Adapters/TermSpec.hs). We need to enhance Graph.Features to give graph providers a way of specifying:
   - which atomic type constructors (e.g. binary, boolean, float, integer, string) are supported. Default to all.
   - which floating-point type constructors (e.g. 32-bit float, 64-bit double) are supported. Default to both.
   - which integer type constructors (e.g. 8 to 64-bit signed, unsigned) are supported. Perhaps default to {int32, int64}
   - any additional constraints, provided as a function Type->Boolean. Defaults to `\t -> true` (allow all).
   
   See [here](https://github.com/CategoricalData/hydra/blob/main/hydra-scala/src/gen-main/scala/hydra/adapter/Language.scala#L15) for an example of a language constraints class in Scala.
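
As a rough Java sketch of the four settings listed above: every name below (`TypeConstraintsSketch`, the three enums, `defaults()`) is hypothetical and does not exist in `Graph.Features` today, and modeling "Type" as `Class<?>` is an assumption made purely for illustration.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names are part of TinkerPop's Graph.Features.
final class TypeConstraintsSketch {
    enum AtomicKind { BINARY, BOOLEAN, FLOAT, INTEGER, STRING }
    enum FloatKind { FLOAT32, FLOAT64 }
    enum IntegerKind { INT8, INT16, INT32, INT64, UINT8, UINT16, UINT32, UINT64 }

    private final Set<AtomicKind> atomicKinds;
    private final Set<FloatKind> floatKinds;
    private final Set<IntegerKind> integerKinds;
    private final Predicate<Class<?>> extraConstraint; // "Type -> Boolean", modeled on Class<?>

    TypeConstraintsSketch(Set<AtomicKind> atomicKinds, Set<FloatKind> floatKinds,
                          Set<IntegerKind> integerKinds, Predicate<Class<?>> extraConstraint) {
        this.atomicKinds = atomicKinds;
        this.floatKinds = floatKinds;
        this.integerKinds = integerKinds;
        this.extraConstraint = extraConstraint;
    }

    // Defaults as suggested above: all atomic kinds, both float widths,
    // {int32, int64}, and an allow-all extra constraint.
    static TypeConstraintsSketch defaults() {
        return new TypeConstraintsSketch(
                EnumSet.allOf(AtomicKind.class),
                EnumSet.allOf(FloatKind.class),
                EnumSet.of(IntegerKind.INT32, IntegerKind.INT64),
                t -> true);
    }

    boolean supportsInteger(IntegerKind kind) {
        return atomicKinds.contains(AtomicKind.INTEGER) && integerKinds.contains(kind);
    }

    boolean supportsFloat(FloatKind kind) {
        return atomicKinds.contains(AtomicKind.FLOAT) && floatKinds.contains(kind);
    }

    boolean allows(Class<?> type) {
        return extraConstraint.test(type);
    }

    public static void main(String[] args) {
        TypeConstraintsSketch d = defaults();
        System.out.println(d.supportsInteger(IntegerKind.INT64)); // true
        System.out.println(d.supportsInteger(IntegerKind.INT8));  // false
    }
}
```

A provider without, say, 32-bit floats would pass `EnumSet.of(FloatKind.FLOAT64)` instead of the default, and a standard adapter could then map unsupported values to the nearest supported type.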







[GitHub] [tinkerpop] mschmidt00 commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
mschmidt00 commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r736962713



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalance, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+values of different numeric types (for example, an Integer 19 and a Double 19.0) would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. However, the comparison is Java dependent, and we don’t have a clear definition of how the comparison works for this kind of type.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+We even have comparisons over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of type Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which then tries to promote Infinity to BigDecimal as well,
+// and the type casting fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
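
As background for the inconsistencies above, the JVM itself already gives two different answers for these special values, depending on whether primitives or boxed objects are compared. The following is a minimal Java sketch of that standard behavior; the class and helper names (`JvmSpecialValues`, `primitiveEq`, `boxedEq`) are illustrative only and are not part of TinkerPop.

```java
// Illustrative only: shows how plain Java treats NaN and signed zero.
final class JvmSpecialValues {

    // IEEE 754 comparison on primitives.
    static boolean primitiveEq(double a, double b) {
        return a == b;
    }

    // Object equality on boxed values, as used by HashMap / HashSet keys.
    static boolean boxedEq(double a, double b) {
        return Double.valueOf(a).equals(Double.valueOf(b));
    }

    public static void main(String[] args) {
        System.out.println(primitiveEq(Double.NaN, Double.NaN)); // false
        System.out.println(boxedEq(Double.NaN, Double.NaN));     // true
        System.out.println(primitiveEq(0.0, -0.0));              // true
        System.out.println(boxedEq(0.0, -0.0));                  // false
        // Double.compare places NaN above every other value, including +Infinity.
        System.out.println(Double.compare(Double.NaN, Double.POSITIVE_INFINITY) > 0); // true
    }
}
```

Which of these two behaviors a Gremlin step ends up with depends on the Java data structure behind it, which is the kind of implicit definition this proposal wants to replace.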
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, distinguishing between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of each concept tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common supertype.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
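
The difference between the two notions can be sketched in Java roughly as follows. This is a hedged illustration of the rules listed above, not the TinkerPop implementation: `EqualitySketch`, `equal`, `equivalent` and `promote` are hypothetical names, promotion goes through `BigDecimal` as one possible common supertype, and NULL handling (an open question in this proposal) simply defers to `Objects.equals`.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Objects;

// Hypothetical sketch of the proposed equality vs. equivalence rules.
final class EqualitySketch {

    static boolean isNaN(Number n) {
        return (n instanceof Double && ((Double) n).isNaN())
            || (n instanceof Float && ((Float) n).isNaN());
    }

    static boolean isInfinite(Number n) {
        return (n instanceof Double && ((Double) n).isInfinite())
            || (n instanceof Float && ((Float) n).isInfinite());
    }

    // Exact promotion of a finite number to BigDecimal (no rounding).
    static BigDecimal promote(Number n) {
        if (n instanceof BigDecimal) return (BigDecimal) n;
        if (n instanceof BigInteger) return new BigDecimal((BigInteger) n);
        if (n instanceof Double || n instanceof Float) return new BigDecimal(n.doubleValue());
        return BigDecimal.valueOf(n.longValue()); // Byte, Short, Integer, Long
    }

    // Equality: numeric type promotion applies; NaN is equal to nothing.
    static boolean equal(Object a, Object b) {
        if (a instanceof Number && b instanceof Number) {
            Number x = (Number) a, y = (Number) b;
            if (isNaN(x) || isNaN(y)) return false;
            if (isInfinite(x) || isInfinite(y)) {
                return isInfinite(x) && isInfinite(y) && x.doubleValue() == y.doubleValue();
            }
            return promote(x).compareTo(promote(y)) == 0;
        }
        // Assumption: non-numeric values (and NULL) fall back to Java equality.
        return Objects.equals(a, b);
    }

    // Equivalence: no type promotion; NaN is equivalent to NaN of the same type.
    static boolean equivalent(Object a, Object b) {
        if (a == null || b == null) return a == b;
        if (a.getClass() != b.getClass()) return false;
        if (a instanceof Number && isNaN((Number) a) && isNaN((Number) b)) return true;
        return equal(a, b);
    }

    public static void main(String[] args) {
        System.out.println(equal(2, 2.0f));                     // true
        System.out.println(equivalent(2, 2.0f));                // false
        System.out.println(equal(Double.NaN, Double.NaN));      // false
        System.out.println(equivalent(Double.NaN, Double.NaN)); // true
    }
}
```

Both methods always return a boolean and never throw for the inputs covered here, which mirrors the completeness requirement stated above.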
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned, e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
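
A three-valued comparison along these lines could look like the following Java sketch. It is an illustration under stated assumptions rather than the actual `Compare` implementation: `ComparabilitySketch` and `lessThan` are hypothetical names, a Java `null` return stands in for the NULL result, and comparisons involving NaN are assumed to yield FALSE, in line with "comparing on NaN should return no results".

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical sketch: TRUE / FALSE for comparable values, null for incomparable ones.
final class ComparabilitySketch {

    static boolean isNaN(Number n) {
        return (n instanceof Double && ((Double) n).isNaN())
            || (n instanceof Float && ((Float) n).isNaN());
    }

    static boolean isInfinite(Number n) {
        return (n instanceof Double && ((Double) n).isInfinite())
            || (n instanceof Float && ((Float) n).isInfinite());
    }

    // Exact promotion of a finite number to BigDecimal (no rounding).
    static BigDecimal promote(Number n) {
        if (n instanceof BigDecimal) return (BigDecimal) n;
        if (n instanceof BigInteger) return new BigDecimal((BigInteger) n);
        if (n instanceof Double || n instanceof Float) return new BigDecimal(n.doubleValue());
        return BigDecimal.valueOf(n.longValue()); // Byte, Short, Integer, Long
    }

    static Boolean lessThan(Object a, Object b) {
        if (a instanceof Number && b instanceof Number) {
            Number x = (Number) a, y = (Number) b;
            if (isNaN(x) || isNaN(y)) return false; // assumption: NaN never satisfies a range predicate
            // -Infinity is below everything except another -Infinity.
            if (isInfinite(x)) return x.doubleValue() < 0 && !(isInfinite(y) && y.doubleValue() < 0);
            // Here x is finite: it is below +Infinity and not below -Infinity.
            if (isInfinite(y)) return y.doubleValue() > 0;
            return promote(x).compareTo(promote(y)) < 0;
        }
        if (a != null && b != null && a.getClass() == b.getClass() && a instanceof Comparable) {
            @SuppressWarnings("unchecked")
            Comparable<Object> c = (Comparable<Object>) a;
            return c.compareTo(b) < 0;
        }
        return null; // incomparable types: the NULL result described above
    }

    public static void main(String[] args) {
        System.out.println(lessThan(1.0d, 2));        // true
        System.out.println(lessThan(2, 3L));          // true
        System.out.println(lessThan(1, "a"));         // null
        System.out.println(lessThan(5, Double.NaN));  // false
    }
}
```

How a caller treats the `null` result (for example, filtering the traverser out) is left open here, matching the statement that NULL handling is graph provider dependent.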
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
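
To make the idea of a total order across types concrete, here is a small Java sketch of a comparator that never throws. The type ranking used (NULL, Boolean, Number, Date, String, everything else) is purely an assumption for illustration and is not the order defined in the appendix; likewise, comparing numbers via `doubleValue()` is a simplification that loses precision for very large values, and `OrderabilitySketch`, `rank`, `ORDER` and `sorted` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Date;
import java.util.List;

// Hypothetical sketch of a total, exception-free order over mixed-type values.
final class OrderabilitySketch {

    // Assumed type priority; the real cross-type order is defined elsewhere in the proposal.
    static int rank(Object o) {
        if (o == null) return 0;
        if (o instanceof Boolean) return 1;
        if (o instanceof Number) return 2;
        if (o instanceof Date) return 3;
        if (o instanceof String) return 4;
        return 5;
    }

    static final Comparator<Object> ORDER = (a, b) -> {
        int ra = rank(a), rb = rank(b);
        if (ra != rb) return Integer.compare(ra, rb); // different type classes: order by rank
        switch (ra) {
            case 0: return 0;
            case 1: return Boolean.compare((Boolean) a, (Boolean) b);
            // Double.compare is total: NaN sorts after +Infinity and equals itself.
            case 2: return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
            case 3: return ((Date) a).compareTo((Date) b);
            case 4: return ((String) a).compareTo((String) b);
            default:
                // Placeholder for composite types: class name, then string form.
                int byClass = a.getClass().getName().compareTo(b.getClass().getName());
                return byClass != 0 ? byClass : a.toString().compareTo(b.toString());
        }
    };

    static List<Object> sorted(List<?> values) {
        List<Object> copy = new ArrayList<>(values);
        copy.sort(ORDER); // List.sort is stable
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sorted(Arrays.<Object>asList("b", 3, null, 1.5, true, "a", Double.NaN)));
    }
}
```

Within the Number class NaN is placed at the same position as another NaN, which corresponds to using equivalence rather than equality to decide whether two values share a position.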
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change?
+
+=== Semantics
+
+TinkerPop currently has no formal semantics defining the characteristics introduced in this proposal. Therefore, these semantics should be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this query always throws today. With the proposed change, it would not throw; because NaN is not equal to any number, it would return an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer, such as NaN and INF, should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but based on the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL.
+
+==== Equivalence
+
+TinkerPop today groups values by the hash of the original value, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values fail in ordering. The proposed change is to make values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type promotion into account for equivalence? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, there is a further question of which value we should filter out. We would need to define a priority over the Number types.
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped together according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a generalized concept such as a key-value tuple?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave how TinkerPop deals with BigDecimal to graph providers?
+* Which is more reasonable: should NULL eq NULL be true or false?
+  ** If it is true, it may be respected in JOIN operations
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be taken into account as a separate type?
+
+== Technical Appendix
+
+=== Types
+First, we need to define which data types the TinkerPop query execution runtime needs to handle. Since it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex

Review comment:
       Makes sense, and +1 for integrating the idea of different vertex types into a future type system.
   
   Put into that context, what we meant here by **Vertex** would be the union over all possible vertex types. Would you agree that, for the specific purpose of defining equality semantics, more specific vertex types are irrelevant?   







[GitHub] [tinkerpop] mschmidt00 commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
mschmidt00 commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r736976829



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This document is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by examples of the current behavior, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose are largely aligned with the semantics implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, so as to make the overall semantics more predictable, coherent, and documentable.
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+values of different numeric types (for example, an Integer 19 and a Double 19.0) would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. However, the comparison is Java dependent, and we don’t have a clear definition of how the comparison works for this kind of type.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+We even have comparisons over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of type Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which then tries to promote Infinity to BigDecimal as well,
+// and the type casting fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, distinguishing between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of each concept tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common supertype.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
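As a rough illustration of the three-valued result described in the list above, here is a minimal Java sketch. The names `ComparabilitySketch` and `gt` are hypothetical and are not TinkerPop APIs. Promoting both operands to `double` is a simplifying assumption (it loses precision for large Long / BigDecimal values), and a Java `null` stands in for the NULL result.

```java
final class ComparabilitySketch {
    /**
     * Hypothetical "greater than" under comparability.
     * Returns TRUE or FALSE for comparable operands and null (standing in
     * for NULL) when the two operands are of incomparable types.
     */
    public static Boolean gt(Object lhs, Object rhs) {
        if (lhs instanceof Number && rhs instanceof Number) {
            // Simplification: promote both sides to double before comparing.
            double a = ((Number) lhs).doubleValue();
            double b = ((Number) rhs).doubleValue();
            return a > b;
        }
        if (lhs instanceof String && rhs instanceof String) {
            return ((String) lhs).compareTo((String) rhs) > 0;
        }
        // Different (or unsupported) types: incomparable in this sketch.
        return null;
    }
}
```

How a caller treats the `null` result (for example, filtering the traverser out) is left open here, in line with the note that NULL handling is Graph provider dependent.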
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
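The requirements in the list above can be sketched as a total-order comparator in Java. This is only an illustration under stated assumptions: the class `OrderabilitySketch`, the `rank` function, and the particular cross-type order (null, then Boolean, then Number, then String, then everything else) are hypothetical; the proposal defines the real cross-type order in its appendix. Numbers are compared with `Double.compare`, which places NaN after all other values, and that placement is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class OrderabilitySketch {
    // Hypothetical type ranks; not the order defined by the proposal.
    static int rank(Object o) {
        if (o == null) return 0;
        if (o instanceof Boolean) return 1;
        if (o instanceof Number) return 2;
        if (o instanceof String) return 3;
        return 4;
    }

    // Never returns "unknown" and never throws for the types handled here.
    public static final Comparator<Object> ORDER = (a, b) -> {
        int ra = rank(a);
        int rb = rank(b);
        if (ra != rb) return Integer.compare(ra, rb);
        switch (ra) {
            case 1: return Boolean.compare((Boolean) a, (Boolean) b);
            case 2: return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
            case 3: return ((String) a).compareTo((String) b);
            default: return 0; // nulls and unranked types share one position
        }
    };

    public static List<Object> sort(Object... values) {
        List<Object> out = new ArrayList<>(Arrays.asList(values));
        out.sort(ORDER); // List.sort is stable, so ties keep their input order
        return out;
    }
}
```

With this assumed ranking, `OrderabilitySketch.sort("100000", 100)` places the Integer before the String instead of failing with a cast error.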
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+TinkerPop currently has no formal semantics defining the characteristics introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this query always throws today. With the proposed change it would not throw; because NaN is not equal to any number, it returns an empty result.
+
+* BigDecimal +
+Equality checks involving BigDecimal and special values that cannot be parsed as Integer, such as NaN and INF, should not produce exceptions; they should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty  is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today groups values by the hash of the original value, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, so ordering fails for non-comparable and mixed-type values. The proposed change makes values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, another question is which value should be filtered out; we would need to define a priority over the Number types. 
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN, as well as ±Double.INF and ±Float.INF. Not applying type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a general concept such as a key-value tuple?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave it to graph providers how TinkerPop deals with BigDecimal?
+* Which is more reasonable: should NULL eq NULL be true or false?
+  ** If it is true, it may be respected in JOIN operations.
+* There are a number of situations where the Gremlin grammar won't support some of the examples. To what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we've outlined. 
+* Should UUID be treated as a distinct type?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Because it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We apply type casting (a.k.a. type promotion) to Numbers, i.e. Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. Types are promoted according to the following rules:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to floating type of highest common bit denomination
+  ** If another value is Long or Double, we need 64bit so convert both to Double 
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported depending on the language and Storage, therefore the behavior of type casting for these 2 types can vary depending on a Graph provider. 
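The promotion rules listed above can be read as a top-to-bottom decision procedure. The following Java sketch transcribes them literally; `PromotionSketch` and `promote` are hypothetical names rather than TinkerPop APIs, and the method assumes both arguments are one of the eight Number types named above.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class PromotionSketch {
    /** Returns the type in which two Numbers would be compared, per the rules above. */
    public static Class<? extends Number> promote(Number a, Number b) {
        if (a instanceof BigDecimal || b instanceof BigDecimal) return BigDecimal.class;
        if (a instanceof BigInteger || b instanceof BigInteger) return BigInteger.class;
        if (a instanceof Double || b instanceof Double) return Double.class;
        if (a instanceof Float || b instanceof Float) {
            // A Float paired with a 64-bit Long needs the 64-bit floating type.
            if (a instanceof Long || b instanceof Long) return Double.class;
            return Float.class;
        }
        if (a instanceof Long || b instanceof Long) return Long.class;
        if (a instanceof Integer || b instanceof Integer) return Integer.class;
        if (a instanceof Short || b instanceof Short) return Short.class;
        // Assumption: anything left is a pair of Bytes.
        return Byte.class;
    }
}
```

For example, `promote(1.0f, 2L)` yields `Double.class`, while `promote(1.0f, 2)` yields `Float.class`.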
+
+=== Equality
+
+==== Primitive types
+===== Number
+
+Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
+
+* If exactly one of LHS and RHS is a Number, eq returns FALSE.
+* If both LHS and RHS are Numbers, the type casting described above is applied and equality is then checked on the promoted values.
+* Example for edge cases:
+    ** -0.0 eq 0.0  = TRUE
+    ** +0.0 eq 0.0 = TRUE
+    **  -0.0 eq +0.0 = TRUE
+    ** NaN eq NaN  = FALSE
+    ** +INF eq +INF = TRUE
+    **  -INF eq -INF = TRUE
+    **  -INF eq +INF = FALSE
+* TinkerPop is JVM based, so there can be ±INF^^float and ±INF^^double, as well as NaN^^float and NaN^^double. They also adhere to type promotion.
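The edge cases listed above coincide with IEEE 754 comparison as exposed by Java's primitive `==` on `double`. The sketch below covers the Number case only; `NumberEqualitySketch` and `eq` are hypothetical names, and promoting both operands to `double` is a simplification of the full promotion rules (it is not exact for large Long, BigInteger, or BigDecimal values).

```java
final class NumberEqualitySketch {
    /** Hypothetical eq for the Number case; any non-Number operand yields false. */
    public static boolean eq(Object lhs, Object rhs) {
        if (!(lhs instanceof Number) || !(rhs instanceof Number)) {
            return false;
        }
        // Primitive == follows IEEE 754: -0.0 == 0.0 is true, NaN == NaN is false.
        return ((Number) lhs).doubleValue() == ((Number) rhs).doubleValue();
    }
}
```

Under this sketch, `eq(Float.POSITIVE_INFINITY, Double.POSITIVE_INFINITY)` is true because both operands are promoted before the comparison, while `eq(Double.NaN, Double.NaN)` is false.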
+
+===== Boolean
+
+* If either one of LHS or RHS is Boolean and another isn't, return FALSE
+* True != False, True == True, False == False
+
+===== String
+
+* If either one of LHS or RHS is String and another isn't, return FALSE
+* We assume the common lexicographical order over Unicode strings.
+* For LHS eq RHS to be TRUE for Strings, LHS and RHS need to be lexicographically equal.
+
+===== Date
+
+* If either one of LHS or RHS is Date and another isn't, return FALSE
+* LHS eq RHS == TRUE when both LHS and RHS value are numerically identical in Unix Epoch time.
+
+===== NULL
+
+* If either one of LHS or RHS is null and another isn't, return FALSE
+* If both LHS and RHS are null, return TRUE 
+
+==== Composite types
+
+For all of them, if LHS and RHS are not of the same data type, equality returns FALSE. The following semantics apply when both LHS and RHS have the same data type.
+
+===== Vertex / Edge / VertexProperty
+
+These belong to the Element family in TinkerPop; 2 elements are considered equal if they have the same type and the same ID.
+
+===== Property
+
+If key and value are same, 2 properties are equal.
+
+===== PropertyKey
+
+key is String type so Equality for String type applies.
+
+===== PropertyValue
+
+Any type, so Equality for a corresponding type applies.
+
+===== ID
+
+Any type, so Equality for a corresponding type applies.
+
+===== Label
+
+label is String type so Equality for String type applies.
+
+===== Path
+
+2 Paths are equal when their path elements are equal (using equality of List), and the corresponding path labels are also equal. 
+
+===== List
+
+* If either one of LHS or RHS is List and another isn't, return FALSE
+* When both are List, then
+    ** if their sizes differ, return FALSE
+    ** L(x) denotes the x-th element in list L. 
+        *** For 2 lists L1 and L2 to be equal (L1 eq L2 returns TRUE), L1(x) eq L2(x) must return TRUE for all 0 <= x < n (n is the length of L1 and L2).
+        *** For 2 lists L1 and L2 to be not equal (L1 eq L2 returns FALSE), L1(x) eq L2(x) must return FALSE for at least one 0 <= x < n (n is the length of L1 and L2).
+
+===== Map
+
+* If either one of LHS or RHS is Map and another isn't, return FALSE
+* For 2 Maps M1 and M2 to be equal,
+    ** All keys in M1 should be within keys in M2
+    ** All keys in M2 should be within keys in M1
+    ** M1 and M2 should have the same number of  keys
+    ** For all keys k(1), k(2), ...k(n) in M1, M1[k] eq M2[k] should return TRUE
+    ** In Gremlin key order is not respected when determining equality
+
+=== Equivalence
+
+Equivalence is identical to Equality, except for the cases listed below.
+
+==== Primitive types
+===== Number
+
+* Unlike Equality, we *don't do* type casting for Equivalence.
+    ** If the type is different, they are not equivalent.
+        *** +INF^^double is not equivalent to +INF^^float
+        *** NaN^^double is not equivalent to NaN^^float
+    ** 123 and 123.0 are equal but not equivalent to each other
+* -0.0, 0.0, and +0.0 are not equivalent to each other
+    ** -0.0 is equivalent to -0.0
+    ** 0.0 is equivalent to 0.0
+    ** +0.0 is equivalent to +0.0
+* -INF and +INF are not equivalent to each other
+    ** -INF is equivalent to -INF
+    ** +INF is equivalent to +INF
+    ** They are equivalent to each other irrespective of the underlying type, so in Java, for example, Double.POSITIVE_INFINITY is equivalent to Float.POSITIVE_INFINITY.
+* NaN is not equivalent to any other numbers
+    ** NaN *is equivalent to* NaN irrespective of its underlying type, so in Java, for example, Double.NaN is equivalent to Float.NaN.
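Several of the Number rules above match what Java's boxed `equals` / `hashCode` already do, which is relevant because grouping and de-duplication are hash based. The following sketch is only an illustration: `EquivalenceSketch` and `dedup` are hypothetical names, and the use of `LinkedHashSet` is an assumption about how such a step could be realized, not a description of TinkerPop's implementation. Note that this sketch does not unify NaN or ±INF across Float and Double, so it does not cover the cross-type bullets above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

final class EquivalenceSketch {
    /**
     * De-duplicates by Object.equals / hashCode, keeping first occurrences.
     * For boxed numbers this means: NaN matches NaN of the same type,
     * 0.0 and -0.0 stay distinct, and 123 (Integer) differs from 123.0 (Double).
     */
    public static List<Object> dedup(List<?> values) {
        return new ArrayList<Object>(new LinkedHashSet<Object>(values));
    }
}
```

For instance, de-duplicating `[NaN, NaN, 0.0, -0.0, 123, 123.0]` with this sketch keeps five values, dropping only the second NaN.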
+
+===== NULL
+* NULL is not equivalent to any other values

Review comment:
       To provide some background: This was motivated by openCypher as well, which explicitly introduces VOID as the type for NULL (as a side note @rdtr: we should definitely not call it NULL, but use something like VOID or NULLTYPE if we want to stick with it for the purpose of this documentation). But I like the Optional variant, too.
   
   To make sure I understand what you're suggesting @joshsh (and I guess I get your Optional:Nothing example from the previous comment now): the idea would be to use, say Optional<String> to denote a property value that is either a string or NULL, and Optional<Nothing> to denote NULL only (where Nothing is the bottom type)? Essentially as an alternative to modeling this as a union type such as UNION(String, VOID), with an explicit type for VOID for NULLs? 
   
   Do you have any insights into trade-offs between these two options? In particular, I wonder if there are any implications in terms of expressivity, complexity of type unification systems, etc.?  







[GitHub] [tinkerpop] rdtr commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
rdtr commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r736897920



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by cap-Step. But the comparison is Java dependent and we don’t have a clear definition of how the comparison works for this kind of type.
+The same holds for Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users to define a complete order across types and return a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts.  We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations. 
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical type if they cast into exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+TinkerPop currently has no formal semantics defining the characteristics introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this query always throws today. With the proposed change it would not throw; because NaN is not equal to any number, it returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today; under the proposal, it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, it throws today but should return NULL.
+  ** When NULL is compared, it throws an exception today but should return NULL.
+
+==== Equivalence
+
+For grouping, TinkerPop today uses a hash of the original values; this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop uses comparability for orderability, so ordering fails for non-comparable and mixed-type values. The proposed change makes it possible to order values of any type.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value to filter out; we would need to define a priority over the Number types.
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN as well as ±Double.INF and ±Float.INF. Not applying type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept, such as a key-value tuple?
+* Today we have a Date type, but don’t we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave how TinkerPop deals with BigDecimal to graph providers?
+* Which is more reasonable: NULL eq NULL being true, or false?
+  ** If it is true, this may be respected in JOIN operations
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be taken into account as a distinct type?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Because it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer
+* Long: 64-bit signed two's complement integer
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that Double and Float include the concepts of INFINITY, https://en.wikipedia.org/wiki/Signed_zero[signed zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We apply type casting, a.k.a. type promotion, to Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. Types are promoted according to the following rules:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If at least one is Float, then convert both to the floating-point type of the highest common bit width
+  ** If the other value is Long or Double, we need 64 bits, so convert both to Double
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported, depending on the language and storage; the type casting behavior for these two types can therefore vary between graph providers.
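A minimal Java sketch of the promotion rules listed above, for illustration only. The names `NumberPromotion` and `commonType` are hypothetical and not part of TinkerPop; the sketch only selects the type both operands would be compared as, checking the rules in the order they are listed, and does not perform the conversion itself.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper, not TinkerPop API: picks the common comparison type.
class NumberPromotion {
    static Class<? extends Number> commonType(Number a, Number b) {
        if (a instanceof BigDecimal || b instanceof BigDecimal) return BigDecimal.class;
        if (a instanceof BigInteger || b instanceof BigInteger) return BigInteger.class;
        if (a instanceof Double || b instanceof Double) return Double.class;
        if (a instanceof Float || b instanceof Float) {
            // Double was handled above, so only a Long partner forces 64 bits here.
            if (a instanceof Long || b instanceof Long) return Double.class;
            return Float.class;
        }
        if (a instanceof Long || b instanceof Long) return Long.class;
        if (a instanceof Integer || b instanceof Integer) return Integer.class;
        if (a instanceof Short || b instanceof Short) return Short.class;
        return Byte.class;
    }

    public static void main(String[] args) {
        System.out.println(commonType(1, 2L).getSimpleName());
        System.out.println(commonType(1.5f, 2L).getSimpleName());
        System.out.println(commonType(1.5f, 2).getSimpleName());
        System.out.println(commonType((byte) 1, new BigDecimal("1.0")).getSimpleName());
    }
}
```

A provider that does not support BigDecimal or BigInteger would presumably drop or replace the first two branches, in line with the note above.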
+
+=== Equality
+
+==== Primitive types
+===== Number
+
+Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
+
+* If exactly one of LHS and RHS is a Number, eq returns FALSE.
+* If both LHS and RHS are Numbers, eq applies the type casting described above and then checks equality.
+* Examples of edge cases:
+    ** -0.0 eq 0.0 = TRUE
+    ** +0.0 eq 0.0 = TRUE
+    ** -0.0 eq +0.0 = TRUE
+    ** NaN eq NaN = FALSE
+    ** +INF eq +INF = TRUE
+    ** -INF eq -INF = TRUE
+    ** -INF eq +INF = FALSE
+* TinkerPop is JVM based, so there can be ±INF^^float and ±INF^^double, NaN^^float and NaN^^double. They also adhere to type promotion.
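For reference, the edge cases listed above match what Java's primitive `==` already yields under IEEE 754, whereas boxed `Double.equals` behaves differently for NaN and signed zero. The sketch below only demonstrates standard Java behavior; the class and method names are hypothetical, and it is not a statement of how TinkerPop implements equality.

```java
// Hypothetical demo class: contrasts primitive == with boxed equals for doubles.
class FloatingPointEdgeCases {
    // Primitive comparison follows IEEE 754.
    static boolean primitiveEq(double lhs, double rhs) {
        return lhs == rhs;
    }

    // Boxed comparison follows Double.equals, which compares bit patterns.
    static boolean boxedEq(double lhs, double rhs) {
        return Double.valueOf(lhs).equals(Double.valueOf(rhs));
    }

    public static void main(String[] args) {
        double posInf = Double.POSITIVE_INFINITY;
        double negInf = Double.NEGATIVE_INFINITY;
        System.out.println("-0.0 eq 0.0  : " + primitiveEq(-0.0, 0.0));
        System.out.println("NaN eq NaN   : " + primitiveEq(Double.NaN, Double.NaN));
        System.out.println("+INF eq +INF : " + primitiveEq(posInf, posInf));
        System.out.println("-INF eq +INF : " + primitiveEq(negInf, posInf));
        // A float operand is widened to double before the comparison.
        System.out.println("+INF^^float eq +INF^^double : " + (Float.POSITIVE_INFINITY == posInf));
        System.out.println("boxed NaN equals NaN   : " + boxedEq(Double.NaN, Double.NaN));
        System.out.println("boxed -0.0 equals 0.0  : " + boxedEq(-0.0, 0.0));
    }
}
```

The difference matters for implementers who hold values as boxed objects: relying on `equals` would treat NaN as equal to itself and the two zeros as different, which is the opposite of the equality rules listed here.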
+
+===== Boolean
+
+* If exactly one of LHS and RHS is a Boolean, return FALSE
+* True != False, True == True, False == False
+
+===== String
+
+* If exactly one of LHS and RHS is a String, return FALSE
+* We assume the common lexicographical order over Unicode strings.
+* For Strings, LHS eq RHS == TRUE requires LHS and RHS to be lexicographically equal.
+
+===== Date
+
+* If exactly one of LHS and RHS is a Date, return FALSE
+* LHS eq RHS == TRUE when the LHS and RHS values are numerically identical in Unix epoch time.
+
+===== NULL
+
+* If exactly one of LHS and RHS is null, return FALSE
+* If both LHS and RHS are null, return TRUE
+
+==== Composite types
+
+For all of them, if LHS and RHS are not of the same data type, equality returns FALSE. The following semantics apply when both LHS and RHS have the given data type.
+
+===== Vertex / Edge / VertexProperty
+
+These belong to the Element family in TinkerPop; two elements are considered equal if they have the same type and the same ID.
+
+===== Property
+
+Two properties are equal if their keys and values are the same.
+
+===== PropertyKey
+
+The key is of String type, so Equality for String applies.
+
+===== PropertyValue
+
+Any type, so Equality for a corresponding type applies.
+
+===== ID
+
+Any type, so Equality for a corresponding type applies.
+
+===== Label
+
+The label is of String type, so Equality for String applies.
+
+===== Path
+
+2 Paths are equal when their path elements are equal (using equality of List), and the corresponding path labels are also equal. 
+
+===== List
+
+* If exactly one of LHS and RHS is a List, return FALSE
+* When both are Lists, then
+    ** if their sizes differ, return FALSE
+    ** L(x) denotes the x-th element in list L.
+        *** For 2 lists L1 and L2 to be equal (L1 eq L2 returns TRUE), L1(x) eq L2(x) must return TRUE for all 0 <= x < n (n is the length of L1 and L2).
+        *** For 2 lists L1 and L2 to be not equal (L1 eq L2 returns FALSE), L1(x) eq L2(x) must return FALSE for at least one 0 <= x < n (n is the length of L1 and L2).
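The element-wise rule for lists can be sketched in Java as follows. This is an illustration under stated assumptions: `ListEquality`, `listEq` and `numberEq` are hypothetical names, and `numberEq` is a deliberately simplified stand-in for numeric equality that compares via `double` rather than implementing the full promotion rules.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch, not TinkerPop API.
class ListEquality {
    // Lists are equal when sizes match and every pair of elements at the same index is equal.
    static <T> boolean listEq(List<T> l1, List<T> l2, BiPredicate<T, T> elementEq) {
        if (l1.size() != l2.size()) {
            return false;
        }
        for (int x = 0; x < l1.size(); x++) {
            if (!elementEq.test(l1.get(x), l2.get(x))) {
                return false; // one unequal pair is enough
            }
        }
        return true;
    }

    // Simplified stand-in for numeric equality (assumption: double precision suffices).
    static boolean numberEq(Number a, Number b) {
        return a.doubleValue() == b.doubleValue();
    }

    public static void main(String[] args) {
        List<Number> ints = Arrays.<Number>asList(1, 2L);
        List<Number> floats = Arrays.<Number>asList(1.0, 2);
        List<Number> nan = Arrays.<Number>asList(Double.NaN);
        System.out.println(listEq(ints, floats, ListEquality::numberEq));
        System.out.println(listEq(nan, nan, ListEquality::numberEq));
    }
}
```

One consequence worth noting: because NaN is not equal to NaN, a list containing NaN is not equal to itself under this rule.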
+
+===== Map
+
+* If exactly one of LHS and RHS is a Map, return FALSE
+* For 2 Maps M1 and M2 to be equal,
+    ** All keys in M1 should be within keys in M2
+    ** All keys in M2 should be within keys in M1
+    ** M1 and M2 should have the same number of keys
+    ** For all keys k(1), k(2), ...k(n) in M1, M1[k] eq M2[k] should return TRUE
+    ** In Gremlin, key order is not taken into account when determining equality
+
+=== Equivalence
+
+Equivalence is identical to Equality, except for the cases listed below.
+
+==== Primitive types
+===== Number
+
+* Unlike Equality, we *don't do* type casting for Equivalence.

Review comment:
       Thanks. Having the common terminology across PG languages would be better for sure.







[GitHub] [tinkerpop] joshsh commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
joshsh commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r738678439



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different contexts (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties).
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and as of today there are no clearly defined and published semantics for the different types of equality and comparison operations. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. By helping customers and implementers alike, this will increase the adoption of Gremlin as a query language.
+
+This document is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose are largely aligned with the semantics implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, so as to make the overall semantics more predictable, coherent, and documentable.
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. However, the comparison is Java dependent, and we don’t have a clear definition of how comparison works for types of this kind.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of type Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if a complete order across types were defined and a result returned instead of an exception being thrown.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in the technical appendix.
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number,
+** Because type promotion is not applied, two numbers of different types are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is being used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
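The three-valued outcome described in the list above (TRUE, FALSE, or NULL for incomparable types) can be sketched in Java with a nullable `Boolean`. This is only an illustration: `ComparabilitySketch` and `lt` are hypothetical names, numbers are compared via `double` as a simplification of the promotion rules, and only Number and String are covered.

```java
// Hypothetical sketch, not TinkerPop API: a "less than" that can answer NULL.
class ComparabilitySketch {
    // Returns TRUE or FALSE for comparable operands, and null when the types are incomparable.
    static Boolean lt(Object lhs, Object rhs) {
        if (lhs instanceof Number && rhs instanceof Number) {
            // Simplification: compare as double instead of applying full type promotion.
            return ((Number) lhs).doubleValue() < ((Number) rhs).doubleValue();
        }
        if (lhs instanceof String && rhs instanceof String) {
            return ((String) lhs).compareTo((String) rhs) < 0;
        }
        return null; // e.g. an integer compared to a string
    }

    public static void main(String[] args) {
        System.out.println(lt(1.0, 2));
        System.out.println(lt(2, 3L));
        System.out.println(lt(1, "a"));
    }
}
```

How a caller treats the `null` result (for example, filtering the traverser out) is left open here, since the text says that handling is graph provider dependent.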
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+TinkerPop currently has no formal semantics for the concepts introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK 11 seems to produce a different error from JDK 8 for BigDecimal comparisons that involve NaN and similar special values. On JDK 8 they seem to produce a NumberFormatException, whereas on JDK 11 the message looks like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this query always throws today. With the proposed change it would not throw; because NaN is not equal to any number, it would return an empty result.
+
+* BigDecimal +
+Equality checks involving BigDecimal and special values that cannot be parsed as Integer, such as NaN and INF, should not produce exceptions; they should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today; under the proposal, it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, it throws today but should return NULL.
+  ** When NULL is compared, it throws an exception today but should return NULL.
+
+==== Equivalence
+
+For grouping, TinkerPop today uses a hash of the original values; this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop uses comparability for orderability, so ordering fails for non-comparable and mixed-type values. The proposed change makes it possible to order values of any type.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value to filter out; we would need to define a priority over the Number types.
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN as well as ±Double.INF and ±Float.INF. Not applying type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept, such as a key-value tuple?
+* Today we have a Date type, but don’t we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave how TinkerPop deals with BigDecimal to graph providers?
+* Which is more reasonable: NULL eq NULL being true, or false?
+  ** If it is true, this may be respected in JOIN operations
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be taken into account as a distinct type?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Because it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer
+* Long: 64-bit signed two's complement integer
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that Double and Float have the concepts of INFINITY, https://en.wikipedia.org/wiki/Signed_zero[signed zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex

Review comment:
       Yes, I think we can proceed incrementally, and I think there is a common theme in the discussion in this PR: provide reasonable defaults which will work well for many graph providers, but allow developers to override those defaults if needed. I agree that moving to strongly-typed vertices and edges will be a big step, and that we can start with strongly-typed properties.







[GitHub] [tinkerpop] joshsh commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
joshsh commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r738691697



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound trivial at first, many interesting questions arise below the surface: equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different contexts (e.g., predicates vs. deduplication), comparability and ordering across different logical types, and the identity of elements (such as vertex and edge properties).
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / the JVM, and as of today no clear semantics are defined and published for the different types of equality and comparison operations. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide complete and cohesive semantics for equality / comparison so that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. By helping customers and implementers alike, this should increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. But the comparison is Java dependent, and we don’t have a clear definition of how the comparison works for types of this kind.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, distinguishing equality from equivalence and comparability from orderability; each pair constitutes two flavors of one concept, tailored to its usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, a frequent use case for NaN, cannot be identified as equal with certainty, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output or undefined behavior, nor do they ever throw an error. Detailed behavior is described in the technical appendix.
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
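To make the difference between the two notions concrete, here is a minimal Java sketch restricted to the boxed primitive number types. The names `EqualityVsEquivalenceSketch`, `equal` and `equivalent` are hypothetical and not part of TinkerPop, and promotion is simplified to a comparison as `double`, which only approximates the promotion rules given in the appendix.

```java
final class EqualityVsEquivalenceSketch {

    // Equality: numeric type promotion applies, and NaN is not equal to anything.
    // Assumption: both sides are simply promoted to double.
    static boolean equal(Number lhs, Number rhs) {
        return lhs.doubleValue() == rhs.doubleValue();
    }

    // Equivalence: no type promotion, but NaN is equivalent to NaN of the same type.
    static boolean equivalent(Number lhs, Number rhs) {
        if (lhs.getClass() != rhs.getClass()) {
            return false; // e.g. 2 (Integer) vs. 2.0f (Float)
        }
        if (lhs instanceof Double || lhs instanceof Float) {
            double x = lhs.doubleValue();
            double y = rhs.doubleValue();
            return x == y || (Double.isNaN(x) && Double.isNaN(y));
        }
        return lhs.equals(rhs);
    }

    public static void main(String[] args) {
        System.out.println(equal(Double.NaN, Double.NaN));      // false
        System.out.println(equivalent(Double.NaN, Double.NaN)); // true
        System.out.println(equal(2, 2.0f));                     // true
        System.out.println(equivalent(2, 2.0f));                // false
    }
}
```

Under these assumptions `Float.NaN` and `Double.NaN` are not equivalent either, which matches the open question about type promotion raised later in the proposal.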
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
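The three possible outcomes of a comparability check (TRUE, FALSE, NULL) could be modelled in Java with a nullable `Boolean`. The following is only an illustration: `ComparabilitySketch.gt` is a hypothetical helper, only Number and String are handled, and numbers are simplified to `double` rather than following the full promotion rules.

```java
final class ComparabilitySketch {

    // Returns TRUE or FALSE for comparable values and null for incomparable types.
    static Boolean gt(Object lhs, Object rhs) {
        if (lhs instanceof Number && rhs instanceof Number) {
            // Any comparison involving NaN is false under IEEE 754 rules.
            return ((Number) lhs).doubleValue() > ((Number) rhs).doubleValue();
        }
        if (lhs instanceof String && rhs instanceof String) {
            return ((String) lhs).compareTo((String) rhs) > 0;
        }
        return null; // e.g. an integer compared to a string
    }

    public static void main(String[] args) {
        System.out.println(gt(3L, 2));         // true
        System.out.println(gt(1, Double.NaN)); // false
        System.out.println(gt(1, "a"));        // null
    }
}
```

How a caller treats the `null` result (for example, as "filter out") is left open here, in line with the bullet stating that this is Graph provider dependent.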
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
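One way to picture orderability is a comparator that first ranks values by type and only then compares within a type. The sketch below is written under loud assumptions: the ranking in `rank` (null, Boolean, Number, String, everything else) is hypothetical and is not the cross-type order defined in the proposal's appendix, numbers are simplified to `double`, and `OrderabilitySketch` is not a TinkerPop class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class OrderabilitySketch {

    // Hypothetical type ranking, chosen only for this illustration.
    static int rank(Object o) {
        if (o == null) return 0;
        if (o instanceof Boolean) return 1;
        if (o instanceof Number) return 2;
        if (o instanceof String) return 3;
        return 4; // catch-all bucket for every other type
    }

    static final Comparator<Object> ORDER = (a, b) -> {
        int ra = rank(a);
        int rb = rank(b);
        if (ra != rb) return Integer.compare(ra, rb); // different types: order by rank
        if (ra == 0) return 0;                         // both null
        if (ra == 1) return Boolean.compare((Boolean) a, (Boolean) b);
        if (ra == 2) {
            // Double.compare is a total order: NaN sorts last and equals itself.
            return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
        }
        if (ra == 3) return ((String) a).compareTo((String) b);
        // Fallback so that ordering never throws: class name, then string form.
        int byClass = a.getClass().getName().compareTo(b.getClass().getName());
        return byClass != 0 ? byClass : a.toString().compareTo(b.toString());
    };

    public static void main(String[] args) {
        List<Object> values = new ArrayList<>(
                Arrays.<Object>asList("100000", 100, null, true, Double.NaN, 2.5));
        values.sort(ORDER);
        System.out.println(values); // [null, true, 2.5, 100, NaN, 100000]
    }
}
```

Unlike the comparability check, this comparator never returns "unknown" and never throws for mixed-type input, which is the property the bullets above ask for.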
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of Semantics, right now TinkerPop does not have formal semantics to define these characteristics introduced in this proposal. Therefore this semantics should be published on the official TinkerPop doc.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK 11 seems to produce a different error from JDK 8 for BigDecimal comparisons that hit NaN and similar special values. JDK 8 seems to produce a NumberFormatException, whereas JDK 11 produces errors such as:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float value is stored, this query always throws today. With the proposed change it would not throw; because NaN is not equal to any number, it would return an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty values are compared, it throws today but should return NULL.
+  ** When NULL is compared, it throws an exception today but should return NULL.
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value to filter out; we would need to define a priority over the Number types.
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN, as well as ±Double.INF and ±Float.INF. Without type casting, Double.NaN and Float.NaN, for example, are not de-duplicated / grouped together under these semantics.
+
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a generalized concept such as a key-value tuple?
+* Today we have a Date type, but don’t we need a timezone-aware DateTime type as well?
+* Some graph providers may not support BigDecimal. Do we leave it to Graph providers how TP deals with BigDecimal?
+* Which is more reasonable: NULL eq NULL being true or false?
+  ** If it is true, it may be respected in JOIN operations
+* There are a number of situations where the Gremlin grammar won’t support some of the examples. To what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined.
+* Should UUID be taken into account as a separate type?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Because it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that Double and Float have the concepts of INFINITY, https://en.wikipedia.org/wiki/Signed_zero[signed zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We do type casting (a.k.a. type promotion) for Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. Types are promoted according to the following rules:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to the floating-point type of the highest common bit width
+  ** If the other value is Long or Double, 64 bits are needed, so convert both to Double
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported, depending on the language and storage; therefore the behavior of type casting for these 2 types can vary from one Graph provider to another.
+
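Read top to bottom, the promotion rules above amount to a first-match lookup. The following Java sketch shows one way to encode them; `PromotionSketch` and `commonType` are hypothetical names, the method only picks the target type and does not convert any value, and unknown `Number` subclasses are assumed to fall through to `Byte`.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class PromotionSketch {

    static boolean either(Number a, Number b, Class<?> type) {
        return type.isInstance(a) || type.isInstance(b);
    }

    // Picks the type in which two numbers would be compared, in rule order.
    static Class<? extends Number> commonType(Number a, Number b) {
        if (either(a, b, BigDecimal.class)) return BigDecimal.class;
        if (either(a, b, BigInteger.class)) return BigInteger.class;
        if (either(a, b, Double.class)) return Double.class;
        if (either(a, b, Float.class)) {
            // A Float next to a Long needs 64 bits, so both go to Double.
            return either(a, b, Long.class) ? Double.class : Float.class;
        }
        if (either(a, b, Long.class)) return Long.class;
        if (either(a, b, Integer.class)) return Integer.class;
        if (either(a, b, Short.class)) return Short.class;
        return Byte.class;
    }

    public static void main(String[] args) {
        System.out.println(commonType(1, 2L).getSimpleName());      // Long
        System.out.println(commonType(1.0f, 2L).getSimpleName());   // Double
        System.out.println(commonType(1.0f, 2).getSimpleName());    // Float
    }
}
```

A provider without BigDecimal or BigInteger support would presumably drop the first two checks, which is where the provider-dependent behavior mentioned above would show up.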
+=== Equality
+
+==== Primitive types
+===== Number
+
+Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
+
+* If either one of LHS or RHS is Number and another isn't, eq returns FALSE.
+* If both LHS and RHS are Number, eq follows the type casting described above and then checks equality.
+* Example for edge cases:
+    ** -0.0 eq 0.0  = TRUE
+    ** +0.0 eq 0.0 = TRUE
+    **  -0.0 eq +0.0 = TRUE
+    ** NaN eq NaN  = FALSE
+    ** +INF eq +INF = TRUE
+    **  -INF eq -INF = TRUE
+    **  -INF eq +INF = FALSE
+* TinkerPop is JVM based so there can be ±INF^^float and ±INF^^double, NaN^^float and NaN^^double. They also adhere to the type promotion.
+
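The edge cases listed above can be checked against a small Java sketch. It is an illustration under assumptions, not the TinkerPop implementation: `NumberEqualitySketch.numberEq` is a hypothetical helper, non-Big numbers are compared as `double` or `long` (so very large Long values compared with floating-point values may lose precision), and NaN / ±INF compared with BigDecimal or BigInteger simply yield FALSE instead of throwing.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class NumberEqualitySketch {

    static boolean isFloating(Number n) {
        return n instanceof Float || n instanceof Double;
    }

    static boolean isBig(Number n) {
        return n instanceof BigDecimal || n instanceof BigInteger;
    }

    static boolean isNonFinite(Number n) {
        return isFloating(n) && !Double.isFinite(n.doubleValue());
    }

    static BigDecimal toBigDecimal(Number n) {
        if (n instanceof BigDecimal) return (BigDecimal) n;
        if (n instanceof BigInteger) return new BigDecimal((BigInteger) n);
        if (isFloating(n)) return BigDecimal.valueOf(n.doubleValue());
        return BigDecimal.valueOf(n.longValue());
    }

    // Covers only the Number rules: returns false unless both sides are Numbers.
    static boolean numberEq(Object lhs, Object rhs) {
        if (!(lhs instanceof Number) || !(rhs instanceof Number)) return false;
        Number a = (Number) lhs;
        Number b = (Number) rhs;
        if (isBig(a) || isBig(b)) {
            // NaN and +/-INF have no BigDecimal form: filter instead of throwing.
            if (isNonFinite(a) || isNonFinite(b)) return false;
            return toBigDecimal(a).compareTo(toBigDecimal(b)) == 0;
        }
        if (isFloating(a) || isFloating(b)) {
            // Primitive ==: NaN != NaN, -0.0 == +0.0, +INF == +INF.
            return a.doubleValue() == b.doubleValue();
        }
        return a.longValue() == b.longValue();
    }

    public static void main(String[] args) {
        System.out.println(numberEq(-0.0, 0.0));                            // true
        System.out.println(numberEq(Double.NaN, Double.NaN));               // false
        System.out.println(numberEq(Float.POSITIVE_INFINITY,
                                    Double.POSITIVE_INFINITY));             // true
        System.out.println(numberEq(new BigDecimal("1.0"), Double.NaN));    // false
    }
}
```

Using `compareTo` rather than `equals` for BigDecimal is a deliberate choice in this sketch, because `BigDecimal.equals` also compares scale and would treat 1.0 and 1.00 as different.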
+===== Boolean
+
+* If either one of LHS or RHS is Boolean and another isn't, return FALSE
+* True != False, True == True, False == False
+
+===== String
+
+* If either one of LHS or RHS is String and another isn't, return FALSE
+* We assume the common lexicographical order over unicode strings.
+* LHS and RHS need to be lexicographically equal for LHS eq RHS == TRUE for String.
+
+===== Date
+
+* If either one of LHS or RHS is Date and another isn't, return FALSE
+* LHS eq RHS == TRUE when both LHS and RHS value are numerically identical in Unix Epoch time.
+
+===== NULL
+
+* If either one of LHS or RHS is null and another isn't, return FALSE
+* If both LHS and RHS are null, return TRUE 
+
+==== Composite types
+
+For all of them, if LHS and RHS are not of the same data type, equality returns FALSE. The following semantics apply when both LHS and RHS have the same data type.
+
+===== Vertex / Edge / VertexProperty
+
+They belong to the Element family in TinkerPop; if 2 elements have the same type and the same ID, they are considered equal.
+
+===== Property
+
+If key and value are the same, 2 properties are equal.
+
+===== PropertyKey
+
+key is String type so Equality for String type applies.
+
+===== PropertyValue
+
+Any type, so Equality for a corresponding type applies.
+
+===== ID
+
+Any type, so Equality for a corresponding type applies.
+
+===== Label
+
+label is String type so Equality for String type applies.
+
+===== Path
+
+2 Paths are equal when their path elements are equal (using equality of List), and the corresponding path labels are also equal. 
+
+===== List
+
+* If either one of LHS or RHS is List and another isn't, return FALSE
+* When both are List, then
+    ** if their sizes differ, return FALSE
+    ** L(x) denotes the x-th element in list L.
+        *** For 2 lists L1 and L2 to be equal (L1 eq L2 returns TRUE), L1(x) eq L2(x) must return TRUE for all 0 <= x < n, where n is the length of L1 and L2.
+        *** For 2 lists L1 and L2 of the same length to be not equal (L1 eq L2 returns FALSE), L1(x) eq L2(x) must return FALSE for at least one 0 <= x < n.
+
+===== Map
+
+* If either one of LHS or RHS is Map and another isn't, return FALSE
+* For 2 Maps M1 and M2 to be equal,
+    ** All keys in M1 should be within keys in M2
+    ** All keys in M2 should be within keys in M1
+    ** M1 and M2 should have the same number of  keys
+    ** For all keys k(1), k(2), ...k(n) in M1, M1[k] eq M2[k] should return TRUE
+    ** In Gremlin key order is not respected when determining equality
+
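The List and Map rules above translate into a short recursive check. This is a sketch under stated assumptions: `CompositeEqualitySketch` is a hypothetical name, leaf values fall back to `Objects.equals` as a placeholder for the per-type equality rules, and map keys are looked up with Java's own `equals` / `hashCode` rather than the proposed equality.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class CompositeEqualitySketch {

    static boolean eq(Object lhs, Object rhs) {
        if (lhs instanceof List && rhs instanceof List) {
            return listEq((List<?>) lhs, (List<?>) rhs);
        }
        if (lhs instanceof Map && rhs instanceof Map) {
            return mapEq((Map<?, ?>) lhs, (Map<?, ?>) rhs);
        }
        if (lhs instanceof List || rhs instanceof List
                || lhs instanceof Map || rhs instanceof Map) {
            return false; // only one side is a List or Map
        }
        return Objects.equals(lhs, rhs); // placeholder for the leaf-type rules
    }

    // Equal sizes and pairwise-equal elements at every index.
    static boolean listEq(List<?> l1, List<?> l2) {
        if (l1.size() != l2.size()) return false;
        for (int x = 0; x < l1.size(); x++) {
            if (!eq(l1.get(x), l2.get(x))) return false;
        }
        return true;
    }

    // Same key set and equal values per key; key order is irrelevant.
    static boolean mapEq(Map<?, ?> m1, Map<?, ?> m2) {
        if (m1.size() != m2.size()) return false;
        for (Map.Entry<?, ?> e : m1.entrySet()) {
            if (!m2.containsKey(e.getKey())) return false;
            if (!eq(e.getValue(), m2.get(e.getKey()))) return false;
        }
        return true;
    }
}
```

Because both maps have the same size and every key of M1 is found in M2, the check that all keys of M2 are within M1 follows without a second loop.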
+=== Equivalence
+
+Equivalence is identical to Equality, except for the cases listed below.
+
+==== Primitive types
+===== Number
+
+* Unlike Equality, we *don't do* type casting for Equivalence.

Review comment:
       Didn't have time to talk about TinkerPop in the meeting today (which was about labels vs. types), but I think we will set aside some time next week. It would be very good to align where we can, particularly around property types.







[GitHub] [tinkerpop] joshsh commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
joshsh commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r736718809



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound trivial at first, many interesting questions arise below the surface: equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different contexts (e.g., predicates vs. deduplication), comparability and ordering across different logical types, and the identity of elements (such as vertex and edge properties).
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / the JVM, and as of today no clear semantics are defined and published for the different types of equality and comparison operations. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide complete and cohesive semantics for equality / comparison so that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. By helping customers and implementers alike, this should increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. But the comparison is Java dependent, and we don’t have a clear definition of how the comparison works for types of this kind.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+We even have comparisons over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which then tries to promote Infinity to BigDecimal as well,
+// and the type casting fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
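The last failure above can be reproduced outside of Gremlin. The following is a minimal Java sketch, assuming that the promotion ends up constructing a `BigDecimal` from the stored floating point value; the class and method names are hypothetical and not part of TinkerPop.

```java
import java.math.BigDecimal;

// Hypothetical demo class, not part of TinkerPop.
final class BigDecimalPromotionDemo {

    // Returns true if the double can be converted to a BigDecimal.
    // The BigDecimal(double) constructor throws NumberFormatException for NaN and infinities.
    static boolean promotable(double value) {
        try {
            new BigDecimal(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Same check for the textual form, since BigDecimal(String) rejects "NaN" and "Infinity" too.
    static boolean parsable(String text) {
        try {
            new BigDecimal(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(promotable(1.5));
        System.out.println(promotable(Double.POSITIVE_INFINITY));
        System.out.println(promotable(Double.NaN));
        System.out.println(parsable("Infinity"));
    }
}
```

Because `BigDecimal` has no representation for NaN or the infinities, any promotion rule that targets it needs an explicit branch for these values before converting.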
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale (commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified as equal with certainty, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
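Java itself already exposes both notions for floating point values, which makes the distinction easy to see. The sketch below is only an illustration under that analogy: `==` on primitives stands in for the proposed equality and `Double.equals` for the proposed equivalence of same-typed values. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical demo class, not part of TinkerPop.
final class NaNEqualityVsEquivalence {

    // Equality analogue: IEEE 754 comparison, under which NaN is never equal to NaN.
    static boolean equal(double a, double b) {
        return a == b;
    }

    // Equivalence analogue: Double.equals treats NaN as the same value as NaN
    // and distinguishes 0.0 from -0.0.
    static boolean equivalent(double a, double b) {
        return Double.valueOf(a).equals(Double.valueOf(b));
    }

    // Deduplication through a HashSet relies on equals/hashCode, i.e. on the equivalence analogue.
    static int dedupCount(List<Double> values) {
        return new HashSet<>(values).size();
    }

    public static void main(String[] args) {
        System.out.println(equal(Double.NaN, Double.NaN));
        System.out.println(equivalent(Double.NaN, Double.NaN));
        System.out.println(dedupCount(Arrays.asList(Double.NaN, Double.NaN, 1.0)));
    }
}
```

Both functions always return a boolean and never throw, which matches the completeness requirement stated above.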
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast into exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in the technical appendix.
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be conceptually aligned with equality as far as type promotion is concerned, e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines whether 2 values are ordered at the same position or one value is positioned earlier than the other.
+* The concept of equivalence is used to determine whether the 2 values are at the same position.
+* When the position is identical, which value comes first (in other words, whether a stable sort is performed) depends on the graph provider's implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, we have a dedicated order as described in the section below.
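The orderability rules above amount to a total comparator: rank the types first, then compare within a type. The following Java sketch illustrates that shape only. The rank values in `typeRank` are made up for the example (the actual cross-type order is the one the proposal defines in its appendix), numbers are compared through `doubleValue()`, which loses precision for large `Long` and arbitrary-precision values, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of TinkerPop.
final class OrderabilitySketch {

    // Made-up type ranks for illustration only.
    static int typeRank(Object o) {
        if (o == null) return 0;
        if (o instanceof Boolean) return 1;
        if (o instanceof Number) return 2;
        if (o instanceof String) return 4;
        return 3; // everything else, e.g. graph elements
    }

    // Total order: never returns "unknown" and never throws for mixed types.
    static int compare(Object a, Object b) {
        int ra = typeRank(a);
        int rb = typeRank(b);
        if (ra != rb) return Integer.compare(ra, rb);
        if (a == null) return 0; // both are NULL
        if (a instanceof Boolean) return Boolean.compare((Boolean) a, (Boolean) b);
        if (a instanceof Number) {
            // Double.compare is a total order: NaN sorts after every other value.
            return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
        }
        if (a instanceof String) return ((String) a).compareTo((String) b);
        // Placeholder for types this sketch does not model.
        return a.toString().compareTo(b.toString());
    }

    // Returns a sorted copy; List.sort is stable, so equally positioned values keep their input order.
    static List<Object> sorted(List<?> values) {
        List<Object> copy = new ArrayList<>(values);
        copy.sort(OrderabilitySketch::compare);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sorted(Arrays.<Object>asList("100000", 100, null, true, 2.5)));
    }
}
```

Choosing a stable sort here is one possible answer to the "which value comes first" question; the proposal leaves that decision to graph providers.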
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change?
+
+=== Semantics
+
+Right now TinkerPop has no formal semantics defining the characteristics introduced in this proposal. Therefore these semantics should be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 for BigDecimal comparisons that hit NaN and similar special values. JDK8 seems to produce NumberFormatException, whereas on JDK11 the message looks like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this query always throws today. With the proposed change it would not throw; because NaN is not equal to any number, it would return an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty values are compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL.
+
+==== Equivalence
+
+TinkerPop today uses a hash of the original values for grouping, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type promotion into account for equivalence? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: that we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, there is another question: which value should we filter out? We would need to define a priority over the Number types.
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept such as a key-value tuple?
+* Today we have a Date type, but don't we need a timezone-aware DateTime type as well?
+* Some graph providers may not support BigDecimal. Do we leave it to Graph providers how TinkerPop deals with BigDecimal?
+* Which is more reasonable: that NULL eq NULL is true, or that it is false?
+  ** If it is true, it may be respected in JOIN operations
+* There are a number of situations where the Gremlin grammar won't support some of the examples. To what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we've outlined.
+* Should UUID be taken into account as a separate type?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. It is JVM based, so we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We do type casting, a.k.a. type promotion, for Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. The rules for how types are promoted are:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to the floating point type of the highest common bit width
+  ** If the other value is Long or Double, we need 64 bits, so convert both to Double
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported depending on the language and storage, therefore the behavior of type casting for these 2 types can vary from one Graph provider to another.
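The promotion rules above can be read as a single decision function over the two operand types. The Java sketch below follows the list in order; it is an illustration, not TinkerPop's implementation, and the names `commonType` and `equalFixedWidth` are hypothetical. The equality helper deliberately covers only the fixed-width types, since support for BigInteger and BigDecimal is described as provider dependent.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical sketch, not part of TinkerPop.
final class TypePromotionSketch {

    // Picks the type in which two numbers are compared, following the listed rules top to bottom.
    static Class<? extends Number> commonType(Number a, Number b) {
        if (a instanceof BigDecimal || b instanceof BigDecimal) return BigDecimal.class;
        if (a instanceof BigInteger || b instanceof BigInteger) return BigInteger.class;
        if (a instanceof Double || b instanceof Double) return Double.class;
        if (a instanceof Float || b instanceof Float) {
            // A Float paired with a Long needs 64 bits, so both become Double.
            return (a instanceof Long || b instanceof Long) ? Double.class : Float.class;
        }
        if (a instanceof Long || b instanceof Long) return Long.class;
        if (a instanceof Integer || b instanceof Integer) return Integer.class;
        if (a instanceof Short || b instanceof Short) return Short.class;
        return Byte.class;
    }

    // Equality after promotion for fixed-width types only.
    static boolean equalFixedWidth(Number a, Number b) {
        Class<? extends Number> type = commonType(a, b);
        if (type == BigDecimal.class || type == BigInteger.class) {
            throw new UnsupportedOperationException("arbitrary-precision types are out of scope for this sketch");
        }
        if (type == Double.class) return a.doubleValue() == b.doubleValue();
        if (type == Float.class) return a.floatValue() == b.floatValue();
        return a.longValue() == b.longValue(); // Byte, Short, Integer, Long all fit in a long
    }

    public static void main(String[] args) {
        System.out.println(commonType(1.0f, 1L));
        System.out.println(equalFixedWidth(123, 123.0));
        System.out.println(equalFixedWidth(Double.NaN, Double.NaN));
    }
}
```

Comparing the promoted floating point values with `==` gives the IEEE 754 behavior for NaN and signed zero without any special casing.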
+
+=== Equality
+
+==== Primitive types
+===== Number
+
+Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
+
+* If either one of LHS or RHS is Number and the other isn't, eq returns FALSE.
+* If both LHS and RHS are Number, eq follows the type casting described above and then checks the equality.
+* Example for edge cases:
+    ** -0.0 eq 0.0  = TRUE
+    ** +0.0 eq 0.0 = TRUE
+    **  -0.0 eq +0.0 = TRUE
+    ** NaN eq NaN  = FALSE
+    ** +INF eq +INF = TRUE
+    **  -INF eq -INF = TRUE
+    **  -INF eq +INF = FALSE
+* TinkerPop is JVM based so there can be ±INF^^float and ±INF^^double, NaN^^float and NaN^^double. They also adhere to the type promotion.
+
+===== Boolean
+
+* If either one of LHS or RHS is Boolean and the other isn't, return FALSE
+* True != False, True == True, False == False
+
+===== String
+
+* If either one of LHS or RHS is String and the other isn't, return FALSE
+* We assume the common lexicographical order over Unicode strings.
+* LHS and RHS need to be lexicographically equal for LHS eq RHS == TRUE to hold for String.
+
+===== Date
+
+* If either one of LHS or RHS is Date and the other isn't, return FALSE
+* LHS eq RHS == TRUE when both LHS and RHS values are numerically identical in Unix Epoch time.
+
+===== NULL
+
+* If either one of LHS or RHS is null and the other isn't, return FALSE
+* If both LHS and RHS are null, return TRUE 
+
+==== Composite types
+
+For all of them, if LHS and RHS are not of the same data type, equality returns FALSE. The following semantics apply when both LHS and RHS have the same data type.
+
+===== Vertex / Edge / VertexProperty
+
+They are considered the Element family in TinkerPop; if 2 elements have the same type and the same ID, they are considered equal.
+
+===== Property
+
+If their keys and values are the same, 2 properties are equal.
+
+===== PropertyKey
+
+key is String type so Equality for String type applies.
+
+===== PropertyValue
+
+Any type, so Equality for a corresponding type applies.
+
+===== ID
+
+Any type, so Equality for a corresponding type applies.
+
+===== Label
+
+label is String type so Equality for String type applies.
+
+===== Path
+
+2 Paths are equal when their path elements are equal (using equality of List), and the corresponding path labels are also equal. 
+
+===== List
+
+* If either one of LHS or RHS is List and the other isn't, return FALSE
+* When both are List, then
+    ** if their sizes are different, return FALSE
+    ** L(x) denotes the x-th element in list L.
+        *** For 2 lists L1 and L2 to be equal (L1 eq L2 returns TRUE), L1(x) eq L2(x) must return TRUE for all 0 <= x < n (n is the length of L1 and L2).
+        *** 2 lists L1 and L2 are not equal (L1 eq L2 returns FALSE) if L1(x) eq L2(x) returns FALSE for at least one 0 <= x < n (n is the length of L1 and L2).
+
+===== Map
+
+* If either one of LHS or RHS is Map and the other isn't, return FALSE
+* For 2 Maps M1 and M2 to be equal,
+    ** All keys in M1 should be within keys in M2
+    ** All keys in M2 should be within keys in M1
+    ** M1 and M2 should have the same number of keys
+    ** For all keys k in M1, M1[k] eq M2[k] should return TRUE
+    ** In Gremlin key order is not respected when determining equality
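The List and Map rules above are recursive: two containers are equal when their contents are pairwise equal under the same `eq`. The Java sketch below shows that recursion under simplifying assumptions that are not part of the proposal: numbers are compared through `doubleValue()` as a stand-in for full type promotion, map keys are matched with Java's own `equals` / `hashCode`, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of TinkerPop.
final class StructuralEqualitySketch {

    static boolean eq(Object lhs, Object rhs) {
        if (lhs == null || rhs == null) return lhs == rhs; // NULL eq NULL is TRUE, NULL eq anything else is FALSE
        if (lhs instanceof Number && rhs instanceof Number) {
            // Simplified promotion: compare as doubles, so NaN eq NaN is FALSE and 1 eq 1.0 is TRUE.
            return ((Number) lhs).doubleValue() == ((Number) rhs).doubleValue();
        }
        if (lhs instanceof List && rhs instanceof List) return listEq((List<?>) lhs, (List<?>) rhs);
        if (lhs instanceof Map && rhs instanceof Map) return mapEq((Map<?, ?>) lhs, (Map<?, ?>) rhs);
        // Values of different types are never equal.
        return lhs.getClass() == rhs.getClass() && lhs.equals(rhs);
    }

    // Lists are equal when they have the same size and L1(x) eq L2(x) holds for every index x.
    static boolean listEq(List<?> l1, List<?> l2) {
        if (l1.size() != l2.size()) return false;
        for (int x = 0; x < l1.size(); x++) {
            if (!eq(l1.get(x), l2.get(x))) return false;
        }
        return true;
    }

    // Maps are equal when they have the same keys and M1[k] eq M2[k] for every key k; key order is ignored.
    static boolean mapEq(Map<?, ?> m1, Map<?, ?> m2) {
        if (m1.size() != m2.size()) return false;
        for (Map.Entry<?, ?> entry : m1.entrySet()) {
            if (!m2.containsKey(entry.getKey())) return false;
            if (!eq(entry.getValue(), m2.get(entry.getKey()))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(eq(Arrays.asList(1, 2.0), Arrays.asList(1.0, 2L)));
        System.out.println(eq(Arrays.asList(Double.NaN), Arrays.asList(Double.NaN)));
    }
}
```

One consequence of reusing `eq` for the elements is that a list containing NaN is not equal to itself, which follows from NaN eq NaN being FALSE.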
+
+=== Equivalence
+
+Equivalence is identical to Equality, except for the cases listed below.
+
+==== Primitive types
+===== Number
+
+* Unlike Equality, we *don't do* type casting for Equivalence.
+    ** If the type is different, they are not equivalent.
+        *** +INF^^double is not equivalent to +INF^^float
+        *** NaN^^double is not equivalent to NaN^^float
+    ** 123 and 123.0 are equal but not equivalent to each other
+* -0.0, 0.0, and +0.0 are not equivalent to each other
+    ** -0.0 is equivalent to -0.0
+    ** 0.0 is equivalent to 0.0
+    ** +0.0 is equivalent to +0.0
+* -INF and +INF are not equivalent to each other
+    ** -INF is equivalent to -INF
+    ** +INF is equivalent to +INF
+    ** They are equivalent to each other irrespective of the underlying type, so in Java, for example, Double.POSITIVE_INFINITY is equivalent to Float.POSITIVE_INFINITY.
+* NaN is not equivalent to any other numbers
+    ** NaN *is equivalent to* NaN irrespective of the underlying type, so in Java, for example, Double.NaN is equivalent to Float.NaN.
+
+===== NULL
+* NULL is not equivalent to any other values

Review comment:
       There are those who would agree with you, but I really think we ought to keep NULL out of the type system. We do need to deal with `null` values coming from Java, but I would suggest we treat a `T` with value `null` as interchangeable with an `Optional<T>` with value `Optional.empty()`, so for properties which may or may not have a value, I would make `Optional<T>` the property's type, rather than `T`. I would use `T` for required properties.







[GitHub] [tinkerpop] rdtr commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
rdtr commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r737849128



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalance, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This document is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, so as to make the overall semantics more predictable, coherent, and documentable.
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== Underspecified / undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, neither users nor Graph providers know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To find out, they currently need to dig into the TinkerPop code base. While type casting rules apply in the above example, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+two values of different numeric types (e.g. an Integer 19 and a Double 19.0) would always be treated as different entities.
+
+==== Behavior that is inherently driven by Java
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. However, the comparison is Java dependent and we don't have a clear definition of how the comparison works for this kind of type.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+We even have comparisons over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which then tries to promote Infinity to BigDecimal as well,
+// and the type casting fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale (commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified as equal with certainty, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common supertype.
+* Other than type promotion between Numbers, 2 values of different types are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number,
+** Because type promotion does not apply, two numbers of different types are never equivalent.
+** NaN is not equal to NaN, but NaN is equivalent to NaN.
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, the same class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be conceptually aligned with equality as far as type promotion is concerned, e.g. `1.0 < 2 < 3L`.
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine whether the 2 values are at the same position.
+* When the position is identical, which value comes first (in other words, whether a stable sort is performed) depends on the graph provider's implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of semantics, TinkerPop currently has no formal definition of the characteristics introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK 11 seems to produce a different error from JDK 8 for BigDecimal comparisons that hit NaN and similar values. JDK 8 seems to produce a NumberFormatException, whereas on JDK 11 you get messages like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this always throws. With the proposed change it would not throw; because NaN is not equal to any number, the query returns an empty result.
+
+* BigDecimal +
+Equality checks involving BigDecimal and special values that cannot be parsed as Integer, such as NaN and INF, should not produce exceptions and should simply act as a filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, it throws today but should return NULL.
+  ** When NULL is compared, it throws an exception today but should return NULL.
+
+==== Equivalence
+
+TinkerPop today uses the hash value of the original values for grouping, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, so ordering fails for non-comparable and mixed-type values. The proposed change is to make values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value we should filter out; we would need to define a priority over the Number types.
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN, as well as ±Double.INF and ±Float.INF. Not adhering to type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped together according to these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a generalized concept such as a key-value tuple?
+* Today we have a Date type, but don’t we need a timezone-aware DateTime type as well?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to Graph providers?

Review comment:
       I will add a description to the "Type Casting" section stating that a Graph provider must be able to specify which types it supports, and that how unsupported types are handled is also Graph provider dependent.
   
   If you think we should prescribe a specific casting rule (e.g. if a Graph provider does not support BigDecimal, it must be cast to Double), please let me know.
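
For illustration only, the example rule mentioned above (falling back from BigDecimal to Double) could be sketched in Java roughly as follows. The names `BigDecimalFallback`, `toSupported`, and `losesPrecision` are hypothetical and not part of TinkerPop; the sketch assumes a provider that supports every Number type except BigDecimal, and it only demonstrates that such a cast may change the value.

```java
import java.math.BigDecimal;

// Hypothetical helper, not a TinkerPop API: one possible fallback for a
// provider that does not support BigDecimal.
final class BigDecimalFallback {

    // Returns the value unchanged unless it is a BigDecimal, in which case
    // it is narrowed to a double (possibly losing precision).
    static Number toSupported(Number value) {
        if (value instanceof BigDecimal) {
            return ((BigDecimal) value).doubleValue();
        }
        return value;
    }

    // Reports whether narrowing the given BigDecimal to double changes its value.
    static boolean losesPrecision(BigDecimal value) {
        double narrowed = value.doubleValue();
        if (Double.isInfinite(narrowed)) {
            return true; // magnitude too large to represent as a double
        }
        // new BigDecimal(double) converts a finite double exactly
        return new BigDecimal(narrowed).compareTo(value) != 0;
    }

    public static void main(String[] args) {
        System.out.println(toSupported(new BigDecimal("0.5")));
        System.out.println(toSupported(Integer.valueOf(7)));
        System.out.println(losesPrecision(new BigDecimal("0.5")));
        System.out.println(losesPrecision(new BigDecimal("0.1")));
    }
}
```

A rule like this is lossy for values that have no exact double representation, which is one reason the handling of unsupported types might be better left to the provider.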







[GitHub] [tinkerpop] joshsh commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
joshsh commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r738667123



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, many interesting questions arise when looking under the surface: equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different contexts (e.g. predicates vs. deduplication), comparability and ordering across different logical types, and the identity of elements (such as vertex and edge properties).
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and as of today no clear semantics are defined and published for the different types of equality and comparison operations. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. By helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. But the comparison is Java dependent, and we don’t have a clear definition of how comparison works for this kind of type.
+The same holds for Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+We even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of type Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which then tries to promote Infinity to BigDecimal as well,
+// and that type cast fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we propose a conceptual framework for how values compare and are ordered, which aims to answer these and other questions. We seek feedback from the community in order to discuss and reach consensus on the proposal, and we are open to other ideas about how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four: we distinguish equality from equivalence and comparability from orderability, each pair being two flavors of one concept tailored to use in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and a discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale (which is commonly found in query and programming languages) is that two "unknown" numbers, a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common supertype.
+* Other than type promotion between Numbers, 2 values of different types are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number,
+** Because type promotion does not apply, two numbers of different types are never equivalent.
+** NaN is not equal to NaN, but NaN is equivalent to NaN.
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, the same class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be conceptually aligned with equality as far as type promotion is concerned, e.g. `1.0 < 2 < 3L`.
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine whether the 2 values are at the same position.
+* When the position is identical, which value comes first (in other words, whether a stable sort is performed) depends on the graph provider's implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of semantics, TinkerPop currently has no formal definition of the characteristics introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK 11 seems to produce a different error from JDK 8 for BigDecimal comparisons that hit NaN and similar values. JDK 8 seems to produce a NumberFormatException, whereas on JDK 11 you get messages like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this always throws. With the proposed change it would not throw; because NaN is not equal to any number, the query returns an empty result.
+
+* BigDecimal +
+Equality checks involving BigDecimal and special values that cannot be parsed as Integer, such as NaN and INF, should not produce exceptions and should simply act as a filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, it throws today but should return NULL.
+  ** When NULL is compared, it throws an exception today but should return NULL.
+
+==== Equivalence
+
+TinkerPop today uses the hash value of the original values for grouping, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, so ordering fails for non-comparable and mixed-type values. The proposed change is to make values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If de-duping, there is another question: which value should we filter out? We would need to define a priority over the Number types.
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept, such as a key-value tuple?
+* Today we have a Date type, but don’t we need a timezone-aware DateTime type as well?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to graph providers?
+* Which is more reasonable: NULL eq NULL is true, or false?
+  ** If it is true, it may be respected in JOIN operations.
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?

Review comment:
       SGTM







[GitHub] [tinkerpop] joshsh commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
joshsh commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r736712369



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by cap-Step. But the comparison is Java dependent, and we don’t have a clear definition of how the comparison works for this kind of type.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful to users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different concepts.  We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified with certainty as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast into exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but equivalent to each other
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
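As a rough illustration of the difference between the two notions on the JVM, the following Java sketch contrasts primitive `==` with boxed `equals()`. The class and method names (`EquivalenceSketch`, `equal`, `equivalent`, `dedup`) are hypothetical and not part of TinkerPop; using `Objects.equals` as the equivalence check is an assumption that happens to match the bullets above for NaN and for mixed numeric types.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical helper, not TinkerPop API.
final class EquivalenceSketch {
    // Equality for two doubles: IEEE 754 comparison, so NaN is never equal to NaN.
    static boolean equal(double a, double b) {
        return a == b;
    }

    // Equivalence (assumed here to be boxed equals): NaN is equivalent to NaN,
    // and values of different numeric classes (e.g. Integer 1 vs Double 1.0) never are.
    static boolean equivalent(Object a, Object b) {
        return Objects.equals(a, b);
    }

    // De-duplication driven by equals()/hashCode(), keeping first occurrences in order.
    static Set<Object> dedup(List<Object> values) {
        return new LinkedHashSet<>(values);
    }
}
```

Note that boxed `equals()` also treats `+0.0` and `-0.0` as different values, which the proposal may or may not want for equivalence.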
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is being used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise Comparison does return TRUE or FALSE
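The three-valued result described in the bullets above could be sketched in Java roughly as follows. `ComparabilitySketch` and `lt` are hypothetical names, a Java `null` stands in for NULL, and comparing numbers through `doubleValue()` is a simplifying assumption (it loses precision for large Long or BigDecimal values) rather than the promotion rules of the proposal.

```java
// Hypothetical helper, not TinkerPop API.
final class ComparabilitySketch {
    // Returns TRUE or FALSE when both operands belong to a comparable class of
    // types, and null (standing in for NULL) when they are incomparable.
    static Boolean lt(Object lhs, Object rhs) {
        if (lhs instanceof Number && rhs instanceof Number) {
            double x = ((Number) lhs).doubleValue();
            double y = ((Number) rhs).doubleValue();
            return x < y; // any comparison involving NaN yields false
        }
        if (lhs instanceof String && rhs instanceof String) {
            return ((String) lhs).compareTo((String) rhs) < 0;
        }
        return null; // e.g. an integer compared to a string
    }
}
```

How a caller treats the `null` case (filter the traverser, propagate NULL, and so on) is left open here, in line with the statement that this is graph provider dependent.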
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
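A total order across types can be sketched in Java as a comparator that first compares a per-type rank and only then compares within the type. Everything named here (`OrderabilitySketch`, `rank`, `TOTAL_ORDER`, `order`) is hypothetical, and the rank values are an assumption made for illustration only; the actual cross-type order is the one defined in the proposal's appendix.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not TinkerPop API.
final class OrderabilitySketch {
    // Assumed type ranks, for illustration only.
    static int rank(Object o) {
        if (o instanceof Number) return 1;
        if (o instanceof String) return 2;
        return 0; // every other type, including null
    }

    // Never throws and never returns "unknown": any two values get an order.
    static final Comparator<Object> TOTAL_ORDER = (a, b) -> {
        int ra = rank(a);
        int rb = rank(b);
        if (ra != rb) return Integer.compare(ra, rb);
        if (ra == 1) {
            // Double.compare is a total order: NaN sorts after +Infinity.
            return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
        }
        if (ra == 2) return ((String) a).compareTo((String) b);
        return String.valueOf(a).compareTo(String.valueOf(b));
    };

    static List<Object> order(List<Object> values) {
        List<Object> sorted = new ArrayList<>(values);
        sorted.sort(TOTAL_ORDER); // List.sort is stable
        return sorted;
    }
}
```

With these assumed ranks, a mixed list such as `"100000", 100` sorts with the number first, instead of failing with a `ClassCastException`.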
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of Semantics, right now TinkerPop does not have formal semantics to define these characteristics introduced in this proposal. Therefore this semantics should be published on the official TinkerPop doc.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this always throws. With the proposed change, it wouldn't throw; because NaN is not equal to any number, it returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
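The exceptions above are consistent with a plain JVM fact: `java.math.BigDecimal` cannot represent NaN or infinity, and its constructors throw `NumberFormatException` for such input. The Java sketch below shows that behavior and one possible guard that filters instead of throwing. `BigDecimalGuardSketch`, `convertsToBigDecimal` and `eq` are hypothetical names, and whether TinkerPop would implement the check this way is an assumption.

```java
import java.math.BigDecimal;

// Hypothetical helper, not TinkerPop API.
final class BigDecimalGuardSketch {
    // new BigDecimal(double) throws NumberFormatException for NaN and +/-Infinity.
    static boolean convertsToBigDecimal(double value) {
        try {
            new BigDecimal(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Equality of a stored double against a BigDecimal probe that never throws:
    // non-finite stored values simply do not match.
    static boolean eq(double stored, BigDecimal probe) {
        if (Double.isNaN(stored) || Double.isInfinite(stored)) {
            return false;
        }
        // compareTo ignores scale, so 1 and 1.0 compare as equal.
        return new BigDecimal(stored).compareTo(probe) == 0;
    }
}
```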
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL.
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If de-duping, there is another question: which value should we filter out? We would need to define a priority over the Number types.
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept, such as a key-value tuple?
+* Today we have a Date type, but don’t we need a timezone-aware DateTime type as well?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to graph providers?
+* Which is more reasonable: NULL eq NULL is true, or false?
+  ** If it is true, it may be respected in JOIN operations.
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. It is JVM based so as a primitive type, we are using the following types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We do type casting, a.k.a. type promotion, for Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. These are the rules for how types are promoted:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to floating type of highest common bit denomination
+  ** If another value is Long or Double, we need 64bit so convert both to Double 
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported depending on the language and Storage, therefore the behavior of type casting for these 2 types can vary depending on a Graph provider. 
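Read top to bottom, the promotion rules above amount to a first-match cascade. The Java sketch below transcribes them literally. `PromotionSketch` and `commonType` are hypothetical names, and the sketch assumes both arguments are one of the eight Number types listed.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper, not TinkerPop API.
final class PromotionSketch {
    // Returns the type both operands would be compared as, following the
    // listed rules in order; the first rule that matches wins.
    static Class<? extends Number> commonType(Number a, Number b) {
        if (a instanceof BigDecimal || b instanceof BigDecimal) return BigDecimal.class;
        if (a instanceof BigInteger || b instanceof BigInteger) return BigInteger.class;
        if (a instanceof Double || b instanceof Double) return Double.class;
        if (a instanceof Float || b instanceof Float) {
            // A Float paired with a Long needs 64 bits, so both become Double.
            if (a instanceof Long || b instanceof Long) {
                return Double.class;
            }
            return Float.class;
        }
        if (a instanceof Long || b instanceof Long) return Long.class;
        if (a instanceof Integer || b instanceof Integer) return Integer.class;
        if (a instanceof Short || b instanceof Short) return Short.class;
        return Byte.class;
    }
}
```

One consequence of transcribing the list literally is that a BigInteger paired with a Double is compared as BigInteger, which would drop any fractional part; whether that is intended is not stated in the rules.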
+
+=== Equality
+
+==== Primitive types
+===== Number
+
+Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
+
+* If either one of LHS or RHS is Number and the other isn't, eq returns FALSE.
+* If both LHS and RHS are Number, eq follows the type casting described above and then checks for equality.
+* Example for edge cases:
+    ** -0.0 eq 0.0  = TRUE
+    ** +0.0 eq 0.0 = TRUE
+    **  -0.0 eq +0.0 = TRUE
+    ** NaN eq NaN  = FALSE
+    ** +INF eq +INF = TRUE
+    **  -INF eq -INF = TRUE
+    **  -INF eq +INF = FALSE
+* TinkerPop is JVM based so there can be ±INF^^float and ±INF^^double, NaN^^float and NaN^^double. They also adhere to the type promotion.
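The edge cases listed above coincide with what Java's primitive `==` gives once both operands are widened to `double`. The sketch below is a minimal check of that correspondence; `NumberEqualitySketch` and `eq` are hypothetical names, and restricting the example to Float / Double operands is an assumption made to keep it short.

```java
// Hypothetical helper, not TinkerPop API.
final class NumberEqualitySketch {
    // A float argument is widened to double by the call itself, which plays the
    // role of type promotion for Float / Double pairs.
    static boolean eq(double lhs, double rhs) {
        return lhs == rhs; // IEEE 754: -0.0 == +0.0 is true, NaN == NaN is false
    }
}
```

For example, `eq(Float.POSITIVE_INFINITY, Double.POSITIVE_INFINITY)` is true and `eq(Float.NaN, Double.NaN)` is false under this reading.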
+
+===== Boolean
+
+* If either one of LHS or RHS is Boolean and another isn't, return FALSE
+* True != False, True == True, False == False
+
+===== String
+
+* If either one of LHS or RHS is String and another isn't, return FALSE
+* We assume the common lexicographical order over Unicode strings.
+* LHS and RHS need to be lexicographically equal for LHS eq RHS == TRUE for String.
+
+===== Date
+
+* If either one of LHS or RHS is Date and another isn't, return FALSE
+* LHS eq RHS == TRUE when both LHS and RHS value are numerically identical in Unix Epoch time.
+
+===== NULL
+
+* If either one of LHS or RHS is null and another isn't, return FALSE
+* If both LHS and RHS are null, return TRUE 
+
+==== Composite types
+
+For all of them, if LHS and RHS are not of the same data type, equality returns FALSE. The following semantics apply when both LHS and RHS have the same data type.
+
+===== Vertex / Edge / VertexProperty
+
+They are considered as Element family in TinkerPop and if 2 elements have the same type and have the same ID, they are considered as equal.

Review comment:
       If I can say "element reference" and you can say "id" with no conflict, then there is no conflict. One thing to keep in mind is that if there is only one id type, and ids are references, then that means there is only one element type (and not many types as in APG). It might be worthwhile to distinguish between a VertexId and an EdgeId type for the first pass, so we can have distinct Vertex and Edge types (just one of each). In the future, we can then distinguish between different vertex types and different edge types, with different reference (id) types.
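       One way to read this suggestion, as a minimal Java sketch: give vertex ids and edge ids their own wrapper types, so that an id only equals another id of the same kind. `ElementIdSketch`, `VertexId` and `EdgeId` are hypothetical names used for illustration; nothing here is existing TinkerPop API.

```java
import java.util.Objects;

// Hypothetical types, not TinkerPop API.
final class ElementIdSketch {
    static final class VertexId {
        private final Object value;

        VertexId(Object value) {
            this.value = Objects.requireNonNull(value);
        }

        @Override
        public boolean equals(Object other) {
            // Only another VertexId with the same underlying value is equal.
            return other instanceof VertexId && value.equals(((VertexId) other).value);
        }

        @Override
        public int hashCode() {
            return Objects.hash(VertexId.class, value);
        }
    }

    static final class EdgeId {
        private final Object value;

        EdgeId(Object value) {
            this.value = Objects.requireNonNull(value);
        }

        @Override
        public boolean equals(Object other) {
            // Only another EdgeId with the same underlying value is equal.
            return other instanceof EdgeId && value.equals(((EdgeId) other).value);
        }

        @Override
        public int hashCode() {
            return Objects.hash(EdgeId.class, value);
        }
    }
}
```

       With distinct id types, a vertex and an edge that happen to share the raw id value 1 are still not equal, which matches the "same type and same ID" rule in the text under review.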







[GitHub] [tinkerpop] joshsh commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
joshsh commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r736714237



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by cap-Step. But the comparison is Java dependent, and we don’t have a clear definition of how the comparison works for this kind of type.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+We even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which then tries to promote Infinity to BigDecimal as well,
+// and the type casting fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal that defines how values compare and are ordered, which aims to answer these and other questions. We seek feedback from the community to discuss and reach a consensus around the proposal, and we are open to all other ideas on how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, a frequent use case for NaN, cannot be identified with certainty as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common supertype.
+* Other than the type promotion between Numbers, 2 values of different types are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output or undefined behavior, nor do they ever throw an error. Detailed behavior is described in the technical appendix.
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (Whether equivalence should take type promotion into account is an open question.)
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output or undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is graph-provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change?
+
+=== Semantics
+
+TinkerPop currently has no formal semantics defining the characteristics introduced in this proposal. The semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+On JDK 11, BigDecimal comparisons that hit NaN and similar special values seem to produce a different error than on JDK 8. JDK 8 seems to produce a NumberFormatException, whereas JDK 11 produces output like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this query always throws today. With the proposed change it would not throw; because NaN is not equal to any number, it would return an empty result.
+
+* BigDecimal +
+Equality checks involving BigDecimal and special values which cannot be parsed as Integer, such as NaN and INF, should not produce exceptions and should filter instead.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, it throws today but should return NULL.
+  ** When NULL is compared, it throws an exception today but should return NULL.
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type promotion into account for equivalence? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: not de-duplicating them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, another question is which value we should filter out. We would need to define a priority over the Number types.
+Also note that TinkerPop is Java-based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept, such as a key-value tuple?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to graph providers?
+* Which is more reasonable: NULL eq NULL is true, or false?
+  ** If it is true, it may be respected in JOIN operations
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined.
+* Should UUID be a separate type to be taken into account?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Because it is JVM-based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We do type casting, a.k.a. type promotion, for Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. The rules for how types are promoted are as follows:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to the floating-point type of the highest common bit width
+  ** If the other value is Long or Double, we need 64 bits, so convert both to Double
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported depending on the language and Storage, therefore the behavior of type casting for these 2 types can vary depending on a Graph provider. 
+
+=== Equality
+
+==== Primitive types
+===== Number
+
+Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
+
+* If one of LHS and RHS is a Number and the other isn't, eq returns FALSE.
+* If both LHS and RHS are Numbers, eq follows the type casting described above and then checks for equality.
+* Example for edge cases:
+    ** -0.0 eq 0.0  = TRUE
+    ** +0.0 eq 0.0 = TRUE
+    **  -0.0 eq +0.0 = TRUE
+    ** NaN eq NaN  = FALSE
+    ** +INF eq +INF = TRUE
+    **  -INF eq -INF = TRUE
+    **  -INF eq +INF = FALSE
+* TinkerPop is JVM-based, so there can be ±INF^^float and ±INF^^double, NaN^^float and NaN^^double. They also adhere to the type promotion rules.
+
+===== Boolean
+
+* If one of LHS and RHS is Boolean and the other isn't, return FALSE
+* True != False, True == True, False == False
+
+===== String
+
+* If one of LHS and RHS is String and the other isn't, return FALSE
+* We assume the common lexicographical order over Unicode strings.
+* LHS and RHS need to be lexicographically equal for LHS eq RHS == TRUE for String.
+
+===== Date
+
+* If one of LHS and RHS is Date and the other isn't, return FALSE
+* LHS eq RHS == TRUE when the LHS and RHS values are numerically identical in Unix epoch time.
+
+===== NULL
+
+* If one of LHS and RHS is null and the other isn't, return FALSE
+* If both LHS and RHS are null, return TRUE 
+
+==== Composite types
+
+For all of them, if LHS and RHS are not of the same data type, equality returns FALSE. The following semantics apply when both LHS and RHS have the same data type.
+
+===== Vertex / Edge / VertexProperty
+
+They are considered the Element family in TinkerPop; if 2 elements have the same type and the same ID, they are considered equal.
+
+===== Property
+
+If their keys and values are the same, 2 properties are equal.
+
+===== PropertyKey
+
+The key is of String type, so Equality for String applies.
+
+===== PropertyValue
+
+Any type, so Equality for a corresponding type applies.
+
+===== ID
+
+Any type, so Equality for a corresponding type applies.
+
+===== Label
+
+The label is of String type, so Equality for String applies.
+
+===== Path
+
+2 Paths are equal when their path elements are equal (using equality of List), and the corresponding path labels are also equal. 
+
+===== List
+
+* If one of LHS and RHS is List and the other isn't, return FALSE
+* When both are List, then
+    ** if their sizes are different, return FALSE
+    ** L(x) denotes the x-th element in list L.
+        *** For 2 lists L1 and L2 to be equal (L1 eq L2 returns TRUE), L1(x) eq L2(x) must return TRUE for all 0 <= x < n (n is the length of L1 and L2).
+        *** For 2 lists L1 and L2 to be not equal (L1 eq L2 returns FALSE), L1(x) eq L2(x) must return FALSE for at least one 0 <= x < n (n is the length of L1 and L2).
+
+===== Map
+
+* If one of LHS and RHS is Map and the other isn't, return FALSE
+* For 2 Maps M1 and M2 to be equal,
+    ** All keys in M1 should be within keys in M2
+    ** All keys in M2 should be within keys in M1
+    ** M1 and M2 should have the same number of  keys
+    ** For all keys k(1), k(2), ...k(n) in M1, M1[k] eq M2[k] should return TRUE
+    ** In Gremlin key order is not respected when determining equality
+
+=== Equivalence
+
+Equivalence is identical to Equality, except for the cases listed below.
+
+==== Primitive types
+===== Number
+
+* Unlike Equality, we *don't do* type casting for Equivalence.

Review comment:
       I will bring this up with PGSWG meeting on Thursday.
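
To make the distinction on the quoted line concrete, here is a minimal Java sketch of numeric equality (with type promotion) next to numeric equivalence (without it). This is not TinkerPop code: the class and method names (`NumberSemanticsSketch`, `equal`, `equivalent`) are hypothetical. Two points are assumptions beyond the quoted proposal: BigInteger operands are compared as BigDecimal to keep the sketch short, and an infinity is treated as unequal to any finite or Big* value instead of being converted.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

class NumberSemanticsSketch {

    static boolean isNaN(Number n) {
        return (n instanceof Double && ((Double) n).isNaN())
            || (n instanceof Float && ((Float) n).isNaN());
    }

    static boolean isInfinite(Number n) {
        return (n instanceof Double && ((Double) n).isInfinite())
            || (n instanceof Float && ((Float) n).isInfinite());
    }

    static boolean isBig(Number n) {
        return n instanceof BigDecimal || n instanceof BigInteger;
    }

    // Exact conversion; callers must rule out NaN and infinities first.
    static BigDecimal toBigDecimal(Number n) {
        if (n instanceof BigDecimal) return (BigDecimal) n;
        if (n instanceof BigInteger) return new BigDecimal((BigInteger) n);
        if (n instanceof Double || n instanceof Float) return new BigDecimal(n.doubleValue());
        return BigDecimal.valueOf(n.longValue());
    }

    // Equality: promote both operands to a common type, then compare.
    static boolean equal(Number a, Number b) {
        if (isNaN(a) || isNaN(b)) return false; // NaN eq NaN = FALSE
        if (isInfinite(a) || isInfinite(b)) {
            // Assumption: an infinity only equals an infinity of the same sign.
            return isInfinite(a) && isInfinite(b) && a.doubleValue() == b.doubleValue();
        }
        // Assumption: BigInteger is handled via BigDecimal as well.
        if (isBig(a) || isBig(b)) return toBigDecimal(a).compareTo(toBigDecimal(b)) == 0;
        if (a instanceof Double || b instanceof Double) return a.doubleValue() == b.doubleValue();
        if (a instanceof Float || b instanceof Float) {
            // Float with Long needs 64 bits; otherwise compare as Float.
            boolean needs64 = a instanceof Long || b instanceof Long;
            return needs64 ? a.doubleValue() == b.doubleValue()
                           : a.floatValue() == b.floatValue();
        }
        return a.longValue() == b.longValue(); // Byte, Short, Integer, Long
    }

    // Equivalence: no type promotion, and NaN is equivalent to NaN.
    static boolean equivalent(Number a, Number b) {
        if (!a.getClass().equals(b.getClass())) return false;
        if (isNaN(a) && isNaN(b)) return true;
        return equal(a, b);
    }

    public static void main(String[] args) {
        System.out.println(equal(2, 2.0f));                    // true
        System.out.println(equivalent(2, 2.0f));               // false
        System.out.println(equal(Double.NaN, Double.NaN));      // false
        System.out.println(equivalent(Double.NaN, Double.NaN)); // true
        System.out.println(equivalent(Double.NaN, Float.NaN));  // false
    }
}
```

The last line of `main` shows the consequence raised in the open questions: without type promotion, `Double.NaN` and `Float.NaN` would not be grouped together by `dedup()` or `group()`.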







[GitHub] [tinkerpop] rdtr commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
rdtr commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r737887386



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different contexts (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties).
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This document is a proposal intended to serve as a basis for community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by examples of the current behavior, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose are largely aligned with the semantics implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, so as to make the overall semantics more predictable, coherent, and documentable.
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, neither users nor graph providers know whether this query matches only vertices with an ID that is exactly equal to the integer value 19 or, for instance, all vertices whose ID is a numerical value that casts to the Integer value 19. To find out, they currently need to dig into the TinkerPop code base. While type casting rules apply in the above example, in other cases such as
+
+[source]
+----
+gremlin> g.V().values("numericValue").dedup()
+----
+
+two numerically equal values of different types (for example, the Integer 19 and the Double 19.0) would always be treated as different entities.
+
+==== Behavior that is inherently driven by Java
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects. The comparison is Java-dependent, and we don’t have a clear definition of how comparison works for these kinds of types.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+We even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which then tries to promote Infinity to BigDecimal as well,
+// and the type casting fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal that defines how values compare and are ordered, which aims to answer these and other questions. We seek feedback from the community to discuss and reach a consensus around the proposal, and we are open to all other ideas on how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, a frequent use case for NaN, cannot be identified with certainty as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast into exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
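One possible reading of the type promotion rule for numeric equality is sketched below in Java. This is an assumption-laden sketch, not the TinkerPop implementation: `NumericEqualitySketch`, `promotedEquals` and `toBigDecimal` are hypothetical names, and `BigDecimal` is used here as a stand-in for the "lowest common super type". NaN and the infinities are handled before promotion because `BigDecimal` cannot represent them.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical sketch of numeric equality with type promotion; not TinkerPop code.
final class NumericEqualitySketch {

    static boolean isFloating(Number n) {
        return n instanceof Double || n instanceof Float;
    }

    // Promotes a number to BigDecimal without rounding (assumes the standard JDK number types).
    static BigDecimal toBigDecimal(Number n) {
        if (n instanceof BigDecimal) return (BigDecimal) n;
        if (n instanceof BigInteger) return new BigDecimal((BigInteger) n);
        if (isFloating(n)) return new BigDecimal(n.doubleValue()); // exact binary value
        return BigDecimal.valueOf(n.longValue());                  // Byte, Short, Integer, Long
    }

    static boolean promotedEquals(Number a, Number b) {
        double da = a.doubleValue();
        double db = b.doubleValue();
        // NaN is never equal to anything, including NaN.
        if ((isFloating(a) && Double.isNaN(da)) || (isFloating(b) && Double.isNaN(db))) {
            return false;
        }
        // An infinity can only equal an infinity of the same sign.
        if ((isFloating(a) && Double.isInfinite(da)) || (isFloating(b) && Double.isInfinite(db))) {
            return isFloating(a) && isFloating(b) && da == db;
        }
        // compareTo ignores scale, so 1 and 1.0 compare as equal.
        return toBigDecimal(a).compareTo(toBigDecimal(b)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(promotedEquals(1, 1.0d));              // true
        System.out.println(promotedEquals(Float.NaN, Float.NaN)); // false
    }
}
```

The method never throws for the standard number types and always returns TRUE or FALSE, which is the property the bullets above ask for.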
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not applied, two numbers of different types are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** Comparability should be conceptually aligned with equality as far as type promotion is concerned, e.g. `1.0 < 2 < 3L`.
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
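The three-valued outcome of comparability (TRUE, FALSE, or NULL for incomparable types) could be sketched in Java roughly as follows. `ComparabilitySketch` and `greaterThan` are hypothetical names for this illustration; using `doubleValue()` for numeric promotion is a simplification that loses precision for very large `long` or `BigDecimal` values, and only numbers and strings are covered.

```java
// Hypothetical sketch of three-valued comparability; not TinkerPop code.
final class ComparabilitySketch {

    // Returns TRUE or FALSE for comparable operands and null for incomparable ones.
    static Boolean greaterThan(Object a, Object b) {
        if (a instanceof Number && b instanceof Number) {
            // Simplified numeric promotion; any comparison involving NaN yields FALSE.
            return ((Number) a).doubleValue() > ((Number) b).doubleValue();
        }
        if (a instanceof String && b instanceof String) {
            return ((String) a).compareTo((String) b) > 0;
        }
        // Different or unsupported types (including null operands): incomparable.
        return null;
    }

    public static void main(String[] args) {
        System.out.println(greaterThan(2, 1.0));        // true
        System.out.println(greaterThan(Double.NaN, 1)); // false
        System.out.println(greaterThan(1, "1"));        // null
    }
}
```

How a caller treats the `null` result (for example, by filtering the traverser out) is left open here, in line with the note that this handling is Graph provider dependent.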
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, we have a dedicated order as described in the section below.
+
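A total order across types, as required for orderability, might be sketched in Java as a comparator that first ranks values by type and then compares within the type. Everything named here (`OrderabilitySketch`, `typeRank`, `ORDER`, `sorted`) is hypothetical, and the type priority used (null, then Boolean, then Number, then String, then everything else) is an assumption for illustration only; the proposal's appendix defines the actual order across types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a cross-type total order; not TinkerPop code.
final class OrderabilitySketch {

    // Assumed type priority, for illustration only.
    static int typeRank(Object o) {
        if (o == null) return 0;
        if (o instanceof Boolean) return 1;
        if (o instanceof Number) return 2;
        if (o instanceof String) return 3;
        return 4;
    }

    static final Comparator<Object> ORDER = (a, b) -> {
        int ra = typeRank(a);
        int rb = typeRank(b);
        if (ra != rb) return Integer.compare(ra, rb);
        switch (ra) {
            case 1: return Boolean.compare((Boolean) a, (Boolean) b);
            // Double.compare is a total order: NaN gets a fixed position after all other doubles.
            case 2: return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
            case 3: return ((String) a).compareTo((String) b);
            default: return 0; // nulls and unsupported types share one position in this sketch
        }
    };

    static List<Object> sorted(Object... values) {
        List<Object> out = new ArrayList<>(Arrays.asList(values));
        out.sort(ORDER); // List.sort is stable, so ties keep their input order
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sorted("100000", 100)); // [100, 100000]
    }
}
```

Unlike the comparability sketch, this comparator never returns a NULL-like result and never throws for mixed input, so sorting a list that mixes integers and strings succeeds.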
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of semantics, TinkerPop currently has no formal definition of the characteristics introduced in this proposal. Therefore, these semantics should be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this always throws today. With the proposed change, it would not throw; because NaN is not equal to any number, the query returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL.
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus ordering fails for non-comparable and mixed-type values. The proposed change is to make values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: not de-duplicating them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value to filter out; we would need to define a priority over the types in Number.
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not applying type promotion means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped according to the semantics.
+        
+* Map.Entry is a Java dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept such as a key-value tuple ?
+* Today we have a Date type, but don’t we need a timezone-aware DateTime type as well ?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to Graph providers ?
+* Which is more reasonable: NULL eq NULL being true or false ?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. It is JVM based, so we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer

Review comment:
       I think for Semantics we should list these as well:
   - int8, int16, int32, int64
   - uint8, uint16, uint32, uint64
   
   Their support is also controlled by the `Features` flag, and we can say that how unhandled types are treated is Graph provider dependent (or TinkerPop can define in this proposal doc how it should behave when a type is not supported, if you'd like).

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalance, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite type.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by cap-Step. But the comparison is Java dependent and we don’t have a clear definition of how the comparison works for this kind of types.
+Same as Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and even we have comparison over Map.Entry which is Java dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Propertis have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users to define the complete order across types and returns a result instead of throwing an Exception.
+
+==== Inconsistencies results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interepreted as BigDecial in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different concepts.  We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while  equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b:= Double.NaN` is true, but equality would (in our proposal) be defined as false; the rational here (which is commonly found in query and programming languages) is that comparing two "unknown" numbers — which is a frequent use case for NaN, cannot certainly be identified as equal in comparison, but it typically makes sense to group them together in, for instance, aggregations. 
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical type if they cast into the exactly same same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but equivalent to each other
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For the 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that everything that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is being used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps, by providing a stable sort order over mixed type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise Comparison does return TRUE or FALSE
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+Shown as below is a table for which notion proposed above each TinkerPop construct used.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of Semantics, right now TinkerPop does not have formal semantics to define these characteristics introduced in this proposal. Therefore this semantics should be published on the official TinkerPop doc.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When Double / Float Number is stored, it always throws. With the proposed change, it wouldn't throw but because NaN is not equal to any numbers this returns empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+----
+
+* Comparability throws exception today but based on the proposal, it returns NULL when comparing incompatibile types.
+  ** When Vertex / Edge / VertexProperty  is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order(). // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If de-duping, there is another question which value we should filter out. We need to define priority over types in Number. 
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering type casting means, for example, Double.NaN and Float.NaN is not de-duplicated / grouped according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept such as a key-value tuple?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave how TinkerPop deals with BigDecimal to graph providers?
+* Which is more reasonable: NULL eq NULL being true or false?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Because it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer

Review comment:
       I think for the semantics we should list these as well:
   - int8, int16, int32, int64
   - uint8, uint16, uint32, uint64
   
   Their support is also controlled by a `Features` flag, and we can say that how unhandled types are treated is graph provider dependent (or, if you'd like, this proposal doc can define how TinkerPop should behave when a type is not supported).
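
For context on why the unsigned types would need explicit treatment: `byte`, `short`, `int` and `long` are all signed in Java, so a JVM-based implementation has to choose how to interpret a given bit pattern. The following is only a minimal sketch using plain JDK methods; the class and method names are hypothetical and this is not TinkerPop code.

```java
// Hypothetical illustration: the same bits read as signed vs. unsigned.
class UnsignedSketch {

    // Interpret an 8-bit pattern as uint8 (0..255) instead of int8 (-128..127).
    static int asUint8(byte raw) {
        return Byte.toUnsignedInt(raw);
    }

    // Signed 32-bit comparison (int32 view).
    static int compareAsInt32(int a, int b) {
        return Integer.compare(a, b);
    }

    // Unsigned 32-bit comparison (uint32 view) of the same bit patterns.
    static int compareAsUint32(int a, int b) {
        return Integer.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        byte raw = (byte) 0xF0;                          // bit pattern 1111_0000
        System.out.println(raw);                         // -16 when read as int8
        System.out.println(asUint8(raw));                // 240 when read as uint8
        System.out.println(compareAsInt32(-1, 1) < 0);   // true: -1 < 1 as int32
        System.out.println(compareAsUint32(-1, 1) > 0);  // true: 0xFFFFFFFF > 1 as uint32
    }
}
```

The ordering of two values can flip depending on the interpretation, which is one reason the semantics would benefit from naming the signed and unsigned types explicitly.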







[GitHub] [tinkerpop] spmallette closed pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
spmallette closed pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487


   





[GitHub] [tinkerpop] joshsh commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
joshsh commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r737668670



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite type.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by cap-Step. But the comparison is Java dependent and we don’t have a clear definition of how the comparison works for these kinds of types.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified with certainty as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations. 
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common supertype.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but equivalent to each other
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise Comparison does return TRUE or FALSE
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of Semantics, right now TinkerPop does not have formal semantics to define these characteristics introduced in this proposal. Therefore this semantics should be published on the official TinkerPop doc.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this query always throws. With the proposed change it would not throw; because NaN is not equal to any number, it would return an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty  is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value should be filtered out; we would need to define a priority over the types in Number. 
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN as well as ±Double.INF and ±Float.INF. Not applying type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept such as a key-value tuple?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave how TinkerPop deals with BigDecimal to graph providers?
+* Which is more reasonable: NULL eq NULL being true or false?

Review comment:
       Lost formatting. Should have been something like `Nothing :: optional<t>`. I put up the following generated API last night as an illustration: [TinkerPop3 transitional API](https://categoricaldata.github.io/hydra/hydra-java/javadoc/hydra/ext/tinkerpop/v3/package-summary.html). See also the Java sources [here](https://github.com/CategoricalData/hydra/tree/main/hydra-java/src/gen-main/java/hydra/ext/tinkerpop/v3). If you look at the [Type](https://categoricaldata.github.io/hydra/hydra-java/javadoc/hydra/ext/tinkerpop/v3/Type.html) class, it has three concrete subclasses: one for atomic values, one for collections (including optionals), and one for element references. See the corresponding [Value](https://categoricaldata.github.io/hydra/hydra-java/javadoc/hydra/ext/tinkerpop/v3/Value.html) class. At both the logical level and the (Java) implementation level, two Values are equal if and only if they are both strings, both optional integers, etc., and you really shouldn't even try to compare them if they aren't. If they are both `Optional<T>`, then an `Optional.of()` will always compare as different from an `Optional.empty()`, etc. More later.
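
To make the `Optional` point concrete, here is a minimal sketch that uses only `java.util.Optional` from the JDK. It does not use the generated hydra API linked above; the class and method names are hypothetical and only illustrate how `Optional.equals` behaves.

```java
import java.util.Optional;

// Hypothetical illustration of equality over optional values.
class OptionalEqualitySketch {

    // Delegates to Optional.equals: both empty, or both present with equal contents.
    static boolean sameValue(Optional<?> a, Optional<?> b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        Optional<Integer> present = Optional.of(1);
        Optional<Integer> absent = Optional.empty();

        System.out.println(sameValue(present, absent));                  // false
        System.out.println(sameValue(present, Optional.of(1)));          // true
        System.out.println(sameValue(absent, Optional.empty()));         // true
        System.out.println(sameValue(Optional.of(1), Optional.of(1L)));  // false: Integer vs. Long
    }
}
```

Note that under `Optional.equals` two empty optionals compare equal, and no numeric type promotion happens between the `Integer` and `Long` contents.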







[GitHub] [tinkerpop] joshsh commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
joshsh commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r737863339



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite type.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by cap-Step. But the comparison is Java dependent and we don’t have a clear definition of how the comparison works for these kinds of types.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified with certainty as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations. 
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output or undefined behavior, nor do they ever throw an error. Detailed behavior is described in the technical appendix.
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
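The distinction between equality and equivalence described above can be illustrated in plain Java. This is a minimal sketch, not TinkerPop's implementation: the class and method names (`EqualityVsEquivalenceSketch`, `equal`, `equivalent`, `dedup`) are hypothetical, and numeric promotion is simplified to a conversion to `double`, an assumption that ignores BigInteger / BigDecimal and the precision limits of large Long values.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration only; not TinkerPop code.
final class EqualityVsEquivalenceSketch {

    // Equality: numbers are promoted before comparison (simplified here to
    // double), so 2 eq 2.0f holds, while IEEE 754 makes NaN == NaN false.
    static boolean equal(Object lhs, Object rhs) {
        if (lhs instanceof Number && rhs instanceof Number) {
            return ((Number) lhs).doubleValue() == ((Number) rhs).doubleValue();
        }
        return lhs == null ? rhs == null : lhs.equals(rhs);
    }

    // Equivalence: no type promotion. Boxed equals() treats NaN as equal to
    // NaN and never matches across types (Integer 2 vs. Float 2.0).
    // Caveat: boxed equals() also separates -0.0 from +0.0, which the
    // proposal does not ask for; a real implementation would special-case it.
    static boolean equivalent(Object lhs, Object rhs) {
        return lhs == null ? rhs == null : lhs.equals(rhs);
    }

    // Deduplication by equivalence: LinkedHashSet relies on equals()/hashCode().
    static List<Object> dedup(List<?> values) {
        return new ArrayList<>(new LinkedHashSet<Object>(values));
    }

    public static void main(String[] args) {
        System.out.println(equal(Double.NaN, Double.NaN));      // false
        System.out.println(equivalent(Double.NaN, Double.NaN)); // true
        System.out.println(equal(2, 2.0f));                     // true
        System.out.println(equivalent(2, 2.0f));                // false
    }
}
```

With these definitions, `dedup` keeps a single NaN but keeps both `2` and `2.0f`, which mirrors the "no type promotion" variant of equivalence that the open question below revisits.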
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** Comparison should be conceptually aligned with equality as far as type promotion is concerned, e.g. `1.0 < 2 < 3L`.
+* Comparison should not result in undefined behavior, but returns NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
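The three-valued outcome of comparability (TRUE, FALSE, or NULL for incomparable types) can be sketched as follows. This is only an illustration under stated assumptions: `ComparabilitySketch` and `lessThan` are hypothetical names, a Java `null` stands in for NULL, promotion is simplified to `double`, and only Number and String operands are handled.

```java
// Hypothetical illustration only; not TinkerPop code.
final class ComparabilitySketch {

    // Returns Boolean.TRUE / Boolean.FALSE for comparable operands and
    // null (standing in for NULL) when the types are incomparable.
    // Other same-type pairs (Date, Boolean, ...) are omitted for brevity.
    static Boolean lessThan(Object lhs, Object rhs) {
        if (lhs instanceof Number && rhs instanceof Number) {
            // Simplified promotion: compare as double. Primitive "<" is
            // false whenever either side is NaN.
            return ((Number) lhs).doubleValue() < ((Number) rhs).doubleValue();
        }
        if (lhs instanceof String && rhs instanceof String) {
            return ((String) lhs).compareTo((String) rhs) < 0;
        }
        return null; // incomparable, e.g. an integer vs. a string
    }

    public static void main(String[] args) {
        System.out.println(lessThan(1.0, 2));        // true
        System.out.println(lessThan(2, 3L));         // true
        System.out.println(lessThan(Double.NaN, 1)); // false
        System.out.println(lessThan(1, "one"));      // null
    }
}
```

How a caller treats the `null` result (filter the traverser, raise an error, and so on) is left open here, in line with the statement that handling of NULL is Graph provider dependent.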
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
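A total order across types, as required by the bullets above, can be sketched with a comparator that ranks types first and compares values only within a type. The rank order used here (null, Boolean, Number, String, everything else) is purely an assumption for illustration and is not the order defined in the appendix; the names `OrderabilitySketch`, `rank`, `ORDER`, and `sorted` are hypothetical, and NaN is placed after all other numbers simply because that is what `Double.compare` does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not TinkerPop code.
final class OrderabilitySketch {

    // Assumed cross-type priority: null < Boolean < Number < String < other.
    static int rank(Object o) {
        if (o == null) return 0;
        if (o instanceof Boolean) return 1;
        if (o instanceof Number) return 2;
        if (o instanceof String) return 3;
        return 4;
    }

    // Total order: compare type ranks first, then values within a type.
    static final Comparator<Object> ORDER = (a, b) -> {
        int ra = rank(a);
        int rb = rank(b);
        if (ra != rb) return Integer.compare(ra, rb);
        switch (ra) {
            case 1: return Boolean.compare((Boolean) a, (Boolean) b);
            // Double.compare is total: it places NaN after every other number.
            case 2: return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
            case 3: return ((String) a).compareTo((String) b);
            default: return 0; // same position; a stable sort keeps input order
        }
    };

    static List<Object> sorted(List<?> values) {
        List<Object> out = new ArrayList<>(values);
        out.sort(ORDER); // List.sort is stable
        return out;
    }

    public static void main(String[] args) {
        // prints [7, 100, NaN, 100000, abc]
        System.out.println(sorted(Arrays.asList("100000", 100, "abc", Double.NaN, 7L)));
    }
}
```

Because the comparator never returns NULL and never throws for mixed inputs, sorting a list that mixes integers and strings succeeds instead of failing with a ClassCastException.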
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+TinkerPop currently has no formal semantics that define the characteristics introduced in this proposal. The semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when a BigDecimal comparison hits NaN or a similar special value. JDK8 seems to produce a NumberFormatException, while JDK11 produces a message like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float value is stored, this query always throws today. With the proposed change it would not throw; because NaN is not equal to any number, it returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer, such as NaN and INF, should not produce exceptions; it should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
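One likely source of the exceptions shown above is that `java.math.BigDecimal` cannot represent NaN or infinity: its `BigDecimal(double)` constructor throws a NumberFormatException for those inputs. The following minimal sketch shows that behavior and one way to filter instead of throwing. The class and method names (`BigDecimalNaNSketch`, `throwsOnPromotion`, `eq`) are hypothetical, and the guard is an assumption about how the proposed behavior could be realized, not TinkerPop's actual code.

```java
import java.math.BigDecimal;

// Hypothetical illustration only; not TinkerPop code.
final class BigDecimalNaNSketch {

    // True when promoting the double to BigDecimal fails.
    static boolean throwsOnPromotion(double value) {
        try {
            new BigDecimal(value);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    // Equality against a BigDecimal literal that filters special values
    // instead of throwing: NaN and +/-INF are never equal to a BigDecimal.
    static boolean eq(double stored, BigDecimal literal) {
        if (Double.isNaN(stored) || Double.isInfinite(stored)) {
            return false;
        }
        // compareTo ignores scale, so 1.0 and 1.00 count as the same value.
        return new BigDecimal(stored).compareTo(literal) == 0;
    }

    public static void main(String[] args) {
        System.out.println(throwsOnPromotion(Double.NaN));         // true
        System.out.println(throwsOnPromotion(1.0));                // false
        System.out.println(eq(Float.NaN, new BigDecimal("1.0")));  // false
        System.out.println(eq(1.0f, new BigDecimal("1.0")));       // true
    }
}
```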
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, it throws today but should return NULL.
+  ** When NULL is compared, it throws an exception today but should return NULL.
+
+==== Equivalence
+
+TinkerPop today groups values by the hash value of the original values, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop uses comparability for orderability, so ordering fails for non-comparable and mixed-type values. The proposed change is to make values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, there is a further question of which value to filter out: we would need to define a priority over the Number types.
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN, as well as ±Double.INF and ±Float.INF. Not adhering to type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept such as a key-value tuple ?
+* Today we have a Date type, but don’t we also need a timezone-aware DateTime type ?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to Graph providers ?
+* Which is more reasonable: NULL eq NULL is true, or false ?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Since it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We do type casting, a.k.a. type promotion, for Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. The rules for how types are promoted are as follows:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to floating type of highest common bit denomination
+  ** If another value is Long or Double, we need 64bit so convert both to Double 
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported depending on the language and Storage, therefore the behavior of type casting for these 2 types can vary depending on a Graph provider. 
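The promotion rules listed above can be read as a top-down decision procedure. The following minimal sketch encodes them in that order; `TypePromotionSketch`, `either`, and `commonType` are hypothetical names, and the final fallback to Byte is an assumption for the case where both operands are Byte.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical illustration only; not TinkerPop code.
final class TypePromotionSketch {

    // True when at least one operand is an instance of the given type.
    static boolean either(Number a, Number b, Class<? extends Number> type) {
        return type.isInstance(a) || type.isInstance(b);
    }

    // Walks the rule list from the widest type down to the narrowest.
    static Class<? extends Number> commonType(Number a, Number b) {
        if (either(a, b, BigDecimal.class)) return BigDecimal.class;
        if (either(a, b, BigInteger.class)) return BigInteger.class;
        if (either(a, b, Double.class)) return Double.class;
        if (either(a, b, Float.class)) {
            // A Float next to a Long needs 64 bits, so both become Double.
            if (either(a, b, Long.class)) return Double.class;
            return Float.class;
        }
        if (either(a, b, Long.class)) return Long.class;
        if (either(a, b, Integer.class)) return Integer.class;
        if (either(a, b, Short.class)) return Short.class;
        return Byte.class; // assumption: fallback when both operands are Byte
    }

    public static void main(String[] args) {
        System.out.println(commonType(1, 2L).getSimpleName());               // Long
        System.out.println(commonType(1.0f, 2L).getSimpleName());            // Double
        System.out.println(commonType(1.0f, 2).getSimpleName());             // Float
        System.out.println(commonType((byte) 1, (short) 2).getSimpleName()); // Short
    }
}
```

The sketch only selects the target type; converting both operands to it and comparing them is a separate step.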
+
+=== Equality
+
+==== Primitive types
+===== Number
+
+Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
+
+* If either one of LHS or RHS is Number and the other isn't, eq returns FALSE.
+* If both LHS and RHS are Number, eq follows the type casting described above and then checks equality.
+* Example for edge cases:
+    ** -0.0 eq 0.0  = TRUE
+    ** +0.0 eq 0.0 = TRUE
+    **  -0.0 eq +0.0 = TRUE
+    ** NaN eq NaN  = FALSE
+    ** +INF eq +INF = TRUE
+    **  -INF eq -INF = TRUE
+    **  -INF eq +INF = FALSE
+* TinkerPop is JVM based, so there can be ±INF^^float and ±INF^^double, NaN^^float and NaN^^double. They also adhere to type promotion.
+
+===== Boolean
+
+* If either one of LHS or RHS is Boolean and another isn't, return FALSE
+* True != False, True == True, False == False
+
+===== String
+
+* If either one of LHS or RHS is String and another isn't, return FALSE
+* We assume the common lexicographical order over unicode strings.
+* LHS and RHS need to be lexicographically equal for LHS eq RHS == TRUE for String.
+
+===== Date
+
+* If either one of LHS or RHS is Date and another isn't, return FALSE
+* LHS eq RHS == TRUE when both LHS and RHS value are numerically identical in Unix Epoch time.
+
+===== NULL
+
+* If either one of LHS or RHS is null and another isn't, return FALSE
+* If both LHS and RHS are null, return TRUE 
+
+==== Composite types
+
+For all of them, if LHS and RHS are not of the same data type, equality returns FALSE. The following semantics apply when both LHS and RHS have the same data type.
+
+===== Vertex / Edge / VertexProperty
+
+They are considered the Element family in TinkerPop; if 2 elements have the same type and the same ID, they are considered equal.

Review comment:
       No disagreement here. The lack of global ids is one of the major pain points when companies try to use graph databases as EKGs. The solution is usually to make sure that ids are in fact globally unique rather than local to a particular dataset -- UUIDs and URNs are popular where I come from.
   
   In your second example, I would say that the vertices are distinct both by value and by reference, but that yes there may be a mapping from the two vertices to some other, reconciled vertex.







[GitHub] [tinkerpop] joshsh commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
joshsh commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r738669897



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite type.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. However, the comparison is Java dependent and we don’t have a clear definition of how the comparison works for this kind of type.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+We even have a comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users to define a complete order across types and return a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the section above we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four: equality vs. equivalence, and comparability vs. orderability. Each pair constitutes two flavors of one concept, tailored to its usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and a discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale (which is commonly found in query and programming languages) is that two "unknown" numbers, a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output or undefined behavior, nor do they ever throw an error. Detailed behavior is described in the technical appendix.
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** Comparison should be conceptually aligned with equality as far as type promotion is concerned, e.g. `1.0 < 2 < 3L`.
+* Comparison should not result in undefined behavior, but returns NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change?
+
+=== Semantics
+
+Right now, TinkerPop does not have formal semantics that define the characteristics introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 for BigDecimal comparisons that involve NaN and similar special values. JDK8 seems to produce a NumberFormatException, whereas JDK11 produces a message such as:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float value is stored, this query always throws today. With the proposed change it would not throw; because NaN is not equal to any number, it returns an empty result.
+
+* BigDecimal +
+Equality checks involving BigDecimal and special values which cannot be parsed as Integer, such as NaN and INF, should not produce exceptions; they should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
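One plausible reading of the 3.5.x exceptions above is that the stored floating-point value is promoted to `BigDecimal`, and the JDK's `BigDecimal(double)` constructor rejects NaN and infinite values with a `NumberFormatException`. The Java sketch below contrasts that with a guarded check that filters instead of throwing. `NaNEqualitySketch`, `naiveEq` and `proposedEq` are hypothetical names for illustration, and the promotion path is an assumption about the cause rather than a description of TinkerPop's code.

```java
import java.math.BigDecimal;

// Hypothetical sketch, not TinkerPop code.
final class NaNEqualitySketch {

    /** Unguarded promotion: throws NumberFormatException when stored is NaN or infinite. */
    static boolean naiveEq(double stored, BigDecimal literal) {
        return new BigDecimal(stored).compareTo(literal) == 0;
    }

    /** Guarded variant: special values are never equal to a BigDecimal, so they are filtered. */
    static boolean proposedEq(double stored, BigDecimal literal) {
        if (!Double.isFinite(stored)) return false; // NaN, +Infinity, -Infinity
        // compareTo ignores scale, so 1 and 1.0 are treated as the same numeric value.
        return new BigDecimal(stored).compareTo(literal) == 0;
    }
}
```

The guarded variant always answers TRUE or FALSE, which matches the requirement that equality checks never throw.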
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL.
+
+==== Equivalence
+
+TinkerPop today uses the hash value of the original values for grouping, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, so ordering fails for non-comparable and mixed-type values. The proposed change makes values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should equivalence take type promotion into account? +
+[code]
+----
+// Given the two properties below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: not de-duplicating them,
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-duplicating them?
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+
+If we de-duplicate, a further question is which value should be filtered out; we would need to define a priority over the numeric types.
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN, as well as ±Double.INF and ±Float.INF. Not applying type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+
+* Map.Entry is a Java dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept, such as a key-value tuple?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave it to graph providers how TinkerPop deals with BigDecimal?
+* Which is more reasonable: that NULL eq NULL is true, or false?
+  ** If it is true, it may be respected in JOIN operations.
+* There are a number of situations where the Gremlin grammar won’t support some of the examples. To what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined.
+* Should UUID be taken into account as a separate type?
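To make the first open question above more concrete, the Java sketch below shows both candidate behaviors for de-duplication side by side. `EquivalenceSketch` and its two methods are hypothetical names, not TinkerPop API. The "first value seen wins" rule in the promoting variant is an assumption; the proposal leaves the priority over numeric types open.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not TinkerPop code.
final class EquivalenceSketch {

    /** No type promotion: relies on Number.equals, so 1 and 1.0 both survive. */
    static List<Number> dedupByType(List<Number> in) {
        return new ArrayList<>(new LinkedHashSet<Number>(in));
    }

    /** With type promotion: numbers are keyed by numeric value; the first one seen wins (assumption). */
    static List<Number> dedupByValue(List<Number> in) {
        Set<BigDecimal> finiteSeen = new TreeSet<>(); // TreeSet uses compareTo, so 1 and 1.0 collide
        Set<Double> specialSeen = new HashSet<>();    // NaN, +Infinity, -Infinity
        List<Number> out = new ArrayList<>();
        for (Number n : in) {
            boolean floating = n instanceof Double || n instanceof Float;
            boolean fresh;
            if (floating && !Double.isFinite(n.doubleValue())) {
                // Float.NaN and Double.NaN widen to the same Double key and are grouped together.
                fresh = specialSeen.add(n.doubleValue());
            } else {
                fresh = finiteSeen.add(promote(n, floating));
            }
            if (fresh) out.add(n);
        }
        return out;
    }

    private static BigDecimal promote(Number n, boolean floating) {
        if (n instanceof BigDecimal) return (BigDecimal) n;
        if (n instanceof BigInteger) return new BigDecimal((BigInteger) n);
        return floating ? new BigDecimal(n.doubleValue()) : BigDecimal.valueOf(n.longValue());
    }
}
```

For the input `1.0, 1`, `dedupByType` keeps both values while `dedupByValue` keeps only `1.0`. For `Double.NaN, Float.NaN`, the first variant keeps both and the second keeps one.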
+
+== Technical Appendix
+
+=== Types
+First, we need to define which data types the TinkerPop query execution runtime needs to handle. Because it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer

Review comment:
       Yes, these together with `bigint` (or whatever we want to call arbitrary-precision integers). For floats, I have suggested `float32` (float), `float64` (double) and `bigfloat`. See above for comments on handling unsupported types; I think we should give graph providers the means to define their own strategies for handling unsupported types, but there is a lot we can do automatically based on simple, declarative descriptions of language constraints, to be folded into Graph.Features.







[GitHub] [tinkerpop] joshsh commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
joshsh commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r734900729



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+* Which should be more reasonable, NULL eq NULL is true or false ?

Review comment:
       I suggest not supporting NULL as a value to which we can apply eq. Optionals (instances of 1+t for some type t) are a cleaner alternative. The expression Nothing:optional<t> == Nothing:optional<t> is well defined, and evaluates to true.
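
In Java terms, the suggestion could look roughly like the sketch below, where `java.util.Optional` stands in for the `optional<t>` type mentioned in the comment. `OptionalEqualitySketch` and `eq` are hypothetical names used only for illustration; nothing here is existing TinkerPop API.

```java
import java.util.Optional;

// Hypothetical sketch, not TinkerPop code.
final class OptionalEqualitySketch {

    /** Equality over optional values: always TRUE or FALSE, never NULL. */
    static <T> boolean eq(Optional<T> a, Optional<T> b) {
        // Optional.equals is true when both are empty, or both are present with equal values.
        return a.equals(b);
    }
}
```

With this modelling, comparing two empty optionals is well defined and true, and comparing an empty optional with a present value is false.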

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalance, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite type.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by cap-Step. But the comparison is Java dependent and we don’t have a clear definition of how the comparison works for this kind of types.
+Same as Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and even we have comparison over Map.Entry which is Java dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of type Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which then tries to promote Infinity to BigDecimal as well,
+// and the type cast fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale (commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified with certainty as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
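The equality rules listed above can be sketched in plain Java. This is only an illustration under stated assumptions, not TinkerPop code: `EqualitySketch` and `eq` are hypothetical names, `BigDecimal` stands in for the "lowest common super type", and treating `null` as equal to `null` is an arbitrary choice that the proposal leaves open.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Objects;

final class EqualitySketch {

    // Hypothetical helper (not a TinkerPop API): equality with numeric type promotion.
    static boolean eq(Object a, Object b) {
        if (a instanceof Number && b instanceof Number) {
            return numberEq((Number) a, (Number) b);
        }
        // Non-numeric values of different types are never equal.
        // null-vs-null yields true here; that is an assumption, not part of the proposal.
        return Objects.equals(a, b);
    }

    private static boolean numberEq(Number a, Number b) {
        // NaN is not equal to anything, including itself.
        if (isNaN(a) || isNaN(b)) return false;
        // Infinities cannot be converted to BigDecimal, so they are handled separately.
        if (isInfinite(a) || isInfinite(b)) {
            return isInfinite(a) && isInfinite(b) && a.doubleValue() == b.doubleValue();
        }
        // compareTo ignores scale, so 1.0 and 1.00 compare as the same value.
        return toBigDecimal(a).compareTo(toBigDecimal(b)) == 0;
    }

    private static boolean isNaN(Number n) {
        return (n instanceof Double || n instanceof Float) && Double.isNaN(n.doubleValue());
    }

    private static boolean isInfinite(Number n) {
        return (n instanceof Double || n instanceof Float) && Double.isInfinite(n.doubleValue());
    }

    private static BigDecimal toBigDecimal(Number n) {
        if (n instanceof BigDecimal) return (BigDecimal) n;
        if (n instanceof BigInteger) return new BigDecimal((BigInteger) n);
        // Exact binary value of the floating-point number (only reached for finite values).
        if (n instanceof Double || n instanceof Float) return new BigDecimal(n.doubleValue());
        return BigDecimal.valueOf(n.longValue());
    }

    public static void main(String[] args) {
        System.out.println(eq(1, 1.0d));
        System.out.println(eq(Double.NaN, Double.NaN));
        System.out.println(eq(Double.POSITIVE_INFINITY, new BigDecimal("0.0")));
    }
}
```

Handling NaN and the infinities before any `BigDecimal` conversion is what keeps the check total: it returns a boolean for every input instead of raising `NumberFormatException`.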
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
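As a rough illustration of the equivalence rules above, Java's own `equals` on boxed values already behaves this way: it does no type promotion, and `Double.equals` treats NaN as equal to NaN. The sketch below relies on that; `EquivalenceSketch`, `equivalent` and `dedup` are hypothetical names, not TinkerPop APIs. Two caveats apply: `BigDecimal.equals` also compares scale, and `Double.NaN` and `Float.NaN` are different types and therefore not equivalent here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;

final class EquivalenceSketch {

    // Hypothetical helper: no type promotion, and NaN is equivalent to NaN.
    static boolean equivalent(Object a, Object b) {
        return Objects.equals(a, b);
    }

    // De-duplicates by equivalence while keeping first-seen order.
    static List<Object> dedup(List<?> values) {
        return new ArrayList<Object>(new LinkedHashSet<Object>(values));
    }

    public static void main(String[] args) {
        // 2 (Integer) and 2.0f (Float) stay separate; the two NaN values collapse into one.
        System.out.println(dedup(Arrays.asList(2, 2.0f, Double.NaN, Double.NaN, 2)));
    }
}
```

Because hash-based collections use `equals` and `hashCode`, this matches the proposal's observation that grouping in TinkerPop is currently driven by the hash of the original values.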
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
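The three-valued outcome described above (TRUE, FALSE, or NULL for incomparable types) could be modelled as follows. This is a minimal sketch under assumptions: `ComparabilitySketch`, `Ternary` and `gt` are hypothetical names, numbers are promoted to `double` (which is lossy for large `long` or `BigDecimal` values), and only numbers and strings are treated as comparable.

```java
final class ComparabilitySketch {

    enum Ternary { TRUE, FALSE, NULL }

    // Hypothetical helper (not a TinkerPop API): "a > b" under three-valued comparability.
    static Ternary gt(Object a, Object b) {
        if (a instanceof Number && b instanceof Number) {
            // Assumption: promote to double. Java's primitive '>' is false
            // whenever either operand is NaN, so NaN never satisfies the predicate.
            return of(((Number) a).doubleValue() > ((Number) b).doubleValue());
        }
        if (a instanceof String && b instanceof String) {
            return of(((String) a).compareTo((String) b) > 0);
        }
        // Incomparable types (including null) yield NULL instead of an exception.
        return Ternary.NULL;
    }

    private static Ternary of(boolean b) {
        return b ? Ternary.TRUE : Ternary.FALSE;
    }

    public static void main(String[] args) {
        System.out.println(gt(3L, 2));
        System.out.println(gt(5, Double.NaN));
        System.out.println(gt(1, "a"));
    }
}
```

What a filter does with a NULL result is left to the graph provider in the proposal, so the sketch only reports it.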
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
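A total order across types, as required above, can be sketched with a comparator that first ranks values by type and then compares within a type. The type priority used here (null, Boolean, Number, String, everything else) is purely an assumption for illustration; the proposal defers the actual cross-type order to its appendix. `OrderabilitySketch` and `order` are hypothetical names. As a simplification, numbers are compared via `Double.compare`, which places NaN after all other numbers but also puts `1` and `1.0` at the same position, whereas the proposal uses equivalence to decide positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class OrderabilitySketch {

    // Assumed, illustrative type priority; not the order defined by the proposal.
    private static int rank(Object o) {
        if (o == null) return 0;
        if (o instanceof Boolean) return 1;
        if (o instanceof Number) return 2;
        if (o instanceof String) return 3;
        return 4;
    }

    // Never returns "incomparable" and never throws for mixed-type input.
    static final Comparator<Object> ORDER = (a, b) -> {
        int ra = rank(a);
        int rb = rank(b);
        if (ra != rb) return Integer.compare(ra, rb);
        switch (ra) {
            case 0: return 0;
            case 1: return Boolean.compare((Boolean) a, (Boolean) b);
            case 2: return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
            case 3: return ((String) a).compareTo((String) b);
            default: return a.toString().compareTo(b.toString()); // arbitrary fallback
        }
    };

    static List<Object> order(List<?> values) {
        List<Object> sorted = new ArrayList<Object>(values);
        sorted.sort(ORDER); // List.sort is stable, so ties keep their input order
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(order(Arrays.asList("100000", 100, Double.NaN, -5)));
    }
}
```

With this assumed priority, an Integer `100` sorts before the String `"100000"`, which is the outcome shown in the proposal's mixed-type `order()` example.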
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+Right now TinkerPop has no formal semantics defining the characteristics introduced in this proposal. Therefore, these semantics should be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this always throws today. With the proposed change it would not throw; because NaN is not equal to any number, the query returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL.
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: not de-duplicating them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, there is a further question of which value to filter out; we would need to define a priority over the Number types.
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a generalized concept such as a key-value tuple?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave how TinkerPop deals with BigDecimal to graph providers?
+* Which is more reasonable: NULL eq NULL being true or false?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Since it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer

Review comment:
       All of these are still good. For the sake of generalizing the type system away from Java, I suggest a slightly richer set of integer types, e.g. int8, int16, int32, int64 as well as bigint, and possibly unsigned integer types uint8, uint16, uint32, uint64 as well. Booleans, character strings, floats are fine as they are IMO. Binary strings might be a reasonable addition, as well.

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This document is a proposal intended to serve as a basis for community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by examples of today's behavior, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the proposed semantics are largely aligned with what is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, so as to make the overall semantics more predictable, coherent, and documentable.
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== Behavior that is inherently driven by Java
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. However, the comparison is Java dependent, and we don't have a clear definition of how comparison works for these kinds of types.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+We even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of type Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which then tries to promote Infinity to BigDecimal as well,
+// and the type cast fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale (commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified with certainty as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+Right now TinkerPop has no formal semantics defining the characteristics introduced in this proposal. Therefore, these semantics should be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float Number is stored, it always throws. With the proposed change, it would not throw; because NaN is not equal to any number, the query returns an empty result.
+
+* BigDecimal +
+Equality checks around BigDecimal and special values which cannot be parsed as Integer, such as NaN and INF, should not produce exceptions and should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty is compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL.
+
+==== Equivalence
+
+Today TinkerPop groups values by the hash value of the original values; this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, so ordering fails for non-comparable and mixed-type values. The proposed change is to make values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate them, a further question is which value to filter out; we would need to define a priority over the types in Number.
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept, such as a key-value tuple ?
+* Today we have Date type but don’t we need timezone aware DateTime type as well ?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to Graph providers ?
+* Which should be more reasonable, NULL eq NULL is true or false ?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?

Review comment:
       It is being treated as one now, at least in Graph.Features IIRC. There is a fine line to be drawn between primitive types and "sugar" like UUIDs, Dates, etc. which can be considered as aliases for strings, records, etc. with certain constraints.
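
One concrete wrinkle if UUID is treated as an alias for String, sketched here under assumptions: on the JVM, `java.util.UUID.compareTo` compares the two 64-bit halves as signed longs, so the native order can disagree with the lexicographic order of the canonical string form. The class `UuidOrderSketch` and its two helpers are hypothetical names used only for illustration; they are not TinkerPop API.

```java
import java.util.UUID;

// Hypothetical illustration only; not part of TinkerPop.
final class UuidOrderSketch {

    // Order two UUIDs the way java.util.UUID does (signed comparison of the 64-bit halves).
    static int compareNative(UUID a, UUID b) {
        return Integer.signum(a.compareTo(b));
    }

    // Order two UUIDs by their canonical string form, i.e. "UUID as an alias for String".
    static int compareAsString(UUID a, UUID b) {
        return Integer.signum(a.toString().compareTo(b.toString()));
    }

    public static void main(String[] args) {
        UUID high = UUID.fromString("80000000-0000-0000-0000-000000000000");
        UUID low = UUID.fromString("00000000-0000-0000-0000-000000000001");
        System.out.println("native order: " + compareNative(high, low));
        System.out.println("string order: " + compareAsString(high, low));
    }
}
```

For these two sample values the native comparison places the first UUID before the second, while the string comparison places it after, so whichever order is adopted for UUID would probably need to be stated explicitly rather than inherited from the JVM.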

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different contexts (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties).
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Benefiting customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. But the comparison is Java dependent and we don’t have a clear definition of how the comparison works for these kinds of types.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical type if they cast into exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is being used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, a dedicated order applies, as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+TinkerPop currently has no formal semantics defining the notions introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float Number is stored, it always throws. With the proposed change, it would not throw; because NaN is not equal to any number, the query returns an empty result.
+
+* BigDecimal +
+Equality checks around BigDecimal and special values which cannot be parsed as Integer, such as NaN and INF, should not produce exceptions and should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty is compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL.
+
+==== Equivalence
+
+Today TinkerPop groups values by the hash value of the original values; this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, so ordering fails for non-comparable and mixed-type values. The proposed change is to make values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate them, a further question is which value to filter out; we would need to define a priority over the types in Number.
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept, such as a key-value tuple ?
+* Today we have Date type but don’t we need timezone aware DateTime type as well ?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to Graph providers ?
+* Which should be more reasonable, NULL eq NULL is true or false ?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. It is JVM based, so we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)

Review comment:
       Sure, or we define a `Pair` type constructor.
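
A minimal sketch of what such a `Pair` could look like on the JVM, under stated assumptions: the name `PairSketch`, its accessors, and the key-then-value ordering are hypothetical and not taken from the proposal, and Java's `Comparable` merely stands in for the proposal's own orderability, which would be needed once components of mixed types are allowed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical key-value tuple; the name and API are assumptions, not TinkerPop types.
final class PairSketch<K extends Comparable<K>, V extends Comparable<V>>
        implements Comparable<PairSketch<K, V>> {

    private final K key;
    private final V value;

    PairSketch(K key, V value) {
        this.key = Objects.requireNonNull(key);
        this.value = Objects.requireNonNull(value);
    }

    K key() { return key; }

    V value() { return value; }

    // Value-based equality: two pairs are equal when both components are equal.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PairSketch)) return false;
        PairSketch<?, ?> other = (PairSketch<?, ?>) o;
        return key.equals(other.key) && value.equals(other.value);
    }

    @Override
    public int hashCode() {
        return Objects.hash(key, value);
    }

    // Order by key first; fall back to the value only when the keys tie.
    @Override
    public int compareTo(PairSketch<K, V> other) {
        int byKey = key.compareTo(other.key);
        return byKey != 0 ? byKey : value.compareTo(other.value);
    }

    @Override
    public String toString() {
        return "(" + key + ", " + value + ")";
    }

    public static void main(String[] args) {
        List<PairSketch<String, Integer>> entries = new ArrayList<>();
        entries.add(new PairSketch<>("b", 1));
        entries.add(new PairSketch<>("a", 2));
        entries.add(new PairSketch<>("a", 1));
        Collections.sort(entries); // sortable, unlike a raw HashMap entry
        System.out.println(entries);
    }
}
```

The point of the sketch is only that a dedicated tuple type can carry its own equality and ordering rules, instead of inheriting whatever the JVM's `Map.Entry` implementation happens to provide.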

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different contexts (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties).
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Benefiting customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. But the comparison is Java dependent and we don’t have a clear definition of how the comparison works for these kinds of types.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which tries to promote Infinity to BigDecimal as well;
+// the type cast then fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
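To make the equality rules above concrete, here is a minimal Java sketch of an equality check with numeric type promotion. `EqualitySketch` and its helpers are hypothetical names used only for illustration; they are not TinkerPop APIs. The sketch assumes that `BigDecimal` can serve as the common super type for finite numbers and, because NULL eq NULL is still an open question in the proposal, it arbitrarily treats any check involving `null` as not equal.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical sketch of the proposed equality semantics; not a TinkerPop API.
final class EqualitySketch {

    static boolean isNaN(Object o) {
        return (o instanceof Double && ((Double) o).isNaN())
            || (o instanceof Float && ((Float) o).isNaN());
    }

    static boolean isInfinite(Object o) {
        return (o instanceof Double && ((Double) o).isInfinite())
            || (o instanceof Float && ((Float) o).isInfinite());
    }

    // Assumption: BigDecimal acts as the common super type for finite numbers.
    static BigDecimal promote(Number n) {
        if (n instanceof BigDecimal) return (BigDecimal) n;
        if (n instanceof BigInteger) return new BigDecimal((BigInteger) n);
        if (n instanceof Double || n instanceof Float) return new BigDecimal(n.doubleValue());
        return BigDecimal.valueOf(n.longValue());
    }

    // Always returns true or false; never null, never throws.
    static boolean eq(Object a, Object b) {
        if (a == null || b == null) return false; // assumption, see lead-in
        if (a instanceof Number && b instanceof Number) {
            if (isNaN(a) || isNaN(b)) return false; // NaN is not equal to anything
            if (isInfinite(a) || isInfinite(b)) {
                // Infinities cannot be promoted to BigDecimal, so compare them directly.
                return isInfinite(a) && isInfinite(b)
                    && ((Number) a).doubleValue() == ((Number) b).doubleValue();
            }
            return promote((Number) a).compareTo(promote((Number) b)) == 0;
        }
        // Apart from numeric promotion, values of different types are never equal.
        return a.getClass() == b.getClass() && a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(eq(1, 1.0d));               // numeric promotion
        System.out.println(eq(Double.NaN, Double.NaN)); // NaN case
        System.out.println(eq("1", 1));                 // different types
    }
}
```

Handling NaN and the infinities before promotion is what avoids the `NumberFormatException` shown in the examples, since `new BigDecimal(double)` rejects those values.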
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
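The equivalence rules above map fairly directly onto Java's boxed `equals`, which is a rough sketch of why hash-based grouping already behaves this way. `EquivalenceSketch` is a hypothetical name, not a TinkerPop class. Two caveats are assumptions of this sketch rather than statements of the proposal: `null` is treated as equivalent only to `null`, and `Double.equals` also distinguishes `+0.0` from `-0.0` (just as `BigDecimal.equals` distinguishes `1.0` from `1.00`), which the proposal may or may not want.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch of the proposed equivalence semantics; not a TinkerPop API.
final class EquivalenceSketch {

    // No type promotion: values of different classes are never equivalent.
    // Double.equals and Float.equals treat NaN as equal to NaN, which is the
    // one place where equivalence differs from the proposed equality.
    static boolean equivalent(Object a, Object b) {
        if (a == null || b == null) return a == b; // assumption, see lead-in
        return a.getClass() == b.getClass() && a.equals(b);
    }

    // De-duplication via a hash-based set relies on equals/hashCode and
    // therefore behaves like equivalent() for the boxed types used here.
    static List<Object> dedup(List<?> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    public static void main(String[] args) {
        System.out.println(equivalent(Double.NaN, Double.NaN));
        System.out.println(equivalent(2, 2.0f));
        System.out.println(dedup(Arrays.asList(1, 1.0d, Double.NaN, Double.NaN)));
    }
}
```

In the `dedup` call the two NaN values collapse into one, while `1` and `1.0` both survive because no promotion takes place.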
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise Comparison does return TRUE or FALSE
+
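The three-valued result described above (TRUE, FALSE, or NULL for incomparable types) can be sketched in Java with a nullable `Boolean`. `ComparabilitySketch`, `lt` and `gt` are hypothetical names for illustration only, not TinkerPop APIs. The sketch covers just numbers and strings, assumes `BigDecimal` as the promotion target for finite numbers, and assumes that a strict comparison against NaN yields FALSE.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical sketch of the proposed comparability semantics; not a TinkerPop API.
final class ComparabilitySketch {

    static boolean isNaN(Number n) {
        return (n instanceof Double || n instanceof Float) && Double.isNaN(n.doubleValue());
    }

    static boolean isInfinite(Number n) {
        return (n instanceof Double || n instanceof Float) && Double.isInfinite(n.doubleValue());
    }

    // Assumption: BigDecimal acts as the common super type for finite numbers.
    static BigDecimal promote(Number n) {
        if (n instanceof BigDecimal) return (BigDecimal) n;
        if (n instanceof BigInteger) return new BigDecimal((BigInteger) n);
        if (n instanceof Double || n instanceof Float) return new BigDecimal(n.doubleValue());
        return BigDecimal.valueOf(n.longValue());
    }

    // Compares two non-NaN numbers of possibly different types.
    static int compareNumbers(Number a, Number b) {
        if (isInfinite(a) || isInfinite(b)) {
            // Only the infinite side matters, so any finite value is replaced by 0.0.
            double da = isInfinite(a) ? a.doubleValue() : 0.0;
            double db = isInfinite(b) ? b.doubleValue() : 0.0;
            return Double.compare(da, db);
        }
        return promote(a).compareTo(promote(b));
    }

    // Returns TRUE or FALSE for comparable values and null for incomparable ones.
    static Boolean lt(Object a, Object b) {
        if (a instanceof Number && b instanceof Number) {
            Number x = (Number) a, y = (Number) b;
            if (isNaN(x) || isNaN(y)) return Boolean.FALSE; // assumption, see lead-in
            return compareNumbers(x, y) < 0;
        }
        if (a instanceof String && b instanceof String) {
            return ((String) a).compareTo((String) b) < 0;
        }
        return null; // incomparable types, including null operands
    }

    static Boolean gt(Object a, Object b) {
        return lt(b, a);
    }

    public static void main(String[] args) {
        System.out.println(lt(1.0d, 2));
        System.out.println(lt(1, "a"));
        System.out.println(gt(Double.POSITIVE_INFINITY, new BigDecimal("0.0")));
    }
}
```

A filter step built on this would still have to decide what a `null` result means; the proposal leaves that to the graph provider, and dropping such traversers is one possible choice.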
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
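As an illustration of the orderability bullets above, the following Java sketch builds a comparator that is total across types: it first compares a per-type rank and only then compares values within a type. The rank values here are purely hypothetical (the actual cross-type order belongs to the appendix), as is the name `OrderabilitySketch`. Comparing numbers through `doubleValue()` is a simplification that loses precision for large `Long` or `BigDecimal` values, and placing NaN after all other numbers is simply what `Double.compare` does, not something the proposal text above prescribes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a total, never-throwing order across types; not a TinkerPop API.
final class OrderabilitySketch {

    // Hypothetical type priority, chosen only for this example.
    static int rank(Object o) {
        if (o == null) return 0;
        if (o instanceof Boolean) return 1;
        if (o instanceof Number) return 2;
        if (o instanceof String) return 3;
        return 4; // everything else
    }

    static final Comparator<Object> ORDER = (a, b) -> {
        int ra = rank(a), rb = rank(b);
        if (ra != rb) return Integer.compare(ra, rb); // different types: rank decides
        switch (ra) {
            case 0: return 0;
            case 1: return Boolean.compare((Boolean) a, (Boolean) b);
            // Simplification: Double.compare is total and orders NaN last.
            case 2: return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
            case 3: return ((String) a).compareTo((String) b);
            default: return a.toString().compareTo(b.toString()); // placeholder order
        }
    };

    public static void main(String[] args) {
        List<Object> values = new ArrayList<>(Arrays.asList("100000", 100, 1.5d, Double.NaN, null, true));
        values.sort(ORDER);
        System.out.println(values);
    }
}
```

Because every pair of values gets an answer, sorting a mixed list such as the Integer and String property values from the earlier example no longer fails with a `ClassCastException`.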
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+Right now TinkerPop has no formal semantics that define the characteristics introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this always throws. With the proposed change it would not throw; because NaN is not equal to any number, it would return an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty  is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: not de-duplicating them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, there is a further question of which value to filter out. We would need to define a priority over the Number types. 
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a general concept such as a key-value tuple?
+* Today we have a Date type, but don’t we need a timezone-aware DateTime type as well?
+* Some graph providers may not support BigDecimal. Do we leave how TinkerPop deals with BigDecimal to graph providers?
+* Which is more reasonable: NULL eq NULL is true, or false?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Because it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path

Review comment:
       Path definitely can be a type.

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, neither users nor graph providers know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to the Integer value 19. To find out today, they need to dig into the TinkerPop code base. While type casting rules apply in the above example, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. The comparison is Java dependent, however, and we don’t have a clear definition of how comparison works for this kind of type.
+The same holds for Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+We even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of type Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which tries to promote Infinity to BigDecimal as well;
+// the type cast then fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise Comparison does return TRUE or FALSE
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+Right now TinkerPop has no formal semantics that define the characteristics introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float value is stored, this query currently always throws. With the proposed change it would not throw; because NaN is not equal to any number, the query returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty  is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: not de-duplicating them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value to filter out; we would need to define a priority over the Number types. 
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN as well as ±Double.INF and ±Float.INF. Not applying type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept, such as a key-value tuple ?
+* Today we have Date type but don’t we need timezone aware DateTime type as well ?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to Graph providers ?
+* Which should be more reasonable, NULL eq NULL is true or false ?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. It is JVM based so as a primitive type, we are using the following types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex

Review comment:
       I think of these more as type constructors, because there are many different vertex types depending on the properties which are supported on a vertex, many different edge types depending both on the out- and in-vertex types and the edge properties which are supported, etc. For an in-depth exploration of this topic, see the Algebraic Property Graphs paper.
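
       To make the type-constructor reading concrete, here is a minimal Java sketch. It is not TinkerPop API: `VertexTypeSketch`, `VertexType` and `EdgeType` are hypothetical names introduced only for illustration, and modelling a property schema as a map from property key to Java class is an assumption.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

public class VertexTypeSketch {

    /** Hypothetical: one concrete vertex type, obtained by applying "Vertex" to a property schema. */
    public static final class VertexType {
        private final Map<String, Class<?>> schema;

        public VertexType(Map<String, ? extends Class<?>> schema) {
            // Copy into a sorted map so the type does not depend on key insertion order.
            this.schema = new TreeMap<>(schema);
        }

        @Override public boolean equals(Object o) {
            return o instanceof VertexType && schema.equals(((VertexType) o).schema);
        }

        @Override public int hashCode() { return schema.hashCode(); }

        @Override public String toString() { return "Vertex" + schema; }
    }

    /** Hypothetical: an edge type depends on its out-vertex type, its in-vertex type and its own schema. */
    public static final class EdgeType {
        private final VertexType out;
        private final VertexType in;
        private final Map<String, Class<?>> schema;

        public EdgeType(VertexType out, VertexType in, Map<String, ? extends Class<?>> schema) {
            this.out = out;
            this.in = in;
            this.schema = new TreeMap<>(schema);
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof EdgeType)) return false;
            EdgeType other = (EdgeType) o;
            return out.equals(other.out) && in.equals(other.in) && schema.equals(other.schema);
        }

        @Override public int hashCode() { return Objects.hash(out, in, schema); }
    }
}
```

       Under this reading, a vertex with `name` and `age` properties and a vertex with `name` and `lang` properties inhabit two different types, and an edge type between them is a third type built from both.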

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by cap-Step. But the comparison is Java dependent and we don’t have a clear definition of how the comparison works for types of this kind.
+The same holds for Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users to define a complete order across types and return a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts.  We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations. 
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but two NaN values are equivalent to each other
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is being used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise Comparison does return TRUE or FALSE
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of Semantics, right now TinkerPop does not have formal semantics to define these characteristics introduced in this proposal. Therefore this semantics should be published on the official TinkerPop doc.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float value is stored, this query currently always throws. With the proposed change it would not throw; because NaN is not equal to any number, the query returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty  is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: not de-duplicating them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value to filter out; we would need to define a priority over the Number types. 
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN as well as ±Double.INF and ±Float.INF. Not applying type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept, such as a key-value tuple ?
+* Today we have Date type but don’t we need timezone aware DateTime type as well ?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to Graph providers ?

Review comment:
       Yes, but we should use Graph.Features to allow the provider to specify whether a given primitive type is supported.
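
       As a rough illustration of how a provider-declared feature could interact with the equality semantics in the proposal, consider the sketch below. `NumericFeatures` and `supportsBigDecimalValues()` are hypothetical and are not part of the existing Graph.Features API; falling back to a (lossy) double comparison when BigDecimal is unsupported is just one possible choice, not something the proposal prescribes.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class NumericEqualitySketch {

    /** Hypothetical provider feature flag; not part of the existing Graph.Features API. */
    public interface NumericFeatures {
        boolean supportsBigDecimalValues();
    }

    /** True only for Double.NaN / Float.NaN. */
    static boolean isNaN(Number n) {
        return (n instanceof Double && ((Double) n).isNaN())
            || (n instanceof Float && ((Float) n).isNaN());
    }

    /** True only for the Double / Float infinities. */
    static boolean isInfinite(Number n) {
        return (n instanceof Double && ((Double) n).isInfinite())
            || (n instanceof Float && ((Float) n).isInfinite());
    }

    /** Exact conversion of a finite number; callers must rule out NaN and infinities first. */
    static BigDecimal toBigDecimal(Number n) {
        if (n instanceof BigDecimal) return (BigDecimal) n;
        if (n instanceof BigInteger) return new BigDecimal((BigInteger) n);
        if (n instanceof Double || n instanceof Float) return new BigDecimal(n.doubleValue());
        return BigDecimal.valueOf(n.longValue()); // Byte, Short, Integer, Long
    }

    /** Equality with type promotion: NaN equals nothing, and no input makes this throw. */
    public static boolean numericEquals(Number a, Number b, NumericFeatures features) {
        if (isNaN(a) || isNaN(b)) return false;
        if (isInfinite(a) || isInfinite(b)) {
            // An infinity can only equal an infinity of the same sign.
            return isInfinite(a) && isInfinite(b) && a.doubleValue() == b.doubleValue();
        }
        if (features.supportsBigDecimalValues()) {
            // compareTo ignores scale, so 1.0 and 1.00 are treated as equal.
            return toBigDecimal(a).compareTo(toBigDecimal(b)) == 0;
        }
        // Assumed fallback for providers without BigDecimal support; loses precision for large values.
        return a.doubleValue() == b.doubleValue();
    }
}
```

       Because Groovy parses `1.0` as a BigDecimal, a call such as `numericEquals(Float.NaN, new BigDecimal("1.0"), features)` is the interesting case: here it returns false instead of raising a NumberFormatException, which is the filtering behavior the proposal asks for.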

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by cap-Step. But the comparison is Java dependent and we don’t have a clear definition of how the comparison works for types of this kind.
+The same holds for Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users to define a complete order across types and return a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified with certainty as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
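+The equality rules listed above could be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the TinkerPop implementation: `EqualitySketch` and its methods are hypothetical names, BigDecimal stands in for the "lowest common super type", and the handling of null is a placeholder because NULL semantics are an open question in this proposal.
+
```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class EqualitySketch {
    static boolean isFloating(Number n) {
        return n instanceof Double || n instanceof Float;
    }

    // Assumption: BigDecimal is used as the common super type for finite numbers.
    static BigDecimal promote(Number n) {
        if (n instanceof BigDecimal) return (BigDecimal) n;
        if (n instanceof BigInteger) return new BigDecimal((BigInteger) n);
        if (isFloating(n)) return new BigDecimal(n.doubleValue()); // exact binary value
        return BigDecimal.valueOf(n.longValue());
    }

    // Always returns true or false; never throws for the inputs handled here.
    static boolean equal(Object a, Object b) {
        if (a == null || b == null) return false; // placeholder choice, see open questions
        if (a instanceof Number && b instanceof Number) {
            Number x = (Number) a;
            Number y = (Number) b;
            boolean xSpecial = isFloating(x) && !Double.isFinite(x.doubleValue());
            boolean ySpecial = isFloating(y) && !Double.isFinite(y.doubleValue());
            if (xSpecial || ySpecial) {
                // Primitive == is false for NaN and true for matching infinities.
                return xSpecial && ySpecial && x.doubleValue() == y.doubleValue();
            }
            return promote(x).compareTo(promote(y)) == 0;
        }
        // No promotion across non-numeric types.
        return a.getClass() == b.getClass() && a.equals(b);
    }
}
```
+
+Under these assumptions `equal(1, 1.0)` and `equal(19, 19L)` hold, while `equal(Double.NaN, Double.NaN)` and `equal(1, "1")` do not.
+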
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
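+For comparison with equality, a minimal Java sketch of equivalence is given here. It is only an illustration with hypothetical names. It relies on two documented Java behaviors: `Double.equals` treats NaN as equal to NaN, and boxed numbers of different classes are never `equals`. Note that `Double.equals` also distinguishes `+0.0` from `-0.0`, which this proposal does not settle here.
+
```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

final class EquivalenceSketch {
    // Same runtime type and equals(); no numeric type promotion.
    static boolean equivalent(Object a, Object b) {
        if (a == null || b == null) return a == b;
        return a.getClass() == b.getClass() && a.equals(b);
    }

    // Hash-based de-duplication that keeps the first occurrence of each value.
    static List<Object> dedup(List<?> values) {
        return new ArrayList<>(new LinkedHashSet<Object>(values));
    }
}
```
+
+De-duplicating the values `NaN, NaN, 2, 2.0, 2` with this sketch keeps three entries (`NaN`, `2`, `2.0`), which matches the rule that values of different numeric types are never equivalent.
+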
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
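+The three-valued result described above (TRUE, FALSE, or NULL) can be sketched in Java with a nullable `Boolean`. This is a hedged illustration: `ComparabilitySketch` and `gt` are hypothetical names, BigDecimal is assumed as the promotion target for finite numbers, and returning FALSE for NaN is an assumption based on the statement in this proposal that comparing on NaN should return no results.
+
```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class ComparabilitySketch {
    static boolean nonFinite(Number n) {
        return (n instanceof Double || n instanceof Float) && !Double.isFinite(n.doubleValue());
    }

    static BigDecimal promote(Number n) {
        if (n instanceof BigDecimal) return (BigDecimal) n;
        if (n instanceof BigInteger) return new BigDecimal((BigInteger) n);
        if (n instanceof Double || n instanceof Float) return new BigDecimal(n.doubleValue());
        return BigDecimal.valueOf(n.longValue());
    }

    // Returns TRUE or FALSE, or null (standing in for NULL) for incomparable operands.
    @SuppressWarnings("unchecked")
    static Boolean gt(Object a, Object b) {
        if (a instanceof Number && b instanceof Number) {
            Number x = (Number) a;
            Number y = (Number) b;
            if (nonFinite(x) || nonFinite(y)) {
                // Primitive comparison is false whenever NaN is involved.
                return x.doubleValue() > y.doubleValue();
            }
            return promote(x).compareTo(promote(y)) > 0;
        }
        if (a != null && b != null && a.getClass() == b.getClass() && a instanceof Comparable) {
            return ((Comparable<Object>) a).compareTo(b) > 0;
        }
        return null; // incomparable types, or a null operand
    }
}
```
+
+A caller then has to decide how to treat the `null` outcome, which mirrors the statement above that the handling of NULL is Graph provider dependent.
+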
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
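+A total order across types, as required above, could be sketched as a Java `Comparator` that first ranks values by type and then compares within a type. Everything here is hypothetical: the class name, the three-level rank (numbers, then other values, then strings), and the fallback on `toString()` are placeholders chosen only to agree with the examples in this proposal; the actual cross-type order is the one defined in the appendix. Comparing numbers via `double` is a simplification that loses precision for large values and places numerically equal values of different types at the same position.
+
```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class OrderabilitySketch {
    // Placeholder type rank; not the order defined by the proposal's appendix.
    static int rank(Object o) {
        if (o instanceof Number) return 0;
        if (o instanceof String) return 2;
        return 1;
    }

    // Never returns "incomparable" and never throws for mixed-type input.
    static final Comparator<Object> ORDER = (a, b) -> {
        int ra = rank(a);
        int rb = rank(b);
        if (ra != rb) return Integer.compare(ra, rb);
        if (ra == 0) {
            // Double.compare is a total order: NaN sorts after every other value.
            return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
        }
        if (ra == 2) return ((String) a).compareTo((String) b);
        return String.valueOf(a).compareTo(String.valueOf(b)); // arbitrary fallback
    };

    static List<Object> sorted(List<?> values) {
        List<Object> copy = new ArrayList<>(values);
        copy.sort(ORDER);
        return copy;
    }
}
```
+
+Sorting the mixed list `"100000", 100, "abc", 2.5` with this comparator yields `2.5, 100, "100000", "abc"` instead of a ClassCastException.
+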
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+TinkerPop currently has no formal semantics defining the characteristics introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this query always throws today. With the proposed change it would no longer throw; because NaN is not equal to any number, it would return an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
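+The exceptions shown above are consistent with how `java.math.BigDecimal` behaves: its constructor from `double` is documented to throw NumberFormatException for NaN and infinite values. The sketch below demonstrates that behavior and one possible guard. The class and method names are hypothetical, and this is not a claim about the exact code path taken inside TinkerPop.
+
```java
import java.math.BigDecimal;

final class BigDecimalPromotionSketch {
    // True if promoting the double to BigDecimal throws NumberFormatException.
    static boolean promotionThrows(double value) {
        try {
            new BigDecimal(value);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    // Guarded equality: NaN and infinities simply do not match a BigDecimal literal.
    static boolean safeEquals(double stored, BigDecimal literal) {
        if (!Double.isFinite(stored)) return false;
        return new BigDecimal(stored).compareTo(literal) == 0;
    }
}
```
+
+With such a guard, a stored NaN is filtered out when compared against a literal like `1.0`, rather than raising an exception.
+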
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under this proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty  is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value should be filtered out; we would need to define a priority over the Number types.
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept such as a key-value tuple?
+* Today we have Date type but don’t we need timezone aware DateTime type as well ?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to Graph providers ?
+* Which should be more reasonable, NULL eq NULL is true or false ?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Because it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID

Review comment:
       Id perhaps shouldn't be thought of as a type, but rather a special field on an element which can be any of a number of types.

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. The comparison, however, is Java dependent, and we don't have a clear definition of how comparison works for types of this kind.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+We even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of type Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified with certainty as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+TinkerPop currently has no formal semantics defining the characteristics introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this query always throws today. With the proposed change it would no longer throw; because NaN is not equal to any number, it would return an empty result.
+
+* BigDecimal +
+Equality checks involving BigDecimal and special values that cannot be parsed as Integer, such as NaN and INF, should not produce exceptions; they should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty values are compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL.
+
+==== Equivalence
+
+TinkerPop today groups values by the hash of the original value, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, so ordering fails for non-comparable and mixed-type values. The proposed change is to make values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, another question is which value to filter out; we would need to define a priority over the Number types.
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN, as well as ±Double.INF and ±Float.INF. Not applying type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a generalized concept such as a key-value tuple?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to graph providers?
+* Which is more reasonable: NULL eq NULL is true, or false?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Since it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We do type casting, a.k.a. type promotion, for Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. The rules for how types are promoted are:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to the floating-point type of the highest common bit width
+  ** If the other value is Long or Double, we need 64 bits, so convert both to Double
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported, depending on the language and storage; the behavior of type casting for these 2 types can therefore vary between graph providers.
+
+=== Equality
+
+==== Primitive types
+===== Number
+
+Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
+
+* If one of LHS and RHS is a Number and the other isn't, eq returns FALSE.
+* If both LHS and RHS are Numbers, eq follows the type casting described above and then checks equality.
+* Example for edge cases:
+    ** -0.0 eq 0.0  = TRUE
+    ** +0.0 eq 0.0 = TRUE
+    **  -0.0 eq +0.0 = TRUE
+    ** NaN eq NaN  = FALSE
+    ** +INF eq +INF = TRUE
+    **  -INF eq -INF = TRUE
+    **  -INF eq +INF = FALSE
+* TinkerPop is JVM based, so there can be ±INF^^float and ±INF^^double, NaN^^float and NaN^^double. They also adhere to type promotion.
+
+===== Boolean
+
+* If either one of LHS or RHS is Boolean and another isn't, return FALSE
+* True != False, True == True, False == False
+
+===== String
+
+* If either one of LHS or RHS is String and another isn't, return FALSE
+* We assume the common lexicographical order over Unicode strings.
+* LHS and RHS need to be lexicographically equal for LHS eq RHS == TRUE for String.
+
+===== Date
+
+* If either one of LHS or RHS is Date and another isn't, return FALSE
+* LHS eq RHS == TRUE when both LHS and RHS value are numerically identical in Unix Epoch time.
+
+===== NULL
+
+* If either one of LHS or RHS is null and another isn't, return FALSE
+* If both LHS and RHS are null, return TRUE 
+
+==== Composite types
+
+For all of them, if LHS and RHS are not of the same data type, equality returns FALSE. The following semantics apply when both LHS and RHS have the same data type.
+
+===== Vertex / Edge / VertexProperty
+
+They are considered part of the Element family in TinkerPop; if 2 elements have the same type and the same ID, they are considered equal.

Review comment:
       This gets interesting. I would say that two vertex *references* are equal if and only if they refer to a vertex with the same id. However, two *vertices* with the same id but different properties are unequal. Two such vertices would not be allowed to exist in the same graph, but comparison operations can crop up in other contexts.
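
The distinction between reference equality and value equality can be sketched in plain Java. This is only an illustration, not TinkerPop code: `VertexSnapshot`, `sameReference`, and `sameValue` are hypothetical names, and a snapshot is assumed to be an id plus a detached copy of the vertex's properties.

```java
import java.util.Map;
import java.util.Objects;

final class VertexEqualitySketch {

    // Hypothetical detached snapshot of a vertex: its id plus a copy of its properties.
    static final class VertexSnapshot {
        final Object id;
        final Map<String, Object> properties;

        VertexSnapshot(Object id, Map<String, Object> properties) {
            this.id = id;
            this.properties = properties;
        }
    }

    // Reference-style equality: two snapshots denote the same vertex iff their ids match.
    static boolean sameReference(VertexSnapshot a, VertexSnapshot b) {
        return Objects.equals(a.id, b.id);
    }

    // Value-style equality: the ids and all property values must match.
    static boolean sameValue(VertexSnapshot a, VertexSnapshot b) {
        return Objects.equals(a.id, b.id) && Objects.equals(a.properties, b.properties);
    }

    public static void main(String[] args) {
        // The same vertex id observed at two points in time, with a changed property.
        VertexSnapshot before = new VertexSnapshot(1, Map.of("name", "marko", "age", 29));
        VertexSnapshot after = new VertexSnapshot(1, Map.of("name", "marko", "age", 30));

        System.out.println(sameReference(before, after)); // true
        System.out.println(sameValue(before, after));     // false
    }
}
```

Under this sketch, the proposal's "same type and same ID" rule corresponds to `sameReference`; whether a stricter notion like `sameValue` is ever needed is exactly the open point raised here.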

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. But the comparison is Java dependent, and we don’t have a clear definition of how comparison works for types of this kind.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users to define a complete order across types and return a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but equivalent to each other
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is being used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be conceptually aligned with equality as far as type promotion is concerned, e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines whether 2 values are ordered at the same position or one value is positioned earlier than the other.
+* The concept of equivalence is used to determine whether the 2 values are at the same position.
+* When the position is identical, which value comes first (in other words, whether the sort is stable) depends on the graph provider's implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+TinkerPop currently has no formal semantics defining the characteristics introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when BigDecimal comparisons hit NaN and similar values. JDK8 seems to produce a NumberFormatException, whereas JDK11 produces errors like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this query always throws today. With the proposed change it would not throw; because NaN is not equal to any number, it returns an empty result.
+
+* BigDecimal +
+Equality checks involving BigDecimal and special values that cannot be parsed as Integer, such as NaN and INF, should not produce exceptions; they should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty values are compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL.
+
+==== Equivalence
+
+TinkerPop today groups values by the hash of the original value, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, so ordering fails for non-comparable and mixed-type values. The proposed change is to make values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, another question is which value to filter out; we would need to define a priority over the Number types.
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN, as well as ±Double.INF and ±Float.INF. Not applying type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a generalized concept such as a key-value tuple?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to graph providers?
+* Which is more reasonable: NULL eq NULL is true, or false?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Since it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We do type casting, a.k.a. type promotion, for Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. The rules for how types are promoted are:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to the floating-point type of the highest common bit width
+  ** If the other value is Long or Double, we need 64 bits, so convert both to Double
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported, depending on the language and storage; the behavior of type casting for these 2 types can therefore vary between graph providers.

Review comment:
       We probably want a notion of information-preserving vs. non-information preserving mappings. If you are mapping expressions from a context in which bigints are supported into one where they are not, you have a couple of options: force bigint values into bounded integer values, possibly with loss of information, or map them into other types like strings in an information-preserving way using a well-understood encoding. 
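
The two options can be sketched in plain Java, as an illustration only. The helper names (`clampToLong`, `encode`, `decode`) are hypothetical, and clamping and decimal strings are assumed here as just one possible lossy mapping and one possible information-preserving encoding.

```java
import java.math.BigInteger;

final class BigIntegerMappingSketch {

    private static final BigInteger LONG_MAX = BigInteger.valueOf(Long.MAX_VALUE);
    private static final BigInteger LONG_MIN = BigInteger.valueOf(Long.MIN_VALUE);

    // Lossy option: force the value into the 64-bit range by clamping.
    static long clampToLong(BigInteger value) {
        if (value.compareTo(LONG_MAX) > 0) return Long.MAX_VALUE;
        if (value.compareTo(LONG_MIN) < 0) return Long.MIN_VALUE;
        return value.longValue();
    }

    // Information-preserving option: encode as a decimal string and decode it back.
    static String encode(BigInteger value) {
        return value.toString();
    }

    static BigInteger decode(String encoded) {
        return new BigInteger(encoded);
    }

    public static void main(String[] args) {
        BigInteger big = BigInteger.ONE.shiftLeft(64); // 2^64, outside the long range

        System.out.println(clampToLong(big) == Long.MAX_VALUE); // true: information is lost
        System.out.println(decode(encode(big)).equals(big));    // true: round trip is exact
        System.out.println("9".compareTo("10") > 0);            // true: string order differs from numeric order
    }
}
```

The last line shows a caveat of this particular encoding: a plain decimal string preserves equality but not numeric order, so comparability and orderability over the encoded values would need an order-preserving encoding or a decode step first.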

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different contexts (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties).
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. However, the comparison is Java dependent, and we do not have a clear definition of how comparison works for types of this kind.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if a complete order across types were defined and a result returned instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which tries to promote Infinity to BigDecimal as well;
+// the type cast then fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the section above we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four: we distinguish equality from equivalence and comparability from orderability, each pair being two flavors of one concept tailored to use in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and a discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale (commonly found in query and programming languages) is that two "unknown" numbers, a frequent use case for NaN, cannot be identified as equal with certainty, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common supertype.
+* Other than type promotion between Numbers, 2 values of different types are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but equivalent to each other
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+Right now TinkerPop has no formal semantics defining the characteristics introduced in this proposal. Therefore these semantics should be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this always throws. With the proposed change it would not throw; because NaN is not equal to any number, it returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty values are compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL.
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If de-duping, there is another question which value we should filter out. We need to define priority over types in Number. 
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped together according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a generalized concept such as a key-value tuple?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to Graph providers ?
+* Which should be more reasonable, NULL eq NULL is true or false ?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Since it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We apply type casting, a.k.a. type promotion, to Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. Types are promoted according to the following rules:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to the floating-point type of the highest common bit width
+  ** If the other value is Long or Double, we need 64 bits, so convert both to Double
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported, depending on the language and storage; therefore the behavior of type casting for these 2 types can vary between Graph providers.
+
+=== Equality
+
+==== Primitive types
+===== Number
+
+Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
+
+* If either one of LHS or RHS is Number and another isn't, eq returns FALSE.
+* If both LHS and RHS are Number, eq follows the type casting described above and then checks equality.
+* Example for edge cases:
+    ** -0.0 eq 0.0  = TRUE
+    ** +0.0 eq 0.0 = TRUE
+    **  -0.0 eq +0.0 = TRUE
+    ** NaN eq NaN  = FALSE
+    ** +INF eq +INF = TRUE
+    **  -INF eq -INF = TRUE
+    **  -INF eq +INF = FALSE
+* TinkerPop is JVM based, so there can be ±INF^^float and ±INF^^double, NaN^^float and NaN^^double. They also adhere to type promotion.
+
+===== Boolean
+
+* If either one of LHS or RHS is Boolean and another isn't, return FALSE
+* True != False, True == True, False == False
+
+===== String
+
+* If either one of LHS or RHS is String and another isn't, return FALSE
+* We assume the common lexicographical order over Unicode strings.
+* LHS and RHS need to be lexicographically equal for LHS eq RHS == TRUE for String.
+
+===== Date
+
+* If either one of LHS or RHS is Date and another isn't, return FALSE
+* LHS eq RHS == TRUE when both LHS and RHS value are numerically identical in Unix Epoch time.
+
+===== NULL
+
+* If either one of LHS or RHS is null and another isn't, return FALSE
+* If both LHS and RHS are null, return TRUE 
+
+==== Composite types
+
+For all of them, if LHS and RHS are not of the same data type, equality returns FALSE. The following semantics apply when both LHS and RHS have the same data type.
+
+===== Vertex / Edge / VertexProperty
+
+They are considered the Element family in TinkerPop; if 2 elements have the same type and the same ID, they are considered equal.
+
+===== Property
+
+If key and value are the same, 2 properties are equal.
+
+===== PropertyKey
+
+key is String type so Equality for String type applies.
+
+===== PropertyValue
+
+Any type, so Equality for a corresponding type applies.
+
+===== ID
+
+Any type, so Equality for a corresponding type applies.
+
+===== Label
+
+label is String type so Equality for String type applies.
+
+===== Path
+
+2 Paths are equal when their path elements are equal (using equality of List), and the corresponding path labels are also equal. 
+
+===== List
+
+* If either one of LHS or RHS is List and another isn't, return FALSE

Review comment:
       Again, for now I would just say two lists are equal if they have the same type, and are exactly the same value expression. That subsumes considerations of list length etc.
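
As a point of reference only, here is a small sketch of what the default JVM behavior gives for lists today. It relies solely on `java.util.List.equals`, which compares size, order, and element-wise `equals`; the class and method names (`ListEqualitySketch`, `sameValue`) are hypothetical, and whether the proposal should match this behavior is exactly the open point.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not a TinkerPop API: shows the JVM's built-in
// notion of list equality.
class ListEqualitySketch {

    // Delegates to List.equals: same size, same order, and each pair of
    // elements equal according to Object.equals.
    static boolean sameValue(List<?> lhs, List<?> rhs) {
        return lhs.equals(rhs);
    }

    public static void main(String[] args) {
        // Same elements in the same order: equal, even though the two
        // List implementations differ.
        System.out.println(sameValue(List.of(1, 2, 3), Arrays.asList(1, 2, 3)));

        // Different order or different length: not equal.
        System.out.println(sameValue(List.of(1, 2, 3), List.of(3, 2, 1)));
        System.out.println(sameValue(List.of(1, 2), List.of(1, 2, 3)));

        // No numeric type promotion: Integer 1 and Long 1 are not equal
        // under Object.equals, so the lists differ.
        System.out.println(sameValue(List.of(1), List.of(1L)));
    }
}
```

The last case is where the JVM default departs from the type-promoting equality described for Numbers, so a definition based on "exactly the same value expression" would need to state which of the two applies to list elements.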

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different contexts (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties).
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. However, the comparison is Java dependent, and we do not have a clear definition of how comparison works for types of this kind.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if a complete order across types were defined and a result returned instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which tries to promote Infinity to BigDecimal as well;
+// the type cast then fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the section above we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four: we distinguish equality from equivalence and comparability from orderability, each pair being two flavors of one concept tailored to use in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and a discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale (commonly found in query and programming languages) is that two "unknown" numbers, a frequent use of NaN, cannot be identified as equal with certainty, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common supertype.
+* Other than type promotion between Numbers, 2 values of different types are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+Right now TinkerPop does not have formal semantics defining the characteristics introduced in this proposal. Therefore these semantics should be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 for BigDecimal comparisons that hit NaN and similar values. JDK8 seems to produce a NumberFormatException, whereas JDK11 produces messages like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this query always throws today. With the proposed change it would not throw; because NaN is not equal to any number, it returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty  is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL. 
+
+==== Equivalence
+
+Today, TinkerPop groups values by a hash of the original value; this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: not de-duplicating them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, there is a further question of which value to filter out: we would need to define a priority over the Number types. 
+Also note that TinkerPop is Java based, so there are Double.NaN and Float.NaN as well as ±Double.INF and ±Float.INF. Not adhering to type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept, such as a key-value tuple?
+* Today we have a Date type, but do we need a timezone-aware DateTime type as well?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to graph providers?
+* Which is more reasonable: NULL eq NULL is true, or false?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be taken into account as a separate type?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. TinkerPop is JVM based, so we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We do type casting, a.k.a. type promotion, for Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. Types are promoted according to the following rules:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to the floating-point type of the highest common bit width
+  ** If the other value is Long or Double, we need 64 bits, so convert both to Double 
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported depending on the language and Storage, therefore the behavior of type casting for these 2 types can vary depending on a Graph provider. 
+
+=== Equality
+
+==== Primitive types
+===== Number
+
+Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
+
+* If only one of LHS and RHS is a Number, eq returns FALSE.
+* If both LHS and RHS are Numbers, apply the type casting described above and then check equality.
+* Example for edge cases:
+    ** -0.0 eq 0.0  = TRUE
+    ** +0.0 eq 0.0 = TRUE
+    **  -0.0 eq +0.0 = TRUE
+    ** NaN eq NaN  = FALSE
+    ** +INF eq +INF = TRUE
+    **  -INF eq -INF = TRUE
+    **  -INF eq +INF = FALSE
+* TinkerPop is JVM based so there can be ±INF^^float and ±INF^^double, NaN^^float and NaN^^double. They also adhere to type promotion.
+
+===== Boolean
+
+* If either one of LHS or RHS is Boolean and another isn't, return FALSE
+* True != False, True == True, False == False
+
+===== String
+
+* If either one of LHS or RHS is String and another isn't, return FALSE
+* We assume the common lexicographical order over Unicode strings.
+* LHS and RHS need to be lexicographically equal for LHS eq RHS == TRUE for String.
+
+===== Date
+
+* If either one of LHS or RHS is Date and another isn't, return FALSE
+* LHS eq RHS == TRUE when both LHS and RHS value are numerically identical in Unix Epoch time.
+
+===== NULL
+
+* If either one of LHS or RHS is null and another isn't, return FALSE
+* If both LHS and RHS are null, return TRUE 
+
+==== Composite types
+
+For all of them, if LHS and RHS are not of the same data type, equality returns FALSE. The following semantics apply when both LHS and RHS have the same data type.
+
+===== Vertex / Edge / VertexProperty
+
+They are considered the Element family in TinkerPop; if 2 elements have the same type and the same ID, they are considered equal.
+
+===== Property
+
+If the key and value are the same, 2 properties are equal.

Review comment:
       Again, with all of these, I would insist on a *type* along with the value. So a property has a type which is given by the property key as well as a type for property values. If the types of two properties are not the same, then you can not even compare them. If they are the same, then the properties are equal iff the property values are equal.
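
A minimal sketch of this suggestion in Java, under stated assumptions: `TypedProperty` and `propertyEquals` are hypothetical names invented for illustration and are not part of the TinkerPop API, the property's type is modeled as the pair (key, value class), and an empty `Optional` stands in for "cannot be compared". It is one possible reading of the comment, not a proposed implementation.

```java
import java.util.Objects;
import java.util.Optional;

public class TypedPropertyEquality {

    /** Hypothetical typed property; NOT TinkerPop's Property interface. */
    public static final class TypedProperty {
        final String key;          // the property key identifies the property's type
        final Class<?> valueType;  // declared type of the property value
        final Object value;

        public TypedProperty(String key, Class<?> valueType, Object value) {
            this.key = key;
            this.valueType = valueType;
            this.value = value;
        }
    }

    /**
     * Returns Optional.empty() when the two properties differ in type
     * (different key or different value type), i.e. they cannot be compared.
     * Otherwise returns whether the two property values are equal.
     */
    public static Optional<Boolean> propertyEquals(TypedProperty a, TypedProperty b) {
        boolean sameType = a.key.equals(b.key) && a.valueType.equals(b.valueType);
        if (!sameType) {
            return Optional.empty();
        }
        return Optional.of(Objects.equals(a.value, b.value));
    }

    public static void main(String[] args) {
        TypedProperty age29 = new TypedProperty("age", Integer.class, 29);
        TypedProperty alsoAge29 = new TypedProperty("age", Integer.class, 29);
        TypedProperty age30 = new TypedProperty("age", Integer.class, 30);
        TypedProperty name = new TypedProperty("name", String.class, "marko");

        System.out.println(propertyEquals(age29, alsoAge29)); // Optional[true]
        System.out.println(propertyEquals(age29, age30));     // Optional[false]
        System.out.println(propertyEquals(age29, name));      // Optional.empty
    }
}
```

Note that the proposal text as written defines equality as always returning TRUE or FALSE, so under the proposal the third case would be FALSE rather than "not comparable"; the sketch only shows where the two readings differ.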

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and as of today there are no clearly defined and published semantics for the different types of equality and comparison operations. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This document is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose are largely aligned with the semantics implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, so as to make the overall semantics more predictable, coherent, and documentable.
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+two numerically equal values of different types would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. But the comparison is Java dependent, and we don't have a clear definition of how comparison works for types of this kind.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+And we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which then tries to promote Infinity to BigDecimal as well;
+// that type cast fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, distinguishing between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale (commonly found in query and programming languages) is that two "unknown" numbers, a frequent use of NaN, cannot be identified as equal with certainty, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common supertype.
+* Other than type promotion between Numbers, 2 values of different types are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+TinkerPop currently has no formal semantics defining the characteristics introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK 11 seems to produce a different error from JDK 8 for BigDecimal comparisons that involve NaN and similar special values. JDK 8 seems to produce a NumberFormatException, whereas JDK 11 produces an error like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float value is stored, this query always throws today. With the proposed change it would not throw; because NaN is not equal to any number, it would return an empty result.
+
+* BigDecimal +
+Equality checks involving BigDecimal and special values that cannot be parsed as Integer, such as NaN and INF, should not produce exceptions; they should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, it throws today but should return NULL.
+  ** When NULL is compared, it throws an exception today but should return NULL.
+
+==== Equivalence
+
+TinkerPop today groups by the hash value of the original values, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop uses comparability for orderability, so ordering fails for non-comparable and mixed-type values. The proposed change makes it possible to order values of any type.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should equivalence take type promotion into account? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: not de-duplicating them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-duplicating them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value to filter out; we would need to define a priority over the Number types.
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN as well as ±Double.INF and ±Float.INF. Not applying type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept such as a key-value tuple?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave how TinkerPop deals with BigDecimal to graph providers?
+* Which is more reasonable: NULL eq NULL being TRUE or FALSE?
+  ** If it is TRUE, it may be respected in JOIN operations
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined.
+* Should UUID be taken into account as a distinct type?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Because it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We apply type casting, a.k.a. type promotion, to Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. Types are promoted according to the following rules:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to the floating-point type of the highest common bit width
+  ** If the other value is Long or Double, we need 64 bits, so convert both to Double
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported, depending on the language and storage, so the type casting behavior for these 2 types can vary between graph providers.
+
+=== Equality
+
+==== Primitive types
+===== Number
+
+Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
+
+* If either one of LHS or RHS is Number and another isn't, eq returns FALSE.

Review comment:
       I would generalize this to say that we have a grammar for values which includes numbers, and that for the purpose of comparison, a value is always accompanied by a type (where we also have a grammar for types). If (t1, v1) is exactly the same pair of expressions as (t2, v2), then the typed values are equal. Otherwise, they are not (unless we support subtyping etc.).
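The pair-based notion of equality described in this comment could be sketched roughly as follows. This is a hypothetical illustration only: the `TypedValue` class and its string-encoded type and value expressions are assumptions made for the example, not part of the proposal or of TinkerPop, and subtyping is deliberately left out.

```java
import java.util.Objects;

// Hypothetical illustration: a value is always accompanied by a type, and two
// typed values are equal only if the (type, value) pairs are exactly the same.
final class TypedValue {
    private final String type;   // a type expression, e.g. "int" or "float" (assumed encoding)
    private final String value;  // a value expression, e.g. "2" or "2.0" (assumed encoding)

    TypedValue(String type, String value) {
        this.type = Objects.requireNonNull(type);
        this.value = Objects.requireNonNull(value);
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof TypedValue)) {
            return false;
        }
        TypedValue that = (TypedValue) other;
        // Exact pair equality: no type promotion and no subtyping.
        return type.equals(that.type) && value.equals(that.value);
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, value);
    }
}
```

Under this sketch, `new TypedValue("int", "2")` and `new TypedValue("float", "2.0")` are not equal, which is closer to the proposal's notion of equivalence than to its type-promoting equality.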

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. But the comparison is Java dependent, and we don’t have a clear definition of how comparison works for this kind of type.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if we defined a complete order across types and returned a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified with certainty as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common supertype.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not applied, two numbers of different types are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
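The three-valued result described in the list above (TRUE, FALSE, or NULL for incomparable types) might be sketched as follows. This is a minimal illustration and not TinkerPop's implementation: the `Comparability` class and `lt` method are hypothetical names, Java's `null` stands in for NULL, numbers are promoted to `double` as a simplifying stand-in for the proposal's promotion rules, and strings are compared with `String.compareTo` (UTF-16 code unit order), which is also an assumption.

```java
// Hypothetical sketch of a "less than" check following the comparability rules.
final class Comparability {
    private Comparability() {}

    // Returns TRUE or FALSE for comparable operands, and null (standing in
    // for NULL) when the two operands are of incomparable types.
    static Boolean lt(Object lhs, Object rhs) {
        if (lhs instanceof Number && rhs instanceof Number) {
            // Simplifying assumption: promote both operands to double.
            // Any comparison involving NaN evaluates to false here.
            return ((Number) lhs).doubleValue() < ((Number) rhs).doubleValue();
        }
        if (lhs instanceof String && rhs instanceof String) {
            return ((String) lhs).compareTo((String) rhs) < 0;
        }
        // Different or unsupported types: incomparable, so no TRUE/FALSE answer.
        return null;
    }
}
```

With this sketch, `Comparability.lt(1.0, 2)` and `Comparability.lt(2, 3L)` both yield TRUE, in line with `1.0 < 2 < 3L`, while `Comparability.lt(1, "a")` yields `null`. How a caller treats that `null` is left open, as the proposal leaves it to the graph provider.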
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+TinkerPop currently has no formal semantics defining the characteristics introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK 11 seems to produce a different error from JDK 8 for BigDecimal comparisons that involve NaN and similar special values. JDK 8 seems to produce a NumberFormatException, whereas JDK 11 produces an error like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float value is stored, this query always throws today. With the proposed change it would not throw; because NaN is not equal to any number, it would return an empty result.
+
+* BigDecimal +
+Equality checks involving BigDecimal and special values that cannot be parsed as Integer, such as NaN and INF, should not produce exceptions; they should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, it throws today but should return NULL.
+  ** When NULL is compared, it throws an exception today but should return NULL.
+
+==== Equivalence
+
+TinkerPop today groups by the hash value of the original values, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop uses comparability for orderability, so ordering fails for non-comparable and mixed-type values. The proposed change makes it possible to order values of any type.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should equivalence take type promotion into account? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: not de-duplicating them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-duplicating them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value to filter out; we would need to define a priority over the Number types.
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN as well as ±Double.INF and ±Float.INF. Not applying type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept such as a key-value tuple?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave how TinkerPop deals with BigDecimal to graph providers?
+* Which is more reasonable: NULL eq NULL being TRUE or FALSE?
+  ** If it is TRUE, it may be respected in JOIN operations
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined.
+* Should UUID be taken into account as a distinct type?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Because it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We apply type casting, a.k.a. type promotion, to Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. Types are promoted according to the following rules:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to the floating-point type of the highest common bit width
+  ** If the other value is Long or Double, we need 64 bits, so convert both to Double
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported, depending on the language and storage, so the type casting behavior for these 2 types can vary between graph providers.
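The promotion rules listed above can be read as a simple decision cascade. The following is a minimal sketch of that reading, not TinkerPop's actual code: `NumberPromotion` and `promotedType` are hypothetical names, the rules are applied in exactly the order in which the list states them, and only the eight Number types named in the proposal are considered.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical sketch: determine the type in which two numbers are compared.
final class NumberPromotion {
    private NumberPromotion() {}

    static Class<? extends Number> promotedType(Number a, Number b) {
        if (a instanceof BigDecimal || b instanceof BigDecimal) return BigDecimal.class;
        if (a instanceof BigInteger || b instanceof BigInteger) return BigInteger.class;
        if (a instanceof Double || b instanceof Double) return Double.class;
        if (a instanceof Float || b instanceof Float) {
            // Double was handled above, so only a Long operand forces 64 bits here.
            return (a instanceof Long || b instanceof Long) ? Double.class : Float.class;
        }
        if (a instanceof Long || b instanceof Long) return Long.class;
        if (a instanceof Integer || b instanceof Integer) return Integer.class;
        if (a instanceof Short || b instanceof Short) return Short.class;
        // Assumption: both operands are Byte at this point; other Number
        // subclasses are outside the scope of this sketch.
        return Byte.class;
    }
}
```

For example, `NumberPromotion.promotedType(1.0f, 2L)` gives `Double.class`, whereas `NumberPromotion.promotedType(1.0f, 2)` gives `Float.class`.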
+
+=== Equality
+
+==== Primitive types
+===== Number
+
+Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
+
+* If exactly one of LHS and RHS is a Number, eq returns FALSE.
+* If both LHS and RHS are Numbers, the values are promoted according to the type casting rules described above and then checked for equality.
+* Example for edge cases:
+    ** -0.0 eq 0.0  = TRUE
+    ** +0.0 eq 0.0 = TRUE
+    **  -0.0 eq +0.0 = TRUE
+    ** NaN eq NaN  = FALSE
+    ** +INF eq +INF = TRUE
+    **  -INF eq -INF = TRUE
+    **  -INF eq +INF = FALSE
+* TinkerPop is JVM based, so there can be ±INF^^float and ±INF^^double, NaN^^float and NaN^^double. They also adhere to type promotion.
+
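The edge cases listed for Number equality match plain IEEE 754 behavior, which can be checked with a short sketch. Python floats are used here as a stand-in for JVM doubles, and `is_number` / `number_eq` are hypothetical helper names; Python's own int/float comparison plays the role of type promotion.

```python
import math

def is_number(value):
    # bool is a subclass of int in Python, but Boolean is not a Number here.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def number_eq(lhs, rhs):
    """Equality restricted to Numbers; any non-Number operand gives False."""
    if not (is_number(lhs) and is_number(rhs)):
        return False
    # IEEE 754 comparison: -0.0 == 0.0 holds, NaN == NaN does not.
    return lhs == rhs
```

With this, `number_eq(-0.0, 0.0)` holds, `number_eq(math.nan, math.nan)` does not, and `number_eq(1, "1")` is rejected because only one side is a Number.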
+===== Boolean
+
+* If either one of LHS or RHS is Boolean and the other isn't, return FALSE
+* True != False, True == True, False == False
+
+===== String
+
+* If either one of LHS or RHS is String and the other isn't, return FALSE
+* We assume the common lexicographical order over Unicode strings.
+* LHS eq RHS is TRUE for Strings only when LHS and RHS are lexicographically equal.
+
+===== Date
+
+* If either one of LHS or RHS is Date and the other isn't, return FALSE
+* LHS eq RHS is TRUE when the LHS and RHS values are numerically identical in Unix Epoch time.
+
+===== NULL
+
+* If either one of LHS or RHS is null and the other isn't, return FALSE
+* If both LHS and RHS are null, return TRUE 
+
+==== Composite types
+
+For all of them, if LHS and RHS are not of the same data type, equality returns FALSE. The following semantics apply when both LHS and RHS have the same data type.
+
+===== Vertex / Edge / VertexProperty
+
+They are considered the Element family in TinkerPop, and two elements are considered equal if they have the same type and the same ID.
+
+===== Property
+
+Two properties are equal if their keys and values are the same.
+
+===== PropertyKey
+
+key is String type so Equality for String type applies.
+
+===== PropertyValue
+
+Any type, so Equality for a corresponding type applies.
+
+===== ID
+
+Any type, so Equality for a corresponding type applies.
+
+===== Label
+
+label is String type so Equality for String type applies.
+
+===== Path
+
+2 Paths are equal when their path elements are equal (using equality of List), and the corresponding path labels are also equal. 
+
+===== List
+
+* If either one of LHS or RHS is a List and the other isn't, return FALSE
+* When both are Lists, then
+    ** if their sizes differ, return FALSE
+    ** L(x) denotes the x-th element of list L, and n is the common length of L1 and L2.
+        *** For 2 lists L1 and L2 to be equal (L1 eq L2 returns TRUE), L1(x) eq L2(x) must return TRUE for all 0 <= x < n.
+        *** For 2 lists L1 and L2 to be not equal (L1 eq L2 returns FALSE), L1(x) eq L2(x) must return FALSE for at least one 0 <= x < n.
+
+===== Map
+
+* If either one of LHS or RHS is a Map and the other isn't, return FALSE
+* For 2 Maps M1 and M2 to be equal,
+    ** All keys in M1 should be within keys in M2
+    ** All keys in M2 should be within keys in M1
+    ** M1 and M2 should have the same number of keys
+    ** For every key k in M1, M1[k] eq M2[k] should return TRUE
+    ** In Gremlin key order is not respected when determining equality
+
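The List and Map rules above are recursive, which a short sketch may make easier to see. Here Python `list` and `dict` stand in for List and Map, key matching relies on Python's own key equality, and scalar values fall back to Python `==`; all of these are simplifying assumptions, and `eq` is a hypothetical name rather than the proposal's operator.

```python
import math

def eq(lhs, rhs):
    """Element-wise equality for lists and key-wise equality for maps."""
    if isinstance(lhs, list) or isinstance(rhs, list):
        if not (isinstance(lhs, list) and isinstance(rhs, list)):
            return False  # only one side is a List
        # Same size, and every position must compare equal.
        return len(lhs) == len(rhs) and all(eq(a, b) for a, b in zip(lhs, rhs))
    if isinstance(lhs, dict) or isinstance(rhs, dict):
        if not (isinstance(lhs, dict) and isinstance(rhs, dict)):
            return False  # only one side is a Map
        # Same key set (order ignored), and equal values for every key.
        return lhs.keys() == rhs.keys() and all(eq(lhs[k], rhs[k]) for k in lhs)
    return lhs == rhs  # stand-in for the primitive equality rules
```

Because elements are compared one by one, a list containing NaN is not equal to itself under this sketch, which follows from NaN eq NaN being FALSE.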
+=== Equivalence
+
+Equivalence is identical to Equality, except for the cases listed below.
+
+==== Primitive types
+===== Number
+
+* Unlike Equality, we *don't do* type casting for Equivalence.

Review comment:
       I might have defined the terms "equal" and "equivalent" exactly opposite, with equality being the more strict notion and equivalence allowing for relationships among types and values.

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers, who need to be able to reason about the outcome of their queries, and custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. But the comparison is Java dependent, and we don’t have a clear definition of how the comparison works for these kinds of types.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users to define a complete order across types and return a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified with certainty as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but equivalent to each other
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
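The difference between the two notions for Numbers can be sketched as follows. Values are modelled as hypothetical `(type_tag, value)` pairs because Python has no Integer/Float/Double distinction; the functions assume numeric payloads only, and `dedup` is a naive illustration of how a dedup step could use equivalence, not TinkerPop's implementation.

```python
import math

def equal(lhs, rhs):
    """Equality: type tags are ignored (promotion), NaN never equals NaN."""
    return lhs[1] == rhs[1]

def equivalent(lhs, rhs):
    """Equivalence: no promotion across types, but NaN matches NaN."""
    (lhs_type, lhs_value), (rhs_type, rhs_value) = lhs, rhs
    if lhs_type != rhs_type:
        return False
    if math.isnan(lhs_value) and math.isnan(rhs_value):
        return True
    return lhs_value == rhs_value

def dedup(values):
    """Keep the first representative of every equivalence class."""
    kept = []
    for value in values:
        if not any(equivalent(value, seen) for seen in kept):
            kept.append(value)
    return kept
```

Under these assumptions `("Integer", 2)` and `("Float", 2.0)` are equal but not equivalent, while two `("Double", NaN)` values are equivalent but not equal, so a dedup over them keeps a single NaN.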
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is being used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps, by providing a stable sort order over mixed type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE
+
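The three-valued outcome of comparability can be illustrated with a small sketch in which Python's `None` stands in for NULL. Only numbers and strings are handled, as an assumption to keep the example short, and `comparable_lt` is a hypothetical helper name.

```python
def is_number(value):
    # bool is a subclass of int in Python, but Boolean is not a Number here.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def comparable_lt(lhs, rhs):
    """Return True/False for comparable operands, None (NULL) otherwise."""
    if is_number(lhs) and is_number(rhs):
        return lhs < rhs  # numeric types compare across int/float
    if isinstance(lhs, str) and isinstance(rhs, str):
        return lhs < rhs
    return None  # incomparable types: the result is NULL, not an exception
```

The `1.0 < 2 < 3L` example from the text corresponds to `comparable_lt(1.0, 2)` and `comparable_lt(2, 3)` both being true, whereas `comparable_lt(1, "a")` yields `None`.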
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
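A total order across types can be obtained by sorting on a type rank first and on the value second. The sketch below is an assumption-laden illustration: the ranks in `TYPE_RANK` and the choice to place NaN after all other numbers are hypothetical, since the proposal's actual cross-type order is defined in a part of the appendix not shown here.

```python
import math

# Hypothetical ranks; unknown types raise KeyError in this sketch.
TYPE_RANK = {bool: 0, int: 1, float: 1, str: 2}

def order_key(value):
    """Sort key: type rank first, NaN flag second, value last."""
    rank = TYPE_RANK[type(value)]
    if rank == 1:
        is_nan = isinstance(value, float) and math.isnan(value)
        # NaN cannot be compared, so it gets a fixed slot after other numbers.
        return (rank, is_nan, 0.0 if is_nan else value)
    return (rank, False, value)

def order(values):
    # sorted() is stable, so values at the same position keep input order.
    return sorted(values, key=order_key)
```

Values of different types never reach the value comparison because their ranks already differ, so mixed input such as `[100, "100000"]` sorts without raising.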
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of Semantics, right now TinkerPop does not have formal semantics to define these characteristics introduced in this proposal. Therefore this semantics should be published on the official TinkerPop doc.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float Number is stored, it always throws. With the proposed change, it wouldn't throw, but because NaN is not equal to any number, this returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer, such as NaN and INF, should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but based on the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL.
+
+==== Equivalence
+
+TinkerPop today uses the hash value of the original values for grouping, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order values of any type.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type promotion into account for equivalence? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: that we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-dup, there is another question: which value should we filter out? We would need to define a priority over the Number types.
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a concept like a key-value tuple to generalize it?
+* Today we have a Date type, but don’t we need a timezone-aware DateTime type as well?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to Graph providers?
+* Which is more reasonable: NULL eq NULL is true, or false?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined.
+* Should UUID be a different type to be taken into account?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. It is JVM based, so we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We apply type casting (a.k.a. type promotion) to Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. Types are promoted according to the following rules, where the first matching rule applies:

Review comment:
       An alternative to built-in casting, as part of the type system itself, is to have implicit `castIntegerToBigInteger`, `castFloatToBigDecimal` etc. functions which match on any integer, any float, and produce values of a single type. You can't compare `5:int32` and `5:int64`, but you can compare `toBigInteger(5:int32)` and `toBigInteger(5:int64)` if we treat the types in those expressions as tagged unions.
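
A minimal sketch of this alternative, under stated assumptions: values are modelled as `(tag, payload)` pairs to imitate a tagged union, `strict_eq` and `to_big_integer` are hypothetical names loosely following the functions mentioned in the comment, and the set of integer tags is chosen only for illustration.

```python
INTEGER_TAGS = {"int8", "int16", "int32", "int64"}

def strict_eq(lhs, rhs):
    """Comparison with no built-in casting: the tags must match exactly."""
    if lhs[0] != rhs[0]:
        raise TypeError("cannot compare %s with %s" % (lhs[0], rhs[0]))
    return lhs[1] == rhs[1]

def to_big_integer(value):
    """Explicit cast: accepts any integer tag, always yields one result type."""
    tag, payload = value
    if tag not in INTEGER_TAGS:
        raise TypeError("not an integer type: %s" % tag)
    return ("BigInteger", int(payload))
```

Here `strict_eq(("int32", 5), ("int64", 5))` is rejected, while `strict_eq(to_big_integer(("int32", 5)), to_big_integer(("int64", 5)))` succeeds, so the promotion becomes visible in the expression rather than hidden in the comparison.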

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers, who need to be able to reason about the outcome of their queries, and custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. But the comparison is Java dependent, and we don’t have a clear definition of how the comparison works for these kinds of types.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users to define a complete order across types and return a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, distinguishing between equality vs. equivalence and comparability vs. orderability; each pair constitutes two flavors of the same concept, tailored to its usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, a frequent use case for NaN, cannot be identified as equal with certainty when compared, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common supertype.
+* Other than the type promotion between Numbers, 2 values of different types are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (Whether equivalence should take type promotion into account is an open question.) +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+TinkerPop currently has no formal semantics defining the characteristics introduced in this proposal. Therefore, these semantics should be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK 11 seems to produce a different error than JDK 8 for BigDecimal comparisons that involve NaN and similar values. JDK 8 seems to produce a NumberFormatException, whereas JDK 11 produces output such as:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this always throws. With the proposed change, it would not throw; because NaN is not equal to any number, it would return an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty  is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today groups values by the hash of the original value; this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, another question is which value we should filter out; we would need to define a priority over the Number types. 
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a generalized concept such as a key-value tuple ?
+* Today we have Date type but don’t we need timezone aware DateTime type as well ?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to Graph providers ?
+* Which should be more reasonable, NULL eq NULL is true or false ?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Because it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting

Review comment:
       This looks reasonable, though I suggest exploring a type system first in which there is no typecasting and no subtyping. These can be supported later to the extent that they add value without adding too much complexity.
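
To make the trade-off concrete, here is a minimal Java sketch contrasting the two notions from the proposal: equality with numeric type promotion (where NaN is never equal to anything) and equivalence without any type casting (where NaN is equivalent to NaN). The class and method names (`SemanticsSketch`, `equal`, `equivalent`, `toBigDecimal`) are hypothetical and not part of TinkerPop. Promoting to `BigDecimal` as the "lowest common supertype" and following `Objects.equals` for NULL are assumptions made only for illustration.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Objects;

// Hypothetical sketch, not TinkerPop code.
final class SemanticsSketch {

    // Only Float and Double can hold NaN.
    static boolean isNaN(Number n) {
        return (n instanceof Double || n instanceof Float) && Double.isNaN(n.doubleValue());
    }

    // Only Float and Double can hold +/- Infinity.
    static boolean isInfinite(Number n) {
        return (n instanceof Double || n instanceof Float) && Double.isInfinite(n.doubleValue());
    }

    // Assumption: BigDecimal stands in for the common supertype of all Numbers.
    // Callers must rule out NaN and Infinity first, since BigDecimal cannot represent them.
    static BigDecimal toBigDecimal(Number n) {
        if (n instanceof BigDecimal) return (BigDecimal) n;
        if (n instanceof BigInteger) return new BigDecimal((BigInteger) n);
        if (n instanceof Float || n instanceof Double) return new BigDecimal(n.doubleValue());
        return BigDecimal.valueOf(n.longValue());
    }

    // Equality: numbers are promoted before comparing; NaN is equal to nothing.
    static boolean equal(Object a, Object b) {
        if (a instanceof Number && b instanceof Number) {
            Number x = (Number) a;
            Number y = (Number) b;
            if (isNaN(x) || isNaN(y)) return false;
            if (isInfinite(x) || isInfinite(y)) {
                return isInfinite(x) && isInfinite(y) && x.doubleValue() == y.doubleValue();
            }
            return toBigDecimal(x).compareTo(toBigDecimal(y)) == 0;
        }
        // NULL handling is an open question in the proposal; this simply follows Objects.equals.
        return Objects.equals(a, b);
    }

    // Equivalence: no type casting at all, so the runtime classes must match.
    // Double.equals and Float.equals already treat NaN as equal to NaN.
    static boolean equivalent(Object a, Object b) {
        if (a == null || b == null) return a == b;
        return a.getClass() == b.getClass() && a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(equal(1, 1.0));                       // true
        System.out.println(equal(Double.NaN, Double.NaN));       // false
        System.out.println(equivalent(Double.NaN, Double.NaN));  // true
        System.out.println(equivalent(2, 2.0f));                 // false
    }
}
```

A type system without typecasting would make `equal` collapse into something close to `equivalent`, apart from the NaN case. Note that `BigDecimal.equals` is scale-sensitive, so a real implementation of equivalence would have to decide whether `2.0` and `2.00` are the same value.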

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, many interesting questions arise when looking under the surface, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different contexts (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties).
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. By helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. But the comparison is Java dependent, and we don’t have a clear definition of how the comparison works for this kind of type.
+The same holds for Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+We even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of type Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which then tries to promote Infinity to BigDecimal as well;
+// that type cast fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, distinguishing between equality vs. equivalence and comparability vs. orderability; each pair constitutes two flavors of the same concept, tailored to its usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, a frequent use case for NaN, cannot be identified as equal with certainty when compared, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common supertype.
+* Other than the type promotion between Numbers, 2 values of different types are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (Whether equivalence should take type promotion into account is an open question.) +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+TinkerPop currently has no formal semantics defining the characteristics introduced in this proposal. Therefore, these semantics should be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK 11 seems to produce a different error than JDK 8 for BigDecimal comparisons that involve NaN and similar values. JDK 8 seems to produce a NumberFormatException, whereas JDK 11 produces output such as:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this always throws. With the proposed change, it would not throw; because NaN is not equal to any number, it would return an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today groups by the hash value of the original values, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop uses comparability for orderability, so ordering fails for non-comparable and mixed-type values. The proposed change makes values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value should be filtered out; we would need to define a priority over the Number types. 
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN as well as ±Double.INF and ±Float.INF. Not applying type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept such as a key-value tuple?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to graph providers?
+* Which is more reasonable: NULL eq NULL is true, or false?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Because it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We perform type casting (a.k.a. type promotion) for Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. Types are promoted according to the following rules:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to floating type of highest common bit denomination
+  ** If another value is Long or Double, we need 64bit so convert both to Double 
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported depending on the language and Storage, therefore the behavior of type casting for these 2 types can vary depending on a Graph provider. 
+
+=== Equality
+
+==== Primitive types
+===== Number
+
+Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
+
+* If exactly one of LHS and RHS is a Number, eq returns FALSE.
+* If both LHS and RHS are Numbers, they are promoted following the type casting rules described above and then checked for equality.
+* Example for edge cases:
+    ** -0.0 eq 0.0  = TRUE
+    ** +0.0 eq 0.0 = TRUE
+    **  -0.0 eq +0.0 = TRUE
+    ** NaN eq NaN  = FALSE
+    ** +INF eq +INF = TRUE
+    **  -INF eq -INF = TRUE
+    **  -INF eq +INF = FALSE
+* TinkerPop is JVM based, so there can be ±INF^^float and ±INF^^double, NaN^^float and NaN^^double. They also adhere to the type promotion rules.
+
+===== Boolean
+
+* If either one of LHS or RHS is Boolean and another isn't, return FALSE
+* True != False, True == True, False == False
+
+===== String
+
+* If either one of LHS or RHS is String and another isn't, return FALSE
+* We assume the common lexicographical order over Unicode strings.
+* LHS and RHS need to be lexicographically equal for LHS eq RHS == TRUE for String.
+
+===== Date
+
+* If either one of LHS or RHS is Date and another isn't, return FALSE
+* LHS eq RHS == TRUE when both LHS and RHS value are numerically identical in Unix Epoch time.
+
+===== NULL
+
+* If either one of LHS or RHS is null and another isn't, return FALSE
+* If both LHS and RHS are null, return TRUE 
+
+==== Composite types
+
+For all of them, if LHS and RHS are not of the same data type, equality returns FALSE. The following semantics apply when both LHS and RHS have the same data type.
+
+===== Vertex / Edge / VertexProperty
+
+These belong to the Element family in TinkerPop. If 2 elements have the same type and the same ID, they are considered equal.
+
+===== Property
+
+2 properties are equal if their keys and values are the same.
+
+===== PropertyKey
+
+key is String type so Equality for String type applies.
+
+===== PropertyValue
+
+Any type, so Equality for a corresponding type applies.
+
+===== ID
+
+Any type, so Equality for a corresponding type applies.
+
+===== Label
+
+label is String type so Equality for String type applies.
+
+===== Path
+
+2 Paths are equal when their path elements are equal (using equality of List), and the corresponding path labels are also equal. 
+
+===== List
+
+* If either one of LHS or RHS is List and another isn't, return FALSE
+* When both are List, then
+    ** if their sizes differ, return FALSE
+    ** L(x) denotes the x-th element in list L. 
+        *** For 2 lists L1 and L2 to be equal (L1 eq L2 returns TRUE), L1(x) eq L2(x) must return TRUE for all 0 <= x < n (n is the length of L1 and L2).
+        *** For 2 lists L1 and L2 to be not equal (L1 eq L2 returns FALSE), L1(x) eq L2(x) must return FALSE for at least one 0 <= x < n (n is the length of L1 and L2).
+
+===== Map
+
+* If either one of LHS or RHS is Map and another isn't, return FALSE
+* For 2 Maps M1 and M2 to be equal,
+    ** All keys in M1 should be within keys in M2
+    ** All keys in M2 should be within keys in M1
+    ** M1 and M2 should have the same number of  keys
+    ** For every key k in M1, M1[k] eq M2[k] should return TRUE
+    ** In Gremlin key order is not respected when determining equality
+
+=== Equivalence
+
+Equivalence is identical to Equality, except for the cases listed below.
+
+==== Primitive types
+===== Number
+
+* Unlike Equality, we *don't do* type casting for Equivalence.
+    ** If the type is different, they are not equivalent.
+        *** +INF^^double is not equivalent to +INF^^float
+        *** NaN^^double is not equivalent to NaN^^float
+    ** 123 and 123.0 are equal but not equivalent to each other
+* -0.0, 0.0, and +0.0 are not equivalent to each other
+    ** -0.0 is equivalent to -0.0
+    ** 0.0 is equivalent to 0.0
+    ** +0.0 is equivalent to +0.0
+* -INF and +INF are not equivalent to each other
+    ** -INF is equivalent to -INF
+    ** +INF is equivalent to +INF
+    ** They are equivalent to each other irrespective of their underlying type, so in Java, for example, Double.POSITIVE_INFINITY is equivalent to Float.POSITIVE_INFINITY.
+* NaN is not equivalent to any other numbers
+    ** NaN *is equivalent to* NaN irrespective of its underlying type, so in Java, for example, Double.NaN is equivalent to Float.NaN.
+
+===== NULL
+* NULL is not equivalent to any other values

Review comment:
       I would just leave NULL as a Java construct, and not import it into the type system at all.
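
For context, here is a minimal Java sketch of how the equality and equivalence rules quoted in the hunk above differ, with NULL handled by plain Java reference checks as suggested here. The names `EqualitySketch`, `isFixedWidth`, `eq` and `equivalent` are hypothetical and not part of TinkerPop. As simplifying assumptions, BigInteger and BigDecimal are not promoted, and equivalence of NaN or infinity across Float and Double is left aside.

```java
import java.util.Objects;

// Hypothetical illustration only; none of these names exist in TinkerPop.
final class EqualitySketch {

    // True for the six fixed-width wrapper types covered by this sketch.
    static boolean isFixedWidth(Object o) {
        return o instanceof Byte || o instanceof Short || o instanceof Integer
                || o instanceof Long || o instanceof Float || o instanceof Double;
    }

    // Proposal-style equality: numbers are promoted before comparing,
    // so NaN is never equal to NaN and -0.0 is equal to 0.0.
    static boolean eq(Object lhs, Object rhs) {
        if (lhs == null || rhs == null) {
            return lhs == rhs; // NULL eq NULL is TRUE, NULL eq anything else is FALSE
        }
        if (isFixedWidth(lhs) && isFixedWidth(rhs)) {
            Number a = (Number) lhs;
            Number b = (Number) rhs;
            boolean anyDouble = a instanceof Double || b instanceof Double;
            boolean anyFloat = a instanceof Float || b instanceof Float;
            boolean anyLong = a instanceof Long || b instanceof Long;
            if (anyDouble || (anyFloat && anyLong)) {
                return a.doubleValue() == b.doubleValue(); // compare as Double
            }
            if (anyFloat) {
                return a.floatValue() == b.floatValue(); // compare as Float
            }
            // Only integral types remain; widening to long keeps their values.
            return a.longValue() == b.longValue();
        }
        // Assumption: everything else (including BigInteger / BigDecimal)
        // falls back to Java equals without promotion.
        return lhs.equals(rhs);
    }

    // Proposal-style equivalence without type promotion. Double.equals and
    // Float.equals treat NaN as equal to itself and -0.0 as distinct from 0.0.
    static boolean equivalent(Object lhs, Object rhs) {
        return Objects.equals(lhs, rhs);
    }

    public static void main(String[] args) {
        System.out.println("123 eq 123.0: " + eq(123, 123.0));
        System.out.println("123 equivalent 123.0: " + equivalent(123, 123.0));
        System.out.println("NaN eq NaN: " + eq(Double.NaN, Double.NaN));
        System.out.println("NaN equivalent NaN: " + equivalent(Double.NaN, Double.NaN));
        System.out.println("null eq null: " + eq(null, null));
    }
}
```

With this shape, neither check can return NULL or throw, which matches the requirement that equality and equivalence are complete; whether NULL belongs in the type system at all remains the open point of this comment.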

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite type.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. The comparison is Java dependent, however, and we don’t have a clear definition of how comparison works for types of this kind.
+The same holds for Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+We even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if we defined a complete order across types and returned a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale (commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations. 
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but equivalent to each other
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is being used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps, by providing a stable sort order over mixed type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise Comparison does return TRUE or FALSE
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+TinkerPop currently has no formal semantics for the concepts introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK 11 seems to produce a different error from JDK 8 for BigDecimal comparisons that involve NaN and similar special values. JDK 8 seems to produce a NumberFormatException, whereas JDK 11 produces an error like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this query always throws today. With the proposed change it would not throw; because NaN is not equal to any number, it would return an empty result.
+
+* BigDecimal +
+Equality checks between BigDecimal and special values that cannot be parsed as Integer, such as NaN and INF, should not produce exceptions; they should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today groups by the hash value of the original values, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop uses comparability for orderability, so ordering fails for non-comparable and mixed-type values. The proposed change makes values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value should be filtered out; we would need to define a priority over the Number types. 
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN as well as ±Double.INF and ±Float.INF. Not applying type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept such as a key-value tuple?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to graph providers?
+* Which is more reasonable: NULL eq NULL is true, or false?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Because it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List

Review comment:
       Again, collections are really type constructors rather than types, in my point of view. E.g. `List<String>` is a type. If we support parametric polymorphism, then `List<A>` is a type. `List` is not.
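
To make the distinction concrete, here is a minimal Java sketch. It is plain Java rather than anything from TinkerPop; the class and method names are hypothetical and chosen only for illustration. It shows a method that is generic in the element type (`List<A>`), a method that accepts only the concrete type `List<String>`, and the fact that Java erases the type parameter at runtime.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in TinkerPop.
class TypeConstructorSketch {

    // Parametric polymorphism: works for List<A> for any element type A.
    public static <A> int size(List<A> xs) {
        return xs.size();
    }

    // Accepts only the concrete type List<String>.
    public static String joinAll(List<String> xs) {
        return String.join(",", xs);
    }

    // Type parameters are erased at runtime, so both lists share one class.
    public static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("marko", "vadas");
        List<Integer> ages = Arrays.asList(29, 27);

        System.out.println(size(names));
        System.out.println(size(ages));
        System.out.println(joinAll(names));
        // joinAll(ages);  // would not compile: List<Integer> is not List<String>
        System.out.println(sameRuntimeClass());
    }
}
```

Because of erasure, a runtime that only sees Java objects cannot tell `List<String>` from `List<Integer>`, which may be one reason the proposal lists the bare constructor.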







[GitHub] [tinkerpop] divijvaidya commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
divijvaidya commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r736403207



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. The comparison is Java dependent, however, and we don’t have a clear definition of how comparison works for types of this kind.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different concepts.  We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers (a frequent use case for NaN) cannot be identified with certainty as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common supertype.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but equivalent to each other
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is being used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise Comparison does return TRUE or FALSE
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of Semantics, right now TinkerPop does not have formal semantics to define these characteristics introduced in this proposal. Therefore this semantics should be published on the official TinkerPop doc.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this always throws. With the proposed change it would not throw; because NaN is not equal to any number, the query returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty  is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, there is a further question of which value to filter out. We would need to define a priority over the Number types.
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a generalized concept such as a key-value tuple ?
+* Today we have Date type but don’t we need timezone aware DateTime type as well ?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to Graph providers ?
+* Which should be more reasonable, NULL eq NULL is true or false ?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. It is JVM based, so we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer

Review comment:
       Generalising the type system away from Java is a worthy goal to pursue but I would suggest to scope the discussion in two parts: semantics for current implementation of TP as of 3.5.x and future looking semantics for TP.
   
   If we consider the current implementation itself which uses Java semantics, then what is proposed here is sufficient.
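
For reference, the Java semantics in question can be seen in a few lines of plain Java. This is only a sketch of standard JDK behaviour, not of TinkerPop's actual code paths; the class and helper names are hypothetical. It contrasts primitive `==` (which promotes numeric types and treats NaN as unequal to itself) with `equals` and hash-based collections (which do no type promotion and treat NaN as equal to itself).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of plain Java behaviour; not TinkerPop code.
class JavaSemanticsSketch {

    // Primitive comparison: arguments are widened to double (IEEE 754 rules).
    public static boolean primitiveEquals(double a, double b) {
        return a == b;
    }

    // Object comparison: delegates to the boxed type's equals(), no promotion.
    public static boolean boxedEquals(Object a, Object b) {
        return a.equals(b);
    }

    // Hash-based de-duplication, which relies on equals()/hashCode().
    public static int distinctCount(Object... values) {
        Set<Object> seen = new HashSet<>(Arrays.asList(values));
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(primitiveEquals(Double.NaN, Double.NaN)); // false
        System.out.println(boxedEquals(Double.NaN, Double.NaN));     // true
        System.out.println(primitiveEquals(1, 1L));                  // true
        System.out.println(boxedEquals(1, 1L));                      // false
        System.out.println(primitiveEquals(0.0, -0.0));              // true
        System.out.println(boxedEquals(0.0, -0.0));                  // false
        System.out.println(distinctCount(Double.NaN, Double.NaN));   // 1
        System.out.println(distinctCount(1, 1L, 1.0));               // 3
    }
}
```

The `equals`-based rows line up with what the proposal calls equivalence (no type promotion, NaN grouped with NaN), while the primitive rows are closer to the proposed equality.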

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. The comparison is Java dependent, however, and we don’t have a clear definition of how comparison works for types of this kind.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different concepts.  We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers (a frequent use case for NaN) cannot be identified with certainty as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common supertype.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is being used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise Comparison does return TRUE or FALSE
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of Semantics, right now TinkerPop does not have formal semantics to define these characteristics introduced in this proposal. Therefore this semantics should be published on the official TinkerPop doc.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this always throws. With the proposed change, it would not throw; because NaN is not equal to any number, the query returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty  is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, there is the further question of which value to filter out; we would need to define a priority over the types in Number.
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept such as a key-value tuple ?
+* Today we have Date type but don’t we need timezone aware DateTime type as well ?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to Graph providers ?
+* Which should be more reasonable, NULL eq NULL is true or false ?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. It is JVM based so as a primitive type, we are using the following types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID

Review comment:
       The concept of Id as a type comes into consideration when we start defining the semantics of the Gremlin language.
   E.g., consider the query `g.V().id()`. If we have Id as a first-class type, the output relation for the id() step can be defined as type `Id`, whereas if we consider it a field on type `Vertex`, it would be difficult to model a contract for this step. 
   
   > which can any of a number of types.
   
   This is still true since ID is a composite type; more specifically, it is a union of base data types such as string, int, etc.
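
To make the "union of base data types" idea concrete, here is a minimal sketch of how an `Id` type could be modeled. The class `Id`, its factory methods, and the choice of String / Integer / Long as member types are all hypothetical and are not part of the TinkerPop API or of this proposal; equality here is deliberately strict (no type promotion), which is only one of the options under discussion.

```java
import java.util.Objects;

// Hypothetical model, NOT TinkerPop API: Id as a first-class type that is a
// tagged union over a few base types (String, Integer, Long).
final class Id {
    private final Object value;

    private Id(Object value) {
        this.value = Objects.requireNonNull(value);
    }

    // Factory methods restrict which base types may appear inside an Id.
    static Id of(String s) { return new Id(s); }
    static Id of(int i)    { return new Id(i); }
    static Id of(long l)   { return new Id(l); }

    Object value() { return value; }

    // The base type currently held by this union.
    Class<?> baseType() { return value.getClass(); }

    // Strict equality: same base type and same value, no numeric promotion.
    @Override
    public boolean equals(Object o) {
        return o instanceof Id && value.equals(((Id) o).value);
    }

    @Override
    public int hashCode() { return value.hashCode(); }

    @Override
    public String toString() { return "Id(" + value + ")"; }
}
```

With such a type, the contract of a step like `id()` could be stated as "returns `Id`", while each `Id` still carries one of several base types.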

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite type.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by cap-Step. But the comparison is Java dependent and we don’t have a clear definition of how the comparison works for this kind of types.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users to define a complete order across types and return a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers (a frequent use case for NaN) cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is being used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise Comparison does return TRUE or FALSE
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of Semantics, right now TinkerPop does not have formal semantics to define these characteristics introduced in this proposal. Therefore this semantics should be published on the official TinkerPop doc.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this always throws. With the proposed change, it would not throw; because NaN is not equal to any number, the query returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty  is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, there is the further question of which value to filter out; we would need to define a priority over the types in Number.
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a general concept such as a key-value tuple?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave it to graph providers to decide how TP deals with BigDecimal?
+* Which is more reasonable: that NULL eq NULL is true, or false?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be treated as a distinct type?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Since it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer

Review comment:
       For future scope discussions, I agree that we need a richer set of numeric types. But even for that I would argue that we should start with a small set which can be implemented in the majority of languages, and later expand to specialised types which might not be available in every language. Graph.Features could be expanded to specify which types are supported by a graph vendor.

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different contexts (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties).
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+values of different numeric types would always be treated as different entities.
+
+==== Behavior that is inherently driven by Java
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. But the comparison is Java dependent, and we don’t have a clear definition of how the comparison works for this kind of type.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified with certainty as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output or undefined behavior, nor do they ever throw an error. Detailed behavior is described in the technical appendix.
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but the two are equivalent to each other
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is being used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of semantics, TinkerPop currently has no formal definition of the characteristics introduced in this proposal. Therefore these semantics should be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, it always throws. With the proposed change, it would not throw; because NaN is not equal to any number, the query returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer, such as NaN and INF, should not produce exceptions and should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL.
+
+==== Equivalence
+
+TinkerPop today groups values by the hash of the original value, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop uses comparability for ordering, so ordering non-comparable or mixed-type values fails. The proposed change is to make values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should equivalence take type promotion into account? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: that dedup() keeps both values
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value to filter out; we would need to define a priority over the Number types.
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not applying type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a general concept such as a key-value tuple?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave it to graph providers to decide how TP deals with BigDecimal?
+* Which is more reasonable: that NULL eq NULL is true, or false?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be treated as a distinct type?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Since it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We apply type casting, a.k.a. type promotion, to Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. The rules for promoting types are as follows:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to floating type of highest common bit denomination
+  ** If the other value is Long or Double, we need 64 bits, so convert both to Double
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported depending on the language and Storage, therefore the behavior of type casting for these 2 types can vary depending on a Graph provider. 
+
+=== Equality
+
+==== Primitive types
+===== Number
+
+Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
+
+* If one of LHS and RHS is a Number and the other isn't, eq returns FALSE.

Review comment:
       > If (t1, v1) is exactly the same pair of expressions as (t2, v2), then the typed values are equal. Otherwise, they are not (unless we support subtyping etc.).
   
   But this is not true for numerics where comparison is done using type promotion. As an example:
   ("key",1L) vs. ("key",1.0F) are equal but they have different typed values (long vs float).
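
To make the distinction concrete, here is a minimal Java sketch of the two notions. It is not TinkerPop code: the class and method names are made up for illustration, it only follows the promotion rules listed under "Type Casting" for Byte through Double, and it deliberately leaves out BigInteger / BigDecimal (which the proposal says may be provider dependent). Treating NULL as equal to nothing is also an assumption, since that is still an open question in the proposal.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical illustration only; none of these names exist in TinkerPop.
final class PromotionEqualitySketch {

    /** Equality: numbers are compared after type promotion. */
    static boolean eq(Object lhs, Object rhs) {
        if (lhs instanceof Number && rhs instanceof Number) {
            Number a = (Number) lhs;
            Number b = (Number) rhs;
            if (isBig(a) || isBig(b)) {
                // Out of scope for this sketch (provider dependent in the proposal).
                throw new UnsupportedOperationException("BigInteger/BigDecimal not covered");
            }
            if (a instanceof Double || b instanceof Double) {
                // Primitive == is false for NaN, so NaN is never equal to NaN.
                return a.doubleValue() == b.doubleValue();
            }
            if (a instanceof Float || b instanceof Float) {
                // Float with Long needs 64 bits, so promote both to double.
                if (a instanceof Long || b instanceof Long) {
                    return a.doubleValue() == b.doubleValue();
                }
                return a.floatValue() == b.floatValue();
            }
            // Only Byte/Short/Integer/Long remain; widening to long is exact.
            return a.longValue() == b.longValue();
        }
        // Non-numeric or mixed: different types are never equal.
        // Assumption: NULL is not equal to anything (open question in the proposal).
        return lhs != null && rhs != null
                && lhs.getClass() == rhs.getClass()
                && lhs.equals(rhs);
    }

    /** Typed-value comparison: same type and same value, no type promotion. */
    static boolean equivalent(Object lhs, Object rhs) {
        // Double.equals / Float.equals treat NaN as equal to itself.
        return lhs != null && rhs != null
                && lhs.getClass() == rhs.getClass()
                && lhs.equals(rhs);
    }

    private static boolean isBig(Number n) {
        return n instanceof BigInteger || n instanceof BigDecimal;
    }

    public static void main(String[] args) {
        System.out.println(eq(1L, 1.0f));                      // true
        System.out.println(equivalent(1L, 1.0f));              // false
        System.out.println(eq(Double.NaN, Double.NaN));        // false
        System.out.println(equivalent(Double.NaN, Double.NaN)); // true
        System.out.println(eq(1, "1"));                        // false
    }
}
```

With this sketch, `1L` and `1.0F` are equal under type promotion even though their typed values differ, which is the case the "same (type, value) pair" rule does not cover.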

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different contexts (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties).
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+values of different numeric types would always be treated as different entities.
+
+==== Behavior that is inherently driven by Java
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. But the comparison is Java dependent, and we don’t have a clear definition of how the comparison works for this kind of type.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified with certainty as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different types are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion does not apply, two numbers of different types are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+TinkerPop currently has no formal semantics defining the characteristics introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK 11 seems to produce a different error from JDK 8 for BigDecimal comparisons that hit NaN and similar special values. JDK 8 seems to produce a NumberFormatException, whereas JDK 11 produces output like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this query always throws today. With the proposed change it would not throw; because NaN is not equal to any number, it returns an empty result.
+
+* BigDecimal +
+Equality checks involving BigDecimal and special values that cannot be parsed as Integer, such as NaN and INF, should not produce exceptions and should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL.
+
+==== Equivalence
+
+TinkerPop today groups values by the hash of the original value, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop uses comparability for ordering, so ordering non-comparable or mixed-type values fails. The proposed change makes values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should equivalence take type promotion into account? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value to filter out; we would need to define a priority over the Number types.
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN as well as ±Double.INF and ±Float.INF. Not adhering to type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped together according to these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a generalized concept such as a key-value tuple?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave it to graph providers how TP deals with BigDecimal?
+* Which is more reasonable: NULL eq NULL is true, or false?
+  ** If it is true, it may be respected in JOIN operations
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be taken into account as a separate type?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Since it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting

Review comment:
       Type casting is a very valid use case for users. Since we don't have an explicit schema in PG used by TinkerPop, users do not know what is the type of data stored. The expectation that user will always send queries with strict typing is not true today. Even if we had explicit schema I would still argue for a case where customers are allowed to query with type castable values. 
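
       To make the type casting point concrete, here is a minimal Java sketch of promotion-based equality next to type-strict equivalence, following the rules the proposal states (numbers compare equal after promotion, NaN is never equal but is equivalent to itself, values of different non-numeric types are never equal). The class and method names are hypothetical and are not TinkerPop APIs; using BigDecimal as the common super type is an assumption made for illustration, and null handling is left out because it is still an open question in the proposal.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical illustration only; not part of the TinkerPop code base.
// Both methods assume non-null arguments.
class PromotionSketch {

    static boolean isNaN(Number n) {
        return (n instanceof Double && ((Double) n).isNaN())
                || (n instanceof Float && ((Float) n).isNaN());
    }

    static boolean isInfinite(Number n) {
        return (n instanceof Double && ((Double) n).isInfinite())
                || (n instanceof Float && ((Float) n).isInfinite());
    }

    // Promote a finite number to BigDecimal, used here as a stand-in
    // for the "lowest common super type" (an assumption of this sketch).
    static BigDecimal promote(Number n) {
        if (n instanceof BigDecimal) return (BigDecimal) n;
        if (n instanceof BigInteger) return new BigDecimal((BigInteger) n);
        if (n instanceof Double || n instanceof Float) return new BigDecimal(n.doubleValue());
        return BigDecimal.valueOf(n.longValue()); // Byte, Short, Integer, Long
    }

    // Equality: numbers are compared after promotion; NaN equals nothing.
    static boolean eq(Object a, Object b) {
        if (a instanceof Number && b instanceof Number) {
            Number x = (Number) a;
            Number y = (Number) b;
            if (isNaN(x) || isNaN(y)) return false;
            if (isInfinite(x) || isInfinite(y)) {
                // Infinities only equal an infinity of the same sign.
                return isInfinite(x) && isInfinite(y) && x.doubleValue() == y.doubleValue();
            }
            return promote(x).compareTo(promote(y)) == 0;
        }
        // Values of different non-numeric types are never equal.
        return a.getClass() == b.getClass() && a.equals(b);
    }

    // Equivalence: no type promotion; Double.equals treats NaN as equal to NaN.
    static boolean equivalent(Object a, Object b) {
        return a.getClass() == b.getClass() && a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(eq(1, 1.0));                         // true
        System.out.println(eq(Double.NaN, Double.NaN));         // false
        System.out.println(equivalent(Double.NaN, Double.NaN)); // true
        System.out.println(equivalent(1, 1.0));                 // false
    }
}
```

       With this shape, a user who sends `has("key", 1)` still matches a stored `1.0` or `1L` without knowing the stored type, while dedup and group keep the stricter per-type behavior.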

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. However, the comparison is Java dependent, and we don't have a clear definition of how comparison works for this kind of type.
+The same holds for Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+We even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users to define a complete order across types and return a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which then tries to promote Infinity to BigDecimal as well,
+// and the type cast fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, distinguishing equality from equivalence and comparability from orderability; each pair constitutes two flavors of one concept, tailored to its usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and a discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale (commonly found in query and programming languages) is that two "unknown" numbers, a frequent use case for NaN, cannot be identified as equal with certainty, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different types are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion does not apply, two numbers of different types are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+TinkerPop currently has no formal semantics defining the characteristics introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK 11 seems to produce a different error from JDK 8 for BigDecimal comparisons that hit NaN and similar special values. JDK 8 seems to produce a NumberFormatException, whereas JDK 11 produces output like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this query always throws today. With the proposed change it would not throw; because NaN is not equal to any number, it returns an empty result.
+
+* BigDecimal +
+Equality checks involving BigDecimal and special values that cannot be parsed as Integer, such as NaN and INF, should not produce exceptions and should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL.
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop uses comparability for orderability, so ordering fails on non-comparable and mixed-type values. The proposed change is to make values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value to filter out; we would need to define a priority over the Number types.
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a generalized concept such as a key-value tuple?
+* Today we have a Date type, but don’t we need a timezone-aware DateTime type as well?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to graph providers?
+* Which is more reasonable: NULL eq NULL is true, or false?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Because it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We perform type casting (a.k.a. type promotion) for Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. Types are promoted according to the following rules:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to the floating-point type of the highest common bit width
+  ** If the other value is Long or Double, we need 64 bits, so convert both to Double
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported, depending on the language and storage; the behavior of type casting for these 2 types can therefore vary between Graph providers.
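The promotion rules above can be read as a decision chain. The following is a minimal sketch of that chain in Java, assuming both inputs are one of the eight Number types listed; `PromotionSketch` and `commonType` are hypothetical names used for illustration and are not part of the TinkerPop API.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper, not TinkerPop code: picks the type two Numbers are compared as.
// Assumes both arguments are one of the eight Number types listed in the proposal.
final class PromotionSketch {

    static Class<? extends Number> commonType(Number a, Number b) {
        if (a instanceof BigDecimal || b instanceof BigDecimal) return BigDecimal.class;
        if (a instanceof BigInteger || b instanceof BigInteger) return BigInteger.class;
        if (a instanceof Double || b instanceof Double) return Double.class;
        if (a instanceof Float || b instanceof Float) {
            // A Float paired with a Long needs 64 bits, so both widen to Double.
            return (a instanceof Long || b instanceof Long) ? Double.class : Float.class;
        }
        if (a instanceof Long || b instanceof Long) return Long.class;
        if (a instanceof Integer || b instanceof Integer) return Integer.class;
        if (a instanceof Short || b instanceof Short) return Short.class;
        return Byte.class;
    }

    public static void main(String[] args) {
        System.out.println(commonType(1, 2L).getSimpleName());               // Long
        System.out.println(commonType(1.5f, 2L).getSimpleName());            // Double
        System.out.println(commonType((byte) 1, (short) 2).getSimpleName()); // Short
    }
}
```

The sketch only selects the target type; converting the values and handling providers without BigDecimal / BigInteger support are left out on purpose.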
+
+=== Equality
+
+==== Primitive types
+===== Number
+
+Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
+
+* If either one of LHS or RHS is Number and another isn't, eq returns FALSE.
+* If both LHS and RHS are Number, eq follows the type casting described above and then checks equality.
+* Example for edge cases:
+    ** -0.0 eq 0.0  = TRUE
+    ** +0.0 eq 0.0 = TRUE
+    **  -0.0 eq +0.0 = TRUE
+    ** NaN eq NaN  = FALSE
+    ** +INF eq +INF = TRUE
+    **  -INF eq -INF = TRUE
+    **  -INF eq +INF = FALSE
+* TinkerPop is JVM based so there can be ±INF^^float and ±INF^^double, NaN^^float and NaN^^double. They also adhere to the type promotion.
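The edge cases listed above coincide with IEEE 754 comparison of primitive doubles. As a minimal sketch, assuming both operands are Float or Double and are promoted to double before comparing, the expected results can be checked as follows; `NumberEqualitySketch` and `eq` are hypothetical names, not TinkerPop code.

```java
// Hypothetical sketch, not TinkerPop code: eq for two floating-point Numbers.
// Assumes both operands are Float or Double; other Number types would first
// go through the promotion rules of the proposal.
final class NumberEqualitySketch {

    static boolean eq(Number lhs, Number rhs) {
        // Primitive == follows IEEE 754: signed zeros are equal, NaN equals nothing.
        return lhs.doubleValue() == rhs.doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(eq(-0.0, 0.0));                                          // true
        System.out.println(eq(Double.NaN, Double.NaN));                             // false
        System.out.println(eq(Float.POSITIVE_INFINITY, Double.POSITIVE_INFINITY));  // true
        System.out.println(eq(Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY)); // false
    }
}
```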
+
+===== Boolean
+
+* If either one of LHS or RHS is Boolean and another isn't, return FALSE
+* True != False, True == True, False == False
+
+===== String
+
+* If either one of LHS or RHS is String and another isn't, return FALSE
+* We assume the common lexicographical order over Unicode strings.
+* LHS and RHS need to be lexicographically equal for LHS eq RHS == TRUE for String.
+
+===== Date
+
+* If either one of LHS or RHS is Date and another isn't, return FALSE
+* LHS eq RHS == TRUE when both LHS and RHS value are numerically identical in Unix Epoch time.
+
+===== NULL
+
+* If either one of LHS or RHS is null and another isn't, return FALSE
+* If both LHS and RHS are null, return TRUE 
+
+==== Composite types
+
+For all of them, if LHS and RHS are not of the same data type, equality returns FALSE. The following semantics apply when both LHS and RHS have the same data type.
+
+===== Vertex / Edge / VertexProperty
+
+They are considered the Element family in TinkerPop; if 2 elements have the same type and the same ID, they are considered equal.
+
+===== Property
+
+Two properties are equal if their keys and values are the same.
+
+===== PropertyKey
+
+key is String type so Equality for String type applies.
+
+===== PropertyValue
+
+Any type, so Equality for a corresponding type applies.
+
+===== ID
+
+Any type, so Equality for a corresponding type applies.
+
+===== Label
+
+label is String type so Equality for String type applies.
+
+===== Path
+
+2 Paths are equal when their path elements are equal (using equality of List), and the corresponding path labels are also equal. 
+
+===== List
+
+* If either one of LHS or RHS is List and another isn't, return FALSE
+* When both are List, then
+    ** if their sizes differ, return FALSE
+    ** L(n) denotes n-th element in list L. 
+        *** For 2 lists L1 and L2 to be equal (L1 eq L2 returns TRUE), L1(x) eq L2(x) must return TRUE for all 0 <= x < n (n is the length of L1 and L2).
+        *** For 2 lists L1 and L2 of the same length to be not equal (L1 eq L2 returns FALSE), L1(x) eq L2(x) must return FALSE for at least one 0 <= x < n.
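The list rule above amounts to a size check followed by a position-by-position comparison. A minimal sketch in Java, assuming the element-level eq is supplied by the caller (here as a `BiPredicate`); `ListEqualitySketch` and `listEq` are hypothetical names introduced for illustration.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch, not TinkerPop code: list equality as described in the proposal.
final class ListEqualitySketch {

    // elementEq stands in for the proposal's eq on the element types.
    static boolean listEq(List<?> l1, List<?> l2, BiPredicate<Object, Object> elementEq) {
        if (l1.size() != l2.size()) return false;
        for (int x = 0; x < l1.size(); x++) {
            // A single unequal position is enough to make the lists unequal.
            if (!elementEq.test(l1.get(x), l2.get(x))) return false;
        }
        return true;
    }
}
```

For a quick check, `java.util.Objects::equals` can be passed as a stand-in for the element-level eq, although it does not apply the numeric type promotion the proposal requires.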
+
+===== Map
+
+* If either one of LHS or RHS is Map and another isn't, return FALSE
+* For 2 Maps M1 and M2 to be equal,
+    ** All keys in M1 should be within keys in M2
+    ** All keys in M2 should be within keys in M1
+    ** M1 and M2 should have the same number of keys
+    ** For every key k in M1, M1[k] eq M2[k] should return TRUE
+    ** In Gremlin key order is not respected when determining equality
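The map rule above can be sketched the same way: equal key sets, then value-by-value comparison, with insertion order ignored. This is only an illustration; `MapEqualitySketch` and `mapEq` are hypothetical names, and key lookup here relies on Java's own `hashCode` / `equals`, which is an assumption and not the proposal's eq.

```java
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical sketch, not TinkerPop code: map equality as described in the proposal.
final class MapEqualitySketch {

    // valueEq stands in for the proposal's eq on the value types.
    static boolean mapEq(Map<?, ?> m1, Map<?, ?> m2, BiPredicate<Object, Object> valueEq) {
        // Same size plus "every key of m1 is in m2" implies identical key sets.
        if (m1.size() != m2.size()) return false;
        for (Map.Entry<?, ?> e : m1.entrySet()) {
            if (!m2.containsKey(e.getKey())) return false;
            if (!valueEq.test(e.getValue(), m2.get(e.getKey()))) return false;
        }
        return true;
    }
}
```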
+
+=== Equivalence
+
+Equivalence is identical to Equality, except for the cases listed below.
+
+==== Primitive types
+===== Number
+
+* Unlike Equality, we *don't do* type casting for Equivalence.
+    ** If the type is different, they are not equivalent.
+        *** +INF^^double is not equivalent to +INF^^float
+        *** NaN^^double is not equivalent to NaN^^float
+    ** 123 and 123.0 are equal but not equivalent to each other
+* -0.0, 0.0, and +0.0 are not equivalent to each other
+    ** -0.0 is equivalent to -0.0
+    ** 0.0 is equivalent to 0.0
+    ** +0.0 is equivalent to +0.0
+* -INF and +INF are not equivalent to each other
+    ** -INF is equivalent to -INF
+    ** +INF is equivalent to +INF
+    ** They are equivalent to each other irrespective of their underlying type, so in Java, for example, Double.POSITIVE_INFINITY is equivalent to Float.POSITIVE_INFINITY.
+* NaN is not equivalent to any other numbers
+    ** NaN *is equivalent to* NaN irrespective of its underlying type, so in Java, for example, Double.NaN is equivalent to Float.NaN.
+
+===== NULL
+* NULL is not equivalent to any other values

Review comment:
       We do need a primitive type to define null, since it is a valid use case for users to have null-valued properties. We can rename it to undef or something similar that is language agnostic, but I don't fully understand why you propose to leave it out of the type system.

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different contexts (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties).
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and as of today there are no clearly defined and published semantics for the different types of equality and comparison operations. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. The comparison is Java dependent, however, and we don’t have a clear definition of how comparison works for this kind of type.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users to define a complete order across types and return a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different concepts.  We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, a frequent use case for NaN, cannot be identified with certainty as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but equivalent to each other
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
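The difference between equality and equivalence can be observed in plain Java, where hash-based collections behave much like the equivalence described above. The following minimal sketch uses a `LinkedHashSet` as a stand-in for deduplication; `EquivalenceSketch` and `dedup` are hypothetical names, and this is not how the dedup step is implemented in TinkerPop.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch, not TinkerPop code: hash-based dedup as a stand-in for equivalence.
final class EquivalenceSketch {

    static Set<Object> dedup(Object... values) {
        // Relies on hashCode()/equals() of the boxed values, so no type promotion happens.
        return new LinkedHashSet<>(Arrays.asList(values));
    }

    public static void main(String[] args) {
        System.out.println(Double.NaN == Double.NaN);             // false: NaN is not equal to NaN
        System.out.println(dedup(Double.NaN, Double.NaN).size()); // 1: but the two NaNs collapse
        System.out.println(dedup(1, 1.0).size());                 // 2: Integer and Double stay apart
    }
}
```

With boxed Java values, `Double.NaN` and `Float.NaN` also remain distinct under this stand-in, which is the case raised in the open question on type promotion for equivalence.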
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise Comparison does return TRUE or FALSE
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of Semantics, right now TinkerPop does not have formal semantics to define these characteristics introduced in this proposal. Therefore this semantics should be published on the official TinkerPop doc.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float Number is stored, this always throws. With the proposed change it would not throw; because NaN is not equal to any number, the query returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, it throws today but should return NULL.
+  ** When NULL is compared, it throws an exception today but should return NULL.
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop uses comparability for orderability, so ordering fails on non-comparable and mixed-type values. The proposed change is to make values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value to filter out; we would need to define a priority over the Number types.
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a generalized concept such as a key-value tuple?
+* Today we have a Date type, but don’t we need a timezone-aware DateTime type as well?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to graph providers?
+* Which is more reasonable: NULL eq NULL is true, or false?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Because it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We perform type casting (a.k.a. type promotion) for Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. Types are promoted according to the following rules:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to the floating-point type of the highest common bit width
+  ** If the other value is Long or Double, we need 64 bits, so convert both to Double
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported depending on the language and storage; therefore, the type casting behavior for these two types can vary between Graph providers.
+
+=== Equality
+
+==== Primitive types
+===== Number
+
+Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
+
+* If one of LHS and RHS is a Number and the other isn't, eq returns FALSE.
+* If both LHS and RHS are Numbers, apply the type casting described above and then check equality.
+* Example for edge cases:
+    ** -0.0 eq 0.0  = TRUE
+    ** +0.0 eq 0.0 = TRUE
+    **  -0.0 eq +0.0 = TRUE
+    ** NaN eq NaN  = FALSE
+    ** +INF eq +INF = TRUE
+    **  -INF eq -INF = TRUE
+    **  -INF eq +INF = FALSE
+* TinkerPop is JVM based, so there can be ±INF^^float and ±INF^^double, NaN^^float and NaN^^double. They also adhere to type promotion.
+
+===== Boolean
+
+* If one of LHS and RHS is a Boolean and the other isn't, return FALSE
+* True != False, True == True, False == False
+
+===== String
+
+* If one of LHS and RHS is a String and the other isn't, return FALSE
+* We assume the common lexicographical order over unicode strings.
+* LHS and RHS need to be lexicographically equal for LHS eq RHS == TRUE.
+
+===== Date
+
+* If one of LHS and RHS is a Date and the other isn't, return FALSE
+* LHS eq RHS == TRUE when the LHS and RHS values are numerically identical in Unix Epoch time.
+
+===== NULL
+
+* If one of LHS and RHS is null and the other isn't, return FALSE
+* If both LHS and RHS are null, return TRUE 
+
+==== Composite types
+
+For all of them, if LHS and RHS are not of the same data type, equality returns FALSE. The following semantics apply when both LHS and RHS have the same data type.
+
+===== Vertex / Edge / VertexProperty
+
+They are considered the Element family in TinkerPop; if two elements have the same type and the same ID, they are considered equal.

Review comment:
       This is indeed interesting. The same argument applies to labels. I agree with your definition, but TinkerPop's current implementation only checks IDs for equality. Again, coming back to the question of semantics for what exists today vs. forward-looking semantics, we can keep the current semantics for 3.5.x and adopt a more precise definition, as you suggested, in future versions.
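
       To make the distinction concrete, here is a minimal Java sketch. It is an illustration only: `ElementSketch`, `Kind`, `sameKindAndId` and `sameKindIdAndLabel` are hypothetical names, not TinkerPop classes or methods, and the label-aware variant is just one possible reading of "a more precise definition".

```java
import java.util.Objects;

// Hypothetical stand-in for a graph element; NOT the TinkerPop Element interface.
final class ElementSketch {
    enum Kind { VERTEX, EDGE, VERTEX_PROPERTY }

    final Kind kind;
    final Object id;
    final String label;

    ElementSketch(Kind kind, Object id, String label) {
        this.kind = kind;
        this.id = id;
        this.label = label;
    }

    // Equality as worded in the quoted proposal text: same element kind and same ID.
    // The label is deliberately not consulted.
    static boolean sameKindAndId(ElementSketch a, ElementSketch b) {
        return a.kind == b.kind && Objects.equals(a.id, b.id);
    }

    // A stricter, purely illustrative variant that also compares labels.
    static boolean sameKindIdAndLabel(ElementSketch a, ElementSketch b) {
        return sameKindAndId(a, b) && Objects.equals(a.label, b.label);
    }

    public static void main(String[] args) {
        ElementSketch v1 = new ElementSketch(Kind.VERTEX, 1, "person");
        ElementSketch v1b = new ElementSketch(Kind.VERTEX, 1, "software");
        ElementSketch e1 = new ElementSketch(Kind.EDGE, 1, "person");
        System.out.println(sameKindAndId(v1, v1b));      // true: labels are ignored
        System.out.println(sameKindIdAndLabel(v1, v1b)); // false: labels differ
        System.out.println(sameKindAndId(v1, e1));       // false: different element kinds
    }
}
```

       Under the first rule two vertices with the same ID are equal regardless of label; whether the stricter rule is desirable is exactly the open point in this thread.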

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite type.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. But the comparison is Java dependent and we don’t have a clear definition of how the comparison works for these types.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users to define a complete order across types and return a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, a frequent use case for NaN, cannot with certainty be identified as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but equivalent to each other
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is being used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise Comparison does return TRUE or FALSE
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of Semantics, right now TinkerPop does not have formal semantics to define these characteristics introduced in this proposal. Therefore this semantics should be published on the official TinkerPop doc.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float Number is stored, this always throws today. With the proposed change, it would not throw; because NaN is not equal to any number, it returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty  is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If de-duping, there is another question which value we should filter out. We need to define priority over types in Number. 
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept, such as a key-value tuple?
+* Today we have a Date type, but don’t we need a timezone-aware DateTime type as well?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to Graph providers?
+* Which is more reasonable: NULL eq NULL is true, or false?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Because it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We apply type casting (a.k.a. type promotion) to Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. The promotion rules are as follows:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to floating type of highest common bit denomination
+  ** If the other value is Long or Double, 64 bits are needed, so convert both to Double
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported depending on the language and storage; therefore, the type casting behavior for these two types can vary between Graph providers.

Review comment:
       +1 This could be another knob in Graph.Features
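
       For reference, the promotion order in the quoted hunk can be sketched in Java roughly as follows. This is a sketch under stated assumptions, not TinkerPop's implementation: `PromotionSketch` and `promotedType` are hypothetical names, the helper only selects the target type (it performs no conversion), and it leaves aside whether a given provider supports BigDecimal or BigInteger at all.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class PromotionSketch {
    // Returns the type both operands would be compared as, checking the rules
    // in the same order as the quoted list. Hypothetical helper, not a TinkerPop API.
    static Class<? extends Number> promotedType(Number a, Number b) {
        if (a instanceof BigDecimal || b instanceof BigDecimal) return BigDecimal.class;
        if (a instanceof BigInteger || b instanceof BigInteger) return BigInteger.class;
        if (a instanceof Double || b instanceof Double) return Double.class;
        if (a instanceof Float || b instanceof Float) {
            // A Long needs 64 bits, so widen to Double; otherwise Float suffices.
            // (Float with Double was already handled by the Double rule above.)
            return (a instanceof Long || b instanceof Long) ? Double.class : Float.class;
        }
        if (a instanceof Long || b instanceof Long) return Long.class;
        if (a instanceof Integer || b instanceof Integer) return Integer.class;
        if (a instanceof Short || b instanceof Short) return Short.class;
        if (a instanceof Byte && b instanceof Byte) return Byte.class;
        throw new IllegalArgumentException("unsupported Number type");
    }

    public static void main(String[] args) {
        System.out.println(promotedType(1, 2L).getSimpleName());                      // Long
        System.out.println(promotedType(1, 2.0f).getSimpleName());                    // Float
        System.out.println(promotedType(1L, 2.0f).getSimpleName());                   // Double
        System.out.println(promotedType(new BigDecimal("1.5"), 2.0d).getSimpleName()); // BigDecimal
        System.out.println(promotedType((byte) 1, (short) 2).getSimpleName());        // Short
    }
}
```

       A provider-level switch of the kind suggested here would presumably decide whether the first two branches apply; how such a switch would be exposed is not specified in this thread.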

##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite type.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. But the comparison is Java dependent and we don’t have a clear definition of how the comparison works for these types.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users to define a complete order across types and return a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common supertype.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
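The equality rules above can be sketched in plain Java. This is a minimal illustration, not TinkerPop code: `EqualitySketch` and its methods are hypothetical names, the treatment of `null` follows the appendix of the proposal (which also lists it as an open question), and numbers are compared by exact value through `BigDecimal` as a simplification of the promotion ladder.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper illustrating the proposed equality; not a TinkerPop API.
final class EqualitySketch {

    static boolean eq(Object lhs, Object rhs) {
        if (lhs == null || rhs == null) {
            return lhs == rhs; // assumption: NULL eq NULL is TRUE, as in the appendix
        }
        if (lhs instanceof Number && rhs instanceof Number) {
            return numberEq((Number) lhs, (Number) rhs);
        }
        if (lhs instanceof Number || rhs instanceof Number) {
            return false; // a Number never equals a non-Number
        }
        return lhs.equals(rhs); // other types: delegate to Java equals
    }

    private static boolean numberEq(Number a, Number b) {
        if (isNonFinite(a) || isNonFinite(b)) {
            // NaN == NaN is false; +INF == +INF is true, across Float and Double.
            return isFloating(a) && isFloating(b) && a.doubleValue() == b.doubleValue();
        }
        // Simplification: compare exact values instead of walking the promotion ladder.
        return toBigDecimal(a).compareTo(toBigDecimal(b)) == 0;
    }

    private static boolean isFloating(Number n) {
        return n instanceof Double || n instanceof Float;
    }

    private static boolean isNonFinite(Number n) {
        return isFloating(n)
                && (Double.isNaN(n.doubleValue()) || Double.isInfinite(n.doubleValue()));
    }

    private static BigDecimal toBigDecimal(Number n) {
        if (n instanceof BigDecimal) return (BigDecimal) n;
        if (n instanceof BigInteger) return new BigDecimal((BigInteger) n);
        if (isFloating(n)) return new BigDecimal(n.doubleValue());
        return BigDecimal.valueOf(n.longValue());
    }
}
```

The method never throws and never returns a third state, which is the completeness property the bullets ask for. A provider following the promotion ladder literally could differ from this sketch in precision corner cases.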
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not applied, two numbers of different types are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
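A comparable sketch for equivalence, again with hypothetical names. It assumes that "different type" can be approximated by comparing Java runtime classes, and that `-0.0` and `+0.0` stay equivalent because the proposal says equivalence matches equality outside the NaN and type-promotion cases.

```java
// Hypothetical helper illustrating the proposed equivalence; not a TinkerPop API.
final class EquivalenceSketch {

    static boolean equivalent(Object lhs, Object rhs) {
        if (lhs == null || rhs == null) {
            return lhs == rhs;
        }
        if (lhs.getClass() != rhs.getClass()) {
            return false; // no type promotion: 2 (int) and 2.0 (float) differ
        }
        if (lhs instanceof Double) {
            double a = (Double) lhs, b = (Double) rhs;
            // Primitive == keeps -0.0 and +0.0 together; NaN is grouped with NaN.
            return a == b || (Double.isNaN(a) && Double.isNaN(b));
        }
        if (lhs instanceof Float) {
            float a = (Float) lhs, b = (Float) rhs;
            return a == b || (Float.isNaN(a) && Float.isNaN(b));
        }
        return lhs.equals(rhs);
    }
}
```

Note that `Double.NaN` and `Float.NaN` are not equivalent here, which is exactly the consequence the open questions of the proposal point out.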
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
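The three-valued result described above might be modeled in Java with a nullable `Boolean`. The class and method names are hypothetical, only numbers and strings are covered, and promoting every number to `double` is a simplification that loses precision for very large integral values.

```java
// Hypothetical helper illustrating the proposed comparability; not a TinkerPop API.
final class ComparabilitySketch {

    /** TRUE or FALSE for comparable operands, null (standing in for NULL) otherwise. */
    static Boolean gt(Object lhs, Object rhs) {
        if (lhs instanceof Number && rhs instanceof Number) {
            // Simplification: promote both sides to double; any comparison with NaN is false.
            return ((Number) lhs).doubleValue() > ((Number) rhs).doubleValue();
        }
        if (lhs instanceof String && rhs instanceof String) {
            return ((String) lhs).compareTo((String) rhs) > 0;
        }
        return null; // incomparable types, including a null operand
    }
}
```

How a caller treats the `null` result (filter the traverser, raise an error, and so on) is left to the graph provider, as the bullets state.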
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, we have a dedicated order as described in the section below.
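One way to picture orderability is a total `Comparator` that first ranks values by type and only then compares within a type. The sketch below is an assumption-laden illustration: the type priority in `rank` is hypothetical (the proposal defers the real one to its appendix) and was chosen only so that numbers and elements sort before strings, as in the proposal's own examples; NaN is placed after all other numbers because that is what `Double.compare` does.

```java
import java.util.Comparator;

// Hypothetical helper illustrating the proposed orderability; not a TinkerPop API.
final class OrderabilitySketch {

    // Hypothetical type priority; the proposal defines the real one in its appendix.
    private static int rank(Object o) {
        if (o == null) return 0;
        if (o instanceof Boolean) return 1;
        if (o instanceof Number) return 2;
        if (o instanceof String) return 4;
        return 3; // everything else, e.g. graph elements
    }

    static final Comparator<Object> ORDER = (a, b) -> {
        int ra = rank(a), rb = rank(b);
        if (ra != rb) return Integer.compare(ra, rb); // cross-type order
        switch (ra) {
            case 0: return 0;
            case 1: return Boolean.compare((Boolean) a, (Boolean) b);
            // Double.compare is total: NaN sorts last and -0.0 sorts before 0.0.
            case 2: return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
            case 4: return ((String) a).compareTo((String) b);
            default:
                int byType = a.getClass().getName().compareTo(b.getClass().getName());
                return byType != 0 ? byType : a.toString().compareTo(b.toString());
        }
    };
}
```

Because the comparator never returns a "don't know" answer and never throws for mixed input, `list.sort(OrderabilitySketch.ORDER)` succeeds where sorting the same mixed-type list by natural ordering would fail with a `ClassCastException`.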
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change?
+
+=== Semantics
+
+Right now TinkerPop does not have formal semantics that define the characteristics introduced in this proposal. Therefore these semantics should be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this query always throws today. With the proposed change it would not throw; because NaN is not equal to any number, it would return an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values that cannot be parsed as Integer, such as NaN and INF, should not produce exceptions and should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL.
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: not de-duplicating them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, there is a further question of which value we should filter out. We would need to define a priority over the Number types.
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped together according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a generalized concept such as a key-value tuple?
+* Today we have a Date type, but don’t we need a timezone-aware DateTime type as well?
+* Some graph providers may not support BigDecimal. Do we leave how TinkerPop deals with BigDecimal to graph providers?
+* Which is more reasonable: NULL eq NULL being true or false?
+  ** If it is true, it may be respected in JOIN operations
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined.
+* Should UUID be a different type to be taken into account?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. It is JVM based, so we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We do type casting, a.k.a. type promotion, for Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. The rules for how types are promoted are:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to floating type of highest common bit denomination
+  ** If another value is Long or Double, we need 64bit so convert both to Double 
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported depending on the language and Storage, therefore the behavior of type casting for these 2 types can vary depending on a Graph provider. 
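The promotion rules listed above can be read as a decision ladder that picks the type in which two numbers are compared. The following sketch applies the rules in the order they are listed; `PromotionSketch` and `commonType` are hypothetical names, and the final fallback to `Byte` for unknown `Number` subclasses is an assumption of the sketch.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper illustrating the promotion ladder; not a TinkerPop API.
final class PromotionSketch {

    /** Returns the type in which two numbers would be compared under the listed rules. */
    static Class<? extends Number> commonType(Number a, Number b) {
        if (either(a, b, BigDecimal.class)) return BigDecimal.class;
        if (either(a, b, BigInteger.class)) return BigInteger.class;
        if (either(a, b, Double.class)) return Double.class;
        if (either(a, b, Float.class)) {
            // A Long needs 64 bits, so Float + Long is widened to Double.
            return either(a, b, Long.class) ? Double.class : Float.class;
        }
        if (either(a, b, Long.class)) return Long.class;
        if (either(a, b, Integer.class)) return Integer.class;
        if (either(a, b, Short.class)) return Short.class;
        return Byte.class; // assumption: fallback for anything not matched above
    }

    private static boolean either(Number a, Number b, Class<? extends Number> type) {
        return type.isInstance(a) || type.isInstance(b);
    }
}
```

For example, `commonType(1.0f, 2L)` yields `Double.class`, while `commonType(1.0f, 2)` stays at `Float.class`.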
+
+=== Equality
+
+==== Primitive types
+===== Number
+
+Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
+
+* If one of LHS and RHS is a Number and the other isn't, eq returns FALSE.
+* If both LHS and RHS are Numbers, eq follows the type casting described above and then checks equality.
+* Example for edge cases:
+    ** -0.0 eq 0.0  = TRUE
+    ** +0.0 eq 0.0 = TRUE
+    **  -0.0 eq +0.0 = TRUE
+    ** NaN eq NaN  = FALSE
+    ** +INF eq +INF = TRUE
+    **  -INF eq -INF = TRUE
+    **  -INF eq +INF = FALSE
+* TinkerPop is JVM based so there can be ±INF^^float and ±INF^^double, NaN^^float and NaN^^double. They also adhere to type promotion.
+
+===== Boolean
+
+* If either one of LHS or RHS is Boolean and another isn't, return FALSE
+* True != False, True == True, False == False
+
+===== String
+
+* If either one of LHS or RHS is String and another isn't, return FALSE
+* We assume the common lexicographical order over Unicode strings.
+* For Strings, LHS eq RHS == TRUE requires LHS and RHS to be lexicographically equal.
+
+===== Date
+
+* If either one of LHS or RHS is Date and another isn't, return FALSE
+* LHS eq RHS == TRUE when both LHS and RHS value are numerically identical in Unix Epoch time.
+
+===== NULL
+
+* If either one of LHS or RHS is null and another isn't, return FALSE
+* If both LHS and RHS are null, return TRUE 
+
+==== Composite types
+
+For all of them, if LHS and RHS are not of the same data type, equality returns FALSE. The following semantics apply when both LHS and RHS have the same data type.
+
+===== Vertex / Edge / VertexProperty
+
+They are considered the Element family in TinkerPop; if 2 elements have the same type and the same ID, they are considered equal.
+
+===== Property
+
+If key and value are same, 2 properties are equal.
+
+===== PropertyKey
+
+key is String type so Equality for String type applies.
+
+===== PropertyValue
+
+Any type, so Equality for a corresponding type applies.
+
+===== ID
+
+Any type, so Equality for a corresponding type applies.
+
+===== Label
+
+label is String type so Equality for String type applies.
+
+===== Path
+
+2 Paths are equal when their path elements are equal (using equality of List), and the corresponding path labels are also equal. 
+
+===== List
+
+* If either one of LHS or RHS is List and another isn't, return FALSE
+* When both are List, then
+    ** if the size of them are different, return FALSE
+    ** L(n) denotes n-th element in list L. 
+        *** For 2 lists L1 and L2 to be equal (L1 eq L2 returns TRUE), L1(x) eq L2(x) must return TRUE for all 0 <= x < n (n is the length of L1 and L2).
+        *** For 2 lists L1 and L2 to be not equal (L1 eq L2 returns FALSE), L1(x) eq L2(x) must return FALSE for at least one 0 <= x < n (n is the length of L1 and L2).
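The list rule amounts to a size check followed by element-wise equality. A minimal sketch, with the element-level `eq` passed in as a parameter so that any of the scalar rules can be plugged in; the class name and signature are hypothetical.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical helper illustrating list equality; not a TinkerPop API.
final class ListEqualitySketch {

    static boolean listEq(List<?> l1, List<?> l2, BiPredicate<Object, Object> eq) {
        if (l1.size() != l2.size()) {
            return false; // lists of different size are never equal
        }
        for (int x = 0; x < l1.size(); x++) {
            if (!eq.test(l1.get(x), l2.get(x))) {
                return false; // one unequal position is enough
            }
        }
        return true;
    }
}
```

Passing an element predicate under which NaN is not equal to NaN makes two lists that both contain NaN unequal, which follows from the Number rules in the appendix.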
+
+===== Map
+
+* If either one of LHS or RHS is Map and another isn't, return FALSE
+* For 2 Maps M1 and M2 to be equal,
+    ** All keys in M1 should be within keys in M2
+    ** All keys in M2 should be within keys in M1
+    ** M1 and M2 should have the same number of  keys
+    ** For all keys k(1), k(2), ...k(n) in M1, M1[k] eq M2[k] should return TRUE
+    ** In Gremlin key order is not respected when determining equality
+
+=== Equivalence
+
+Equivalence is identical to Equality, except for the cases listed below.
+
+==== Primitive types
+===== Number
+
+* Unlike Equality, we *don't do* type casting for Equivalence.

Review comment:
       We took inspiration from OpenCypher's usage of this terminology. For the sake of consistency, perhaps we can continue with this? Otherwise, it would be difficult for users to understand when they deal with other PG languages. 
   
   Are you aware of the terminology being used in upcoming standards? Maybe we can align our terminology with that?







[GitHub] [tinkerpop] joshsh commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
joshsh commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r736707645



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there are no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. But the comparison is Java dependent and we don’t have a clear definition of how the comparison works for this kind of type.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which then tries to promote Infinity to BigDecimal as well,
+// and the type casting fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, distinguishing between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common supertype.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not applied, two numbers of different types are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of Semantics, right now TinkerPop does not have formal semantics to define these characteristics introduced in this proposal. Therefore this semantics should be published on the official TinkerPop doc.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+Today, this always throws when a Double / Float number is stored. With the proposed change it would no longer throw; because NaN is not equal to any number, the query returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, another question is which value should be filtered out; we would need to define a priority over the numeric types. 
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not applying type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a general concept such as a key-value tuple?
+* Today we have a Date type, but don’t we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to graph providers?
+* Which is more reasonable: NULL eq NULL is true, or false?
+  ** If it is true, it may be respected in JOIN operations.
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a separate type to be taken into account?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. It is JVM based so as a primitive type, we are using the following types:
+
+* Byte: 8-bit signed two's complement integer

Review comment:
       I think it's OK to have types which will not be supported in every language, so long as we are able to map values between types in a general-purpose way. E.g. look at the [integer types](https://github.com/CategoricalData/hydra/blob/main/hydra-scala/src/gen-main/scala/hydra/core/IntegerType.scala) I am prototyping in Hydra. Not every language will implement types like `int16` (short) or `uint8` (byte), but it is straightforward to map an instance of `int16` to `int32` and back again to accommodate languages which don't. For types like timestamps, UUIDs, etc. yes I would start out with as few as practical.
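
To illustrate the `int16` to `int32` round trip mentioned above, here is a minimal sketch. It is written in Python rather than Scala or Java purely for brevity; the function names are hypothetical and are not part of Hydra or TinkerPop. It assumes little-endian two's complement encodings for both widths.

```python
import struct

INT16_MIN, INT16_MAX = -(2 ** 15), 2 ** 15 - 1


def widen_int16_to_int32(raw16: bytes) -> bytes:
    """Widening is lossless: every int16 value fits into an int32."""
    (value,) = struct.unpack("<h", raw16)  # decode 2-byte signed integer
    return struct.pack("<i", value)        # re-encode as 4-byte signed integer


def narrow_int32_to_int16(raw32: bytes) -> bytes:
    """Narrowing only succeeds if the value is inside the int16 range."""
    (value,) = struct.unpack("<i", raw32)
    if not INT16_MIN <= value <= INT16_MAX:
        raise OverflowError(f"{value} does not fit into int16")
    return struct.pack("<h", value)


if __name__ == "__main__":
    for v in (INT16_MIN, -1, 0, 12345, INT16_MAX):
        raw16 = struct.pack("<h", v)
        # A language without int16 can carry the value as int32 and hand it back unchanged.
        assert narrow_int32_to_int16(widen_int16_to_int32(raw16)) == raw16
```

The point of the sketch is only that the mapping is general purpose: widening never fails, and narrowing fails loudly instead of silently truncating a value that was never an `int16` to begin with.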
   







[GitHub] [tinkerpop] rdtr commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
rdtr commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r738772921



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers, who need to be able to reason about the outcome of their queries, as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step, but the comparison is Java dependent and we don’t have a clear definition of how comparison works for this kind of type.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparisons over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, a frequent use case for NaN, cannot with certainty be identified as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations. 
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast into exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but equivalent to each other
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is being used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of Semantics, right now TinkerPop does not have formal semantics to define these characteristics introduced in this proposal. Therefore this semantics should be published on the official TinkerPop doc.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+Today, this always throws when a Double / Float number is stored. With the proposed change it would no longer throw; because NaN is not equal to any number, the query returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, another question is which value should be filtered out; we would need to define a priority over the numeric types. 
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not applying type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a general concept such as a key-value tuple?

Review comment:
       For now, I am updating this equality definition for Map as below:
   
   > Two maps are equal when a Set of key-value pairs from those 2 maps are equal to each other. A key-value pair is equal to another pair if and only if both its key and value are equal to each other.
   
   Somehow the equality for Set is missing, so I am adding it as well
   
   >  Two sets are equal if they contain the same (equal to each other) elements.
   
   Let me know if this does not align with what you are suggesting.
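
The two definitions quoted above can be read operationally as in the following minimal sketch. It is in Python for brevity and is not TinkerPop code: `values_equal`, `sets_equal` and `maps_equal` are hypothetical names, and the scalar rules (numeric type promotion, NaN never equal, different non-numeric types never equal) are taken from the Equality section of the proposal.

```python
import math
from numbers import Real


def _is_number(v):
    # bool is a subclass of int in Python; treat it as a separate type here.
    return isinstance(v, Real) and not isinstance(v, bool)


def _is_nan(v):
    return isinstance(v, float) and math.isnan(v)


def values_equal(a, b):
    """Equality as sketched in the proposal: always returns True or False."""
    if _is_number(a) and _is_number(b):
        if _is_nan(a) or _is_nan(b):
            return False          # NaN is not equal to anything, including NaN
        return a == b             # numeric comparison across int/float
    if isinstance(a, dict) and isinstance(b, dict):
        return maps_equal(a, b)
    if isinstance(a, (set, frozenset)) and isinstance(b, (set, frozenset)):
        return sets_equal(a, b)
    return type(a) is type(b) and a == b


def sets_equal(s1, s2):
    """Two sets are equal if they contain the same (mutually equal) elements."""
    return len(s1) == len(s2) and all(
        any(values_equal(x, y) for y in s2) for x in s1
    )


def maps_equal(m1, m2):
    """Two maps are equal if their sets of key-value pairs are equal."""
    return len(m1) == len(m2) and all(
        any(values_equal(k1, k2) and values_equal(v1, v2) for k2, v2 in m2.items())
        for k1, v1 in m1.items()
    )


if __name__ == "__main__":
    print(maps_equal({"a": 1}, {"a": 1.0}))
    print(maps_equal({"a": float("nan")}, {"a": float("nan")}))
```

One consequence of reading the definition this literally is that a map or set containing NaN is not equal to a structurally identical one, because the NaN entries never match. Whether that is the intended behavior is an assumption of this sketch, not something the comment above settles.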







[GitHub] [tinkerpop] rngcntr commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
rngcntr commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r737145083



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers, who need to be able to reason about the outcome of their queries, as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. The comparison is Java dependent, and we don’t have a clear definition of how comparison works for types of this kind.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which then tries to promote Infinity to BigDecimal as well,
+// and the type cast fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations. 
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is being used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of Semantics, right now TinkerPop does not have formal semantics to define these characteristics introduced in this proposal. Therefore this semantics should be published on the official TinkerPop doc.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this always throws. With the proposed change, it wouldn't throw; because NaN is not equal to any number, the query returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty  is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If de-duping, there is another question which value we should filter out. We need to define priority over types in Number. 

Review comment:
       The question of which value should be filtered out caught my attention. If, for instance, some query uses `dedup().by("name")` and there are multiple vertices with the same name, the server is free to choose which one it returns. This is similar to the [handling of GROUP BY in MySQL](https://docs.oracle.com/cd/E17952_01/mysql-5.6-en/group-by-handling.html).
   A server could, in theory, return different results for each run of the query even if the queried graph does not change (the same goes for, e.g., `g.V().limit(1)`). In their default implementation, steps like `dedup()` depend on the order in which traversers enter the step, but as far as I know, consistent behavior is not enforced.
   
   I am aware that this might not necessarily be an issue. It's just that non-deterministic steps have quite an impact on formal proofs of correctness of optimizations, which is a topic I'm currently working on. 
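
   To make the concern concrete, here is a minimal Java sketch of a first-arrival-wins dedup under a type-promoting notion of equivalence. It is not TinkerPop code: the class `DedupOrderSketch`, the helper `promotedKey`, and the choice of `BigDecimal` as the common numeric representation are all assumptions made purely for illustration, and the helper only handles finite numbers. The point is that the surviving value depends on the order in which values arrive.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupOrderSketch {

    // Hypothetical equivalence key: finite numbers are promoted to a
    // normalized BigDecimal so that 1 (Integer) and 1.0 (Double) share a key.
    // NaN and infinities are deliberately not handled in this sketch.
    static Object promotedKey(Object value) {
        if (value instanceof Number) {
            return new BigDecimal(value.toString()).stripTrailingZeros();
        }
        return value;
    }

    // First-arrival-wins dedup: the first value seen for a key is kept,
    // later values with the same key are dropped.
    static List<Object> dedup(List<Object> input) {
        Map<Object, Object> firstSeen = new LinkedHashMap<>();
        for (Object value : input) {
            firstSeen.putIfAbsent(promotedKey(value), value);
        }
        return new ArrayList<>(firstSeen.values());
    }

    public static void main(String[] args) {
        // Same multiset of values, different arrival order.
        System.out.println(dedup(Arrays.<Object>asList(1, 1.0d))); // prints [1]
        System.out.println(dedup(Arrays.<Object>asList(1.0d, 1))); // prints [1.0]
    }
}
```

   Unless the specification either fixes a priority among numeric types or explicitly declares the surviving value to be implementation dependent, two providers (or two runs on the same provider) could legitimately return different results for the same query.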







[GitHub] [tinkerpop] rngcntr commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
rngcntr commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r737145083



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for implementers of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. The comparison is Java dependent, and we don’t have a clear definition of how comparison works for types of this kind.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which then tries to promote Infinity to BigDecimal as well,
+// and the type cast fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations. 
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is being used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** comparison should be conceptually aligned with equality as far as type promotion is concerned, e.g. `1.0 < 2 < 3L`.
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order of values. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable, we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see the appendix.
+* Orderability determines whether 2 values are ordered at the same position or one value is positioned earlier than the other.
+* The concept of equivalence is used to determine whether the 2 values are at the same position.
+* When the position is identical, which value comes first (in other words, whether a stable sort is performed) depends on the graph provider's implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change?
+
+=== Semantics
+
+TinkerPop currently has no formal semantics defining the characteristics introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when BigDecimal comparisons hit NaN and similar special values. JDK8 seems to produce a NumberFormatException, whereas JDK11 produces an error like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this query always throws today. With the proposed change, it would not throw; because NaN is not equal to any number, it returns an empty result.
+
+* BigDecimal +
+Equality checks involving BigDecimal and special values that cannot be parsed as Integer, such as NaN and INF, should not produce exceptions and should instead act as a filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today uses the hash value of the original values for grouping, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, so ordering fails for non-comparable and mixed-type values. The proposed change is to make values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type promotion into account for equivalence? +
+[code]
+----
+// Given the following property values,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: not de-duplicating them,
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-duplicating them?
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, there is another question: which value should we filter out? We would need to define a priority over the types in Number. 

Review comment:
       The question of which value should be filtered out caught my attention. If, for instance, some query uses `dedup().by("name")` and there are multiple vertices with the same name, the server is free to choose which one it returns. This is similar to the [handling of GROUP BY in MySQL](https://docs.oracle.com/cd/E17952_01/mysql-5.6-en/group-by-handling.html).
   A server could, in theory, return different results for each run of the query even if the queried graph does not change (the same goes for, e.g., `g.V().limit(1)`). In their default implementation, steps like `dedup()` depend on the order in which traversers enter the step, but as far as I know, consistent behavior is not enforced.
   
   I am aware that this might not necessarily be an issue. It's just that non-deterministic steps have quite an impact on formal proofs of correctness of optimizations, which is a topic I'm currently working on. 
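
To make the order dependence concrete, here is a minimal sketch in plain Java. It is not TinkerPop code: `dedupBy` is a hypothetical stand-in for `dedup().by("name")` that keeps the first element seen per key, and the `id:name` strings are made-up stand-ins for vertices. Under those assumptions, the same input set yields different survivors depending only on arrival order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class DedupOrderSketch {
    // Hypothetical stand-in for dedup().by(key): keeps the first element seen per key.
    public static <T, K> List<T> dedupBy(List<T> arrivals, Function<T, K> key) {
        Map<K, T> firstSeen = new LinkedHashMap<>();
        for (T item : arrivals) {
            // Later arrivals with an already-seen key are dropped.
            firstSeen.putIfAbsent(key.apply(item), item);
        }
        return new ArrayList<>(firstSeen.values());
    }

    public static void main(String[] args) {
        // Elements are "id:name" strings; the dedup key is the name part.
        Function<String, String> name = s -> s.substring(s.indexOf(':') + 1);
        // Same elements, two different arrival orders.
        System.out.println(dedupBy(List.of("v1:marko", "v2:marko", "v3:josh"), name)); // [v1:marko, v3:josh]
        System.out.println(dedupBy(List.of("v2:marko", "v1:marko", "v3:josh"), name)); // [v2:marko, v3:josh]
    }
}
```

Both outputs are valid de-duplications by name, which is exactly why a proof about an optimization that reorders traversers has to treat such steps specially.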







[GitHub] [tinkerpop] spmallette commented on pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
spmallette commented on pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#issuecomment-952845654


   As an administrative point, I think I'll call out something @divijvaidya wrote:
   
   > I would suggest to scope the discussion in two parts: semantics for current implementation of TP as of 3.5.x and future looking semantics for TP.
   
   I'd agree with that statement and think that we should take the portions of this document which are representative of how things currently work and get them added to [Provider Documentation](https://github.com/apache/tinkerpop/blob/master/docs/src/dev/provider/index.asciidoc) as the start of deeper reference material on Gremlin. Then we could get the open questions/proposed changes moved to a revised version of the old ["future" doc](https://github.com/apache/tinkerpop/blob/master/docs/src/dev/provider/index.asciidoc). 
   
   If that makes sense to all, I'm ok to merge this as it is (with any additional changes required given the comments) with the idea that we'd move in that direction going forward.





[GitHub] [tinkerpop] joshsh commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
joshsh commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r737858944



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+== Open Questions
+
+* Should we take type promotion into account for equivalence? +
+[code]
+----
+// Given the following property values,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: not de-duplicating them,
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-duplicating them?
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, there is another question: which value should we filter out? We would need to define a priority over the types in Number. 
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept, such as a key-value tuple?
+* Today we have a Date type, but don’t we need a timezone-aware DateTime type as well?
+* Some graph providers may not support BigDecimal. Should we leave how TP deals with BigDecimal to Graph providers?
+* Which is more reasonable: NULL eq NULL being true or false?
+  ** If it is true, it may be respected in JOIN operations.
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be taken into account as a distinct type?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Since it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that Double and Float also have the concepts of INFINITY, https://en.wikipedia.org/wiki/Signed_zero[signed zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex

Review comment:
       There are (at least) two ways to go for a "transitional" TinkerPop 3 data model. Probably the simplest is just to have a `Vertex` type and an `Edge` type, where the instances of `Vertex` are all possible vertices as you say. It's not a big step from there to specific vertex and edge types, though. E.g. in the API sketch I mentioned above, [VertexType](https://categoricaldata.github.io/hydra/hydra-java/javadoc/hydra/ext/tinkerpop/v3/VertexType.html) is defined by an id type (e.g. string, 32-bit int, etc.) combined with a set of property types. A [VertexIdType](https://categoricaldata.github.io/hydra/hydra-java/javadoc/hydra/ext/tinkerpop/v3/VertexIdType.html) represents references to vertices *of a particular type*. This would be a purely structural approach to element types, but it would allow us to do some very useful things, like inferring based on the type of an edge, what properties to expect of the out- and in-vertices of the edge, including what the datatype of those properties are. If you know that `knows` connects a `Person` with a `Person`, and you know that `worksAt` connects a `Person` with a `Company`, then you know that `v.out('knows').out('worksAt')` consumes `Person` vertices and produces `Company` vertices, etc. Which goes beyond equality or comparison, except that if we are headed in that direction, we should just make sure that elements of different "types" compare as unequal. If types are associated with labels, and elements with different labels are always unequal, then that's enough.
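
As a rough illustration of the last point, here is a minimal Java sketch in which a vertex reference carries its label, so that elements with different labels never compare as equal. `LabeledVertex` is a hypothetical stand-in invented for this example; it is not a TinkerPop or Hydra class.

```java
import java.util.Objects;

// Hypothetical stand-in for a typed vertex reference; not a TinkerPop class.
final class LabeledVertex {
    final String label;
    final Object id;

    LabeledVertex(String label, Object id) {
        this.label = label;
        this.id = id;
    }

    // Two references are equal only if both the label (the "type") and the id match.
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof LabeledVertex)) return false;
        LabeledVertex that = (LabeledVertex) other;
        return label.equals(that.label) && id.equals(that.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(label, id);
    }

    public static void main(String[] args) {
        LabeledVertex person = new LabeledVertex("Person", 1);
        LabeledVertex company = new LabeledVertex("Company", 1);
        System.out.println(person.equals(company));                        // false
        System.out.println(person.equals(new LabeledVertex("Person", 1))); // true
    }
}
```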







[GitHub] [tinkerpop] mschmidt00 commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
mschmidt00 commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r736957101



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, many interesting questions arise under the surface: equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different contexts (e.g., predicates vs. deduplication), comparability and ordering across different logical types, and the identity of elements (such as vertex and edge properties).
+
+TinkerPop / Gremlin is written in, and (partially) relies upon, Java and the JVM, and as of today there are no clearly defined and published semantics for the different types of equality and comparison operations. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== Behavior that is inherently driven by Java
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. However, the comparison is Java dependent, and we do not have a clear definition of how comparison works for this kind of type.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+We even have comparisons over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
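
The failure above comes from plain Java natural ordering, and it can be reproduced without TinkerPop. The following is a minimal sketch using only the JDK; `MixedTypeSort` and `naturalSortFails` are names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class MixedTypeSort {
    // Returns true if natural-order sorting of the values fails with a ClassCastException.
    static boolean naturalSortFails(List<Object> values) {
        List<Object> copy = new ArrayList<>(values);
        try {
            copy.sort(null); // a null comparator means natural ordering via Comparable
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(naturalSortFails(List.of(3, 1, 2))); // homogeneous Integers sort fine
        System.out.println(naturalSortFails(List.of(1, "a")));  // Integer vs. String cannot be compared
    }
}
```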
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which then tries to promote Infinity to BigDecimal as well,
+// and the type cast fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
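
The promotion failure can be shown in isolation: `java.math.BigDecimal` has no representation for NaN or the infinities, so its `double` constructor throws `NumberFormatException` for them. A minimal sketch follows; the class and method names are hypothetical, and this is not the TinkerPop code path itself.

```java
import java.math.BigDecimal;

final class BigDecimalPromotion {
    // Returns true if the double can be promoted to BigDecimal without an exception.
    static boolean promotable(double value) {
        try {
            new BigDecimal(value);
            return true;
        } catch (NumberFormatException e) {
            // BigDecimal cannot represent NaN or +/-Infinity.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(promotable(1.0));
        System.out.println(promotable(Double.NaN));
        System.out.println(promotable(Double.POSITIVE_INFINITY));
    }
}
```

A comparison routine that promotes to BigDecimal would therefore need to check `Double.isFinite` (or similar) before promoting.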
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four: we distinguish between equality vs. equivalence and between comparability vs. orderability, two flavors of each concept tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale (commonly found in query and programming languages) is that two "unknown" numbers, a frequent use case for NaN, cannot be identified with certainty as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
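
The NaN distinction can be sketched for two `double` values as follows. This is only an illustration of the idea, not the proposed implementation: `NaNSemantics`, `equal` and `equivalent` are hypothetical names, and treating `+0.0` and `-0.0` as equal is an assumption made here for brevity.

```java
final class NaNSemantics {
    // Equality sketch: IEEE 754 comparison, so NaN is not equal to NaN.
    static boolean equal(double a, double b) {
        return a == b;
    }

    // Equivalence sketch: like equality, except that NaN is grouped with NaN.
    static boolean equivalent(double a, double b) {
        return a == b || (Double.isNaN(a) && Double.isNaN(b));
    }

    public static void main(String[] args) {
        System.out.println(equal(Double.NaN, Double.NaN));      // false
        System.out.println(equivalent(Double.NaN, Double.NaN)); // true
        System.out.println(equal(1.0, 1.0));                    // true
    }
}
```

Both methods always return a plain `boolean`, which matches the completeness requirement stated above.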
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
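
One possible reading of the bullets above, sketched in Java: numbers are promoted to a common exact representation before comparison, NaN equals nothing, and the check never throws. Using `BigDecimal` as the common super type, and falling back to Java `equals` for non-numeric values, are assumptions of this sketch rather than part of the proposal; all names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Objects;

final class PromotedEquality {
    // True for NaN and +/-Infinity, which BigDecimal cannot represent.
    static boolean isNonFinite(Number n) {
        return (n instanceof Double || n instanceof Float) && !Double.isFinite(n.doubleValue());
    }

    // Exact promotion; assumes n is one of the JVM numeric types listed in the proposal.
    static BigDecimal toBigDecimal(Number n) {
        if (n instanceof BigDecimal) return (BigDecimal) n;
        if (n instanceof BigInteger) return new BigDecimal((BigInteger) n);
        if (n instanceof Double || n instanceof Float) return new BigDecimal(n.doubleValue());
        return BigDecimal.valueOf(n.longValue()); // Byte, Short, Integer, Long
    }

    // Always returns true or false; never throws for the supported types.
    static boolean equal(Object lhs, Object rhs) {
        if (lhs instanceof Number && rhs instanceof Number) {
            Number a = (Number) lhs;
            Number b = (Number) rhs;
            if (isNonFinite(a) || isNonFinite(b)) {
                // NaN equals nothing; an infinity equals only an infinity of the same sign.
                return isNonFinite(a) && isNonFinite(b) && a.doubleValue() == b.doubleValue();
            }
            return toBigDecimal(a).compareTo(toBigDecimal(b)) == 0; // compareTo ignores scale
        }
        // Non-numeric values: plain Java equality, used here only for brevity.
        return Objects.equals(lhs, rhs);
    }

    public static void main(String[] args) {
        System.out.println(equal(1, 1.0));                   // true
        System.out.println(equal(1, "1"));                   // false
        System.out.println(equal(Double.NaN, Double.NaN));   // false
    }
}
```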
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
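
The three-valued result described above could be modelled in Java with a nullable `Boolean`, where `null` stands for the NULL outcome on incomparable types. This is a simplified sketch under stated assumptions: numbers are promoted to `double` (which loses precision for very large `long` or `BigDecimal` values), only Number and String are handled, and `Comparability.lessThan` is a hypothetical name.

```java
final class Comparability {
    // Returns TRUE or FALSE for comparable operands, or null when the types are incomparable.
    static Boolean lessThan(Object lhs, Object rhs) {
        if (lhs instanceof Number && rhs instanceof Number) {
            // Simplification: promote both sides to double; any NaN operand yields FALSE.
            return ((Number) lhs).doubleValue() < ((Number) rhs).doubleValue();
        }
        if (lhs instanceof String && rhs instanceof String) {
            return ((String) lhs).compareTo((String) rhs) < 0;
        }
        return null; // e.g. an integer compared with a string
    }

    public static void main(String[] args) {
        System.out.println(lessThan(1.0, 2)); // true
        System.out.println(lessThan(2, 3L));  // true
        System.out.println(lessThan(1, "a")); // null
    }
}
```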
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, we have a dedicated order as described in the section below.
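
A total order of this kind can be sketched as a Java `Comparator` that first compares a per-type rank and only then compares values within a type. The ranks below (numbers, then other values, then strings) are purely hypothetical placeholders, since the actual cross-type order is defined in the appendix; `Orderability`, `rank` and `sorted` are names invented for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class Orderability {
    // Hypothetical type ranks; the real cross-type order is defined in the appendix.
    static int rank(Object o) {
        if (o instanceof Number) return 0;
        if (o instanceof String) return 2;
        return 1;
    }

    // Never returns null and never throws for the handled types.
    static final Comparator<Object> ORDER = (a, b) -> {
        int ra = rank(a);
        int rb = rank(b);
        if (ra != rb) return Integer.compare(ra, rb);
        // Double.compare is a total order: NaN sorts after every other double.
        if (ra == 0) return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
        if (ra == 2) return ((String) a).compareTo((String) b);
        return 0; // other types share one position in this sketch
    };

    static List<Object> sorted(List<Object> values) {
        List<Object> copy = new ArrayList<>(values);
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sorted(List.of("100000", 100)));
        System.out.println(sorted(List.of("a", Double.NaN, 1)));
    }
}
```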
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of Semantics, right now TinkerPop does not have formal semantics to define these characteristics introduced in this proposal. Therefore this semantics should be published on the official TinkerPop doc.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When Double / Float Number is stored, it always throws. With the proposed change, it wouldn't throw but because NaN is not equal to any numbers this returns empty result.
+
+* BigDecimal +
+Equality checks involving BigDecimal and special values that cannot be parsed as Integer, such as NaN and INF, should not throw exceptions and should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty  is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: that we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, there is another question: which value should be filtered out? We would need to define a priority over the Number types.
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN, as well as ±Double.INF and ±Float.INF. Not adhering to type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped together according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a generalized concept such as a key-value tuple?

Review comment:
       The alternative would be to leave a little more flexibility for implementations, and just define (non-exhaustive) contracts that implementations must satisfy when comparing maps (for instance, two maps cannot compare equal if the unordered set of key-value pairs compares non-equal). TP builds internally on LinkedHashMaps, which would make the above a good match -- but other implementations may differ and use unordered representations. The question here is how critical the order within a map is to key use cases of the query language -- any thoughts on that @joshsh? 
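
For reference, the JDK's own map equality already follows the unordered contract: `LinkedHashMap` preserves insertion order for iteration, but `Map.equals` compares only the set of entries. The sketch below contrasts that with an order-sensitive comparison of the entry sequences; `MapEquality` and its methods are hypothetical helper names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

final class MapEquality {
    // Builds an insertion-ordered map with two entries.
    static Map<String, Integer> linked(String k1, int v1, String k2, int v2) {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put(k1, v1);
        m.put(k2, v2);
        return m;
    }

    // Map.equals ignores iteration order: only the set of key-value pairs matters.
    static boolean unorderedEqual(Map<String, Integer> a, Map<String, Integer> b) {
        return a.equals(b);
    }

    // Order-sensitive comparison: entries are compared position by position.
    static boolean orderedEqual(Map<String, Integer> a, Map<String, Integer> b) {
        return new ArrayList<>(a.entrySet()).equals(new ArrayList<>(b.entrySet()));
    }

    public static void main(String[] args) {
        Map<String, Integer> ab = linked("a", 1, "b", 2);
        Map<String, Integer> ba = linked("b", 2, "a", 1);
        System.out.println(unorderedEqual(ab, ba)); // true
        System.out.println(orderedEqual(ab, ba));   // false
    }
}
```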







[GitHub] [tinkerpop] rdtr commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
rdtr commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r738800352



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, many interesting questions arise under the surface: equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different contexts (e.g., predicates vs. deduplication), comparability and ordering across different logical types, and the identity of elements (such as vertex and edge properties).
+
+TinkerPop / Gremlin is written in, and (partially) relies upon, Java and the JVM, and as of today there are no clearly defined and published semantics for the different types of equality and comparison operations. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== Behavior that is inherently driven by Java
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. However, the comparison is Java dependent, and we do not have a clear definition of how comparison works for this kind of type.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+We even have comparisons over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which then tries to promote Infinity to BigDecimal as well,
+// and the type cast fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four: we distinguish between equality vs. equivalence and between comparability vs. orderability, two flavors of each concept tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false. The rationale (commonly found in query and programming languages) is that two "unknown" numbers, a frequent use case for NaN, cannot be identified with certainty as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
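
The difference between the two notions above can be sketched in plain Java. This is only an illustration under simplifying assumptions, not TinkerPop code: the class and method names are hypothetical, and type promotion is approximated by widening both operands to `double`, which ignores BigInteger / BigDecimal and loses precision for large Long values.

```java
// Sketch only: simplified equality vs. equivalence for numbers.
// Assumption: promotion is approximated by widening to double.
class EqualityVsEquivalence {

    // Equality: promote both sides, then compare. NaN is never equal to NaN.
    static boolean equal(Number lhs, Number rhs) {
        return lhs.doubleValue() == rhs.doubleValue();
    }

    // Equivalence (as proposed): no promotion, so the runtime types must match.
    // Double.equals / Float.equals treat NaN as equal to itself.
    static boolean equivalent(Number lhs, Number rhs) {
        return lhs.getClass() == rhs.getClass() && lhs.equals(rhs);
    }

    public static void main(String[] args) {
        System.out.println(equal(2, 2.0f));                     // true
        System.out.println(equivalent(2, 2.0f));                // false
        System.out.println(equal(Double.NaN, Double.NaN));      // false
        System.out.println(equivalent(Double.NaN, Double.NaN)); // true
    }
}
```

Both methods always return a boolean, which mirrors the requirement that equality and equivalence are complete (never NULL, never an exception).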
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
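
As a rough illustration of the two notions above, the following Java sketch models comparability as a three-valued check (`null` stands in for NULL) and orderability as a total order over mixed types. All names are hypothetical, and the cross-type rank used here (numbers before strings before everything else) is an assumption made only for the example; the proposal defers the actual cross-type order to its appendix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch only: comparability may return null, orderability never does.
class OrderabilitySketch {

    // Comparability: TRUE / FALSE within a comparable class of types, null otherwise.
    static Boolean lessThan(Object lhs, Object rhs) {
        if (lhs instanceof Number && rhs instanceof Number) {
            return ((Number) lhs).doubleValue() < ((Number) rhs).doubleValue();
        }
        if (lhs instanceof String && rhs instanceof String) {
            return ((String) lhs).compareTo((String) rhs) < 0;
        }
        return null; // incomparable types
    }

    // Hypothetical rank across types; the real order is defined elsewhere.
    static int typeRank(Object value) {
        if (value instanceof Number) return 0;
        if (value instanceof String) return 1;
        return 2;
    }

    // Orderability: order by type rank first, then within the type.
    static final Comparator<Object> ORDER = (a, b) -> {
        int byType = Integer.compare(typeRank(a), typeRank(b));
        if (byType != 0) return byType;
        if (a instanceof Number) {
            // Double.compare is total: it also positions NaN and signed zeros.
            return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
        }
        if (a instanceof String) return ((String) a).compareTo((String) b);
        return 0; // other types are left at the same position in this sketch
    };

    public static void main(String[] args) {
        System.out.println(lessThan(1.0, 2));   // true
        System.out.println(lessThan(1, "a"));   // null
        List<Object> values = new ArrayList<>(Arrays.asList("100000", 100, "abc", 2.5));
        values.sort(ORDER);
        System.out.println(values);             // [2.5, 100, 100000, abc]
    }
}
```

Using `Double.compare` for the within-type numeric order is one way to keep the order total even for NaN, which comparability alone cannot position.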
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of semantics, TinkerPop currently has no formal definition of the characteristics introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK 11 seems to produce a different error from JDK 8 for BigDecimal comparisons that hit NaN and similar values. JDK 8 seems to produce a NumberFormatException, whereas JDK 11 produces output such as:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this query always throws today. With the proposed change it would not throw; because NaN is not equal to any number, it would return an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under this proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL.
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: not de-duplicating them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, there is a further question of which value to filter out. We would need to define a priority over the Number types.
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a generalized concept such as a key-value tuple ?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type ?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to Graph providers ?
+* Which is more reasonable: NULL eq NULL being true or false ?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Since it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We do type casting, a.k.a. type promotion, for Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. The rules for how types are promoted are as follows:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to floating type of highest common bit denomination
+  ** If another value is Long or Double, we need 64bit so convert both to Double 
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported depending on the language and Storage, therefore the behavior of type casting for these 2 types can vary depending on a Graph provider. 
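
The promotion rules listed above can be read as a cascade of checks from the widest type to the narrowest. The Java sketch below follows the rules as written; it is an illustration with hypothetical names, not the TinkerPop implementation, and it only decides the target type without converting the values.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Sketch only: picks the type in which two numbers would be compared.
class PromotionSketch {

    static Class<? extends Number> promotedType(Number a, Number b) {
        if (a instanceof BigDecimal || b instanceof BigDecimal) return BigDecimal.class;
        if (a instanceof BigInteger || b instanceof BigInteger) return BigInteger.class;
        if (a instanceof Double || b instanceof Double) return Double.class;
        if (a instanceof Float || b instanceof Float) {
            // A Float paired with a 64-bit Long needs 64 bits, so use Double.
            // (A Double operand was already handled by the check above.)
            return (a instanceof Long || b instanceof Long) ? Double.class : Float.class;
        }
        if (a instanceof Long || b instanceof Long) return Long.class;
        if (a instanceof Integer || b instanceof Integer) return Integer.class;
        if (a instanceof Short || b instanceof Short) return Short.class;
        // Assumption: only the eight listed Number types reach this point.
        return Byte.class;
    }

    public static void main(String[] args) {
        System.out.println(promotedType(1.5f, 10L).getSimpleName());    // Double
        System.out.println(promotedType(1.5f, 3).getSimpleName());      // Float
        System.out.println(promotedType((short) 1, 2).getSimpleName()); // Integer
    }
}
```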
+
+=== Equality
+
+==== Primitive types
+===== Number
+
+Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
+
+* If either one of LHS or RHS is Number and another isn't, eq returns FALSE.
+* If both LHS and RHS are Number, apply the type casting described above and then check equality.
+* Example for edge cases:
+    ** -0.0 eq 0.0  = TRUE
+    ** +0.0 eq 0.0 = TRUE
+    **  -0.0 eq +0.0 = TRUE
+    ** NaN eq NaN  = FALSE
+    ** +INF eq +INF = TRUE
+    **  -INF eq -INF = TRUE
+    **  -INF eq +INF = FALSE
+* TinkerPop is JVM based so there can be ±INF^^float and ±INF^^double, NaN^^float and NaN^^double. They also adhere to the type promotion.
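
For readers who want to check the edge cases above on a JVM, the following sketch uses Java's primitive `==` after widening to `double`, which happens to produce the same results as the list above. The class and method names are hypothetical. It also shows that the boxed `Double.equals` behaves differently for signed zero and NaN, which is one reason the implicit Java behavior varies by context.

```java
// Sketch only: IEEE 754 edge cases under primitive == versus Double.equals.
class NumberEqualityEdgeCases {

    // Assumption: both operands have already been promoted to double.
    static boolean eq(double lhs, double rhs) {
        return lhs == rhs;
    }

    public static void main(String[] args) {
        System.out.println(eq(-0.0, +0.0));                                         // true
        System.out.println(eq(Double.NaN, Double.NaN));                             // false
        System.out.println(eq(Float.POSITIVE_INFINITY, Double.POSITIVE_INFINITY));  // true
        System.out.println(eq(Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY)); // false
        // Boxed equals compares bit patterns, so the results flip:
        System.out.println(Double.valueOf(-0.0).equals(Double.valueOf(0.0)));             // false
        System.out.println(Double.valueOf(Double.NaN).equals(Double.valueOf(Double.NaN))); // true
    }
}
```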
+
+===== Boolean
+
+* If either one of LHS or RHS is Boolean and another isn't, return FALSE
+* True != False, True == True, False == False
+
+===== String
+
+* If either one of LHS or RHS is String and another isn't, return FALSE
+* We assume the common lexicographical order over Unicode strings.
+* LHS and RHS need to be lexicographically equal for LHS eq RHS == TRUE for String.
+
+===== Date
+
+* If either one of LHS or RHS is Date and another isn't, return FALSE
+* LHS eq RHS == TRUE when both LHS and RHS value are numerically identical in Unix Epoch time.
+
+===== NULL
+
+* If either one of LHS or RHS is null and another isn't, return FALSE
+* If both LHS and RHS are null, return TRUE 
+
+==== Composite types
+
+For all of them, if LHS and RHS are not of the same data type, equality returns FALSE. The following semantics apply when both LHS and RHS have the same data type.
+
+===== Vertex / Edge / VertexProperty
+
+They are considered as Element family in TinkerPop and if 2 elements have the same type and have the same ID, they are considered as equal.
+
+===== Property
+
+If key and value are same, 2 properties are equal.
+
+===== PropertyKey
+
+key is String type so Equality for String type applies.
+
+===== PropertyValue
+
+Any type, so Equality for a corresponding type applies.
+
+===== ID
+
+Any type, so Equality for a corresponding type applies.
+
+===== Label
+
+label is String type so Equality for String type applies.
+
+===== Path
+
+2 Paths are equal when their path elements are equal (using equality of List), and the corresponding path labels are also equal. 
+
+===== List
+
+* If either one of LHS or RHS is List and another isn't, return FALSE
+* When both are List, then
+    ** if their sizes differ, return FALSE
+    ** L(x) denotes the x-th element in list L, and n is the length of L1 and L2.
+        *** For 2 lists L1 and L2 to be equal (L1 eq L2 returns TRUE), L1(x) eq L2(x) must return TRUE for all 0 <= x < n.
+        *** For 2 lists L1 and L2 to be not equal (L1 eq L2 returns FALSE), L1(x) eq L2(x) must return FALSE for at least one 0 <= x < n.
+
+===== Map
+
+* If either one of LHS or RHS is Map and another isn't, return FALSE
+* For 2 Maps M1 and M2 to be equal,
+    ** All keys in M1 should be within keys in M2
+    ** All keys in M2 should be within keys in M1
+    ** M1 and M2 should have the same number of  keys
+    ** For every key k in M1, M1[k] eq M2[k] should return TRUE
+    ** In Gremlin key order is not respected when determining equality
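
The List and Map rules above are recursive, so a short sketch may help. The Java code below is an illustration under stated assumptions, with hypothetical names: numeric promotion is approximated by widening to `double`, map keys are looked up with Java's own `equals` / `hashCode` (so a key `1` would not match a key `1.0`), and all remaining types fall back to a same-class `equals` check.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: recursive equality over numbers, lists, and maps.
class CompositeEqualitySketch {

    static boolean eq(Object lhs, Object rhs) {
        if (lhs == null || rhs == null) return lhs == rhs; // NULL eq NULL is TRUE here
        if (lhs instanceof Number && rhs instanceof Number) {
            return ((Number) lhs).doubleValue() == ((Number) rhs).doubleValue();
        }
        if (lhs instanceof List && rhs instanceof List) {
            List<?> l1 = (List<?>) lhs;
            List<?> l2 = (List<?>) rhs;
            if (l1.size() != l2.size()) return false;
            for (int x = 0; x < l1.size(); x++) {
                if (!eq(l1.get(x), l2.get(x))) return false; // one mismatch is enough
            }
            return true;
        }
        if (lhs instanceof Map && rhs instanceof Map) {
            Map<?, ?> m1 = (Map<?, ?>) lhs;
            Map<?, ?> m2 = (Map<?, ?>) rhs;
            if (m1.size() != m2.size()) return false;
            for (Map.Entry<?, ?> e : m1.entrySet()) {
                // Same size plus "every key of M1 is in M2" implies equal key sets.
                if (!m2.containsKey(e.getKey())) return false;
                if (!eq(e.getValue(), m2.get(e.getKey()))) return false;
            }
            return true;
        }
        // Different kinds of values are never equal.
        return lhs.getClass() == rhs.getClass() && lhs.equals(rhs);
    }

    public static void main(String[] args) {
        System.out.println(eq(Arrays.asList(1, 2L), Arrays.asList(1.0, 2)));           // true
        System.out.println(eq(Arrays.asList(Double.NaN), Arrays.asList(Double.NaN)));  // false
        Map<String, Object> m1 = new HashMap<>();
        m1.put("a", 1);
        m1.put("b", 2);
        Map<String, Object> m2 = new HashMap<>();
        m2.put("b", 2.0);
        m2.put("a", 1);
        System.out.println(eq(m1, m2));                                                // true
        System.out.println(eq(Arrays.asList(1), "1"));                                 // false
    }
}
```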
+
+=== Equivalence
+
+Equivalence is identical to Equality, except for the cases listed below.
+
+==== Primitive types
+===== Number
+
+* Unlike Equality, we *don't do* type casting for Equivalence.
+    ** If the type is different, they are not equivalent.
+        *** +INF^^double is not equivalent to +INF^^float
+        *** NaN^^double is not equivalent to NaN^^float
+    ** 123 and 123.0 are equal but not equivalent to each other
+* -0.0, 0.0, and +0.0 are not equivalent to each other
+    ** -0.0 is equivalent to -0.0
+    ** 0.0 is equivalent to 0.0
+    ** +0.0 is equivalent to +0.0
+* -INF and +INF are not equivalent to each other
+    ** -INF is equivalent to -INF
+    ** +INF is equivalent to +INF
+    ** They are equivalent to each other irrespective of their underlying type, so in Java, for example, Double.POSITIVE_INFINITY is equivalent to Float.POSITIVE_INFINITY.
+* NaN is not equivalent to any other numbers
+    ** NaN *is equivalent to* NaN irrespective of its underlying type, so in Java, for example, Double.NaN is equivalent to Float.NaN.
+
+===== NULL
+* NULL is not equivalent to any other values

Review comment:
       I changed NULL -> NULLTYPE to avoid confusion. Let me know if the description doesn't align with what we intend here.







[GitHub] [tinkerpop] joshsh commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
joshsh commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r737867439



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there are no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite type.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by cap-Step. But the comparison is Java dependent and we don’t have a clear definition of how the comparison works for these kinds of types.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users to define a complete order across types and return a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different concepts.  We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
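A total order of this kind could be sketched as a Java `Comparator` that first compares a per-type rank and only then compares values within a type. The ranks below are purely hypothetical placeholders (the proposal defines its own cross-type order in the appendix), and `OrderabilitySketch` is not a TinkerPop class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class OrderabilitySketch {
    // Hypothetical type ranks, chosen only for illustration.
    static int rank(Object o) {
        if (o == null) return 0;
        if (o instanceof Boolean) return 1;
        if (o instanceof Number) return 2;
        if (o instanceof String) return 3;
        return 4; // everything else
    }

    // Never throws and never yields "unknown": every pair gets an order.
    static final Comparator<Object> ORDER = (a, b) -> {
        int byType = Integer.compare(rank(a), rank(b));
        if (byType != 0) return byType;
        if (a instanceof Boolean) return Boolean.compare((Boolean) a, (Boolean) b);
        // Simplification: Double.compare is a total order (NaN sorts last),
        // but doubleValue() is lossy for very large integers.
        if (a instanceof Number) {
            return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
        }
        if (a instanceof String) return ((String) a).compareTo((String) b);
        return 0; // same position; tie-breaking is left to the implementation
    };

    static List<Object> sorted(Object... values) {
        List<Object> out = new ArrayList<>(Arrays.asList(values));
        out.sort(ORDER);
        return out;
    }
}
```

With these assumed ranks, `OrderabilitySketch.sorted("b", 100, true, "a", 2.5)` yields `[true, 2.5, 100, a, b]` instead of a `ClassCastException`.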
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+Right now TinkerPop has no formal semantics defining the characteristics introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK 11 seems to produce a different error from JDK 8 for BigDecimal comparisons that hit NaN and similar values. JDK 8 seems to produce a NumberFormatException, whereas JDK 11 produces messages like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this query always throws today. With the proposed change it would not throw; because NaN is not equal to any number, it returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today groups values by the hash of the original value, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: not de-duplicating them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value should be filtered out; we would need to define a priority over the Number types. 
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN, as well as ±Double.INF and ±Float.INF. Not applying type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept such as a key-value tuple ?
+* Today we have a Date type, but don’t we need a timezone-aware DateTime type as well ?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to Graph providers ?
+* Which is more reasonable: NULL eq NULL being true or false ?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Because it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We do type casting, a.k.a. type promotion, for Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. The rules for type promotion are:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to floating type of highest common bit denomination
+  ** If the other value is Long or Double, we need 64 bits, so convert both to Double 
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported depending on the language and Storage, therefore the behavior of type casting for these 2 types can vary depending on a Graph provider. 
+
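Read top to bottom, the promotion rules above amount to a first-match decision list. A minimal Java sketch, assuming a hypothetical helper `commonType` (not part of TinkerPop) that only reports which class the two operands would be compared as:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class PromotionSketch {
    // Returns the class both operands would be compared as, applying the
    // listed rules in order; the first matching rule wins.
    static Class<? extends Number> commonType(Number a, Number b) {
        if (a instanceof BigDecimal || b instanceof BigDecimal) return BigDecimal.class;
        if (a instanceof BigInteger || b instanceof BigInteger) return BigInteger.class;
        if (a instanceof Double || b instanceof Double) return Double.class;
        if (a instanceof Float || b instanceof Float) {
            // A Long needs 64 bits, so Float combined with Long widens to Double.
            if (a instanceof Long || b instanceof Long) {
                return Double.class;
            }
            return Float.class;
        }
        if (a instanceof Long || b instanceof Long) return Long.class;
        if (a instanceof Integer || b instanceof Integer) return Integer.class;
        if (a instanceof Short || b instanceof Short) return Short.class;
        // Assumption: anything left is treated as Byte in this sketch.
        return Byte.class;
    }
}
```

For example, `commonType(1.0f, 2L)` reports `Double.class`, while `commonType(1.0f, 2)` reports `Float.class`.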
+=== Equality
+
+==== Primitive types
+===== Number
+
+Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
+
+* If one of LHS or RHS is a Number and the other isn't, eq returns FALSE.
+* If both LHS and RHS are Numbers, eq applies the type casting described above and then checks equality.
+* Example for edge cases:
+    ** -0.0 eq 0.0  = TRUE
+    ** +0.0 eq 0.0 = TRUE
+    **  -0.0 eq +0.0 = TRUE
+    ** NaN eq NaN  = FALSE
+    ** +INF eq +INF = TRUE
+    **  -INF eq -INF = TRUE
+    **  -INF eq +INF = FALSE
+* TinkerPop is JVM based so there can be ±INF^^float and ±INF^^double, NaN^^float and NaN^^double. They also adhere to type promotion.
+
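The edge cases listed above coincide with what Java's primitive `==` on `double` already does under IEEE 754. A minimal sketch, assuming a hypothetical helper `eq` that promotes both operands to `double` (which is lossy for very large Long, BigInteger, or BigDecimal values and is used here only to illustrate the special values):

```java
final class NumberEqualitySketch {
    // Hypothetical equality check: promote to double, then use primitive ==.
    static boolean eq(Number lhs, Number rhs) {
        return lhs.doubleValue() == rhs.doubleValue();
    }
}
```

Under this sketch `eq(-0.0, 0.0)` is true, `eq(Double.NaN, Double.NaN)` is false, and `eq(Float.POSITIVE_INFINITY, Double.POSITIVE_INFINITY)` is true because the float infinity widens to the double infinity.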
+===== Boolean
+
+* If one of LHS or RHS is a Boolean and the other isn't, return FALSE
+* True != False, True == True, False == False
+
+===== String
+
+* If one of LHS or RHS is a String and the other isn't, return FALSE
+* We assume the common lexicographical order over Unicode strings.
+* LHS and RHS need to be lexicographically equal for LHS eq RHS == TRUE for String.
+
+===== Date
+
+* If one of LHS or RHS is a Date and the other isn't, return FALSE
+* LHS eq RHS == TRUE when both LHS and RHS values are numerically identical in Unix Epoch time.
+
+===== NULL
+
+* If one of LHS or RHS is null and the other isn't, return FALSE
+* If both LHS and RHS are null, return TRUE 
+
+==== Composite types
+
+For all of them, if LHS and RHS are not of the same data type, equality returns FALSE. The following semantics apply when both LHS and RHS have the same data type.
+
+===== Vertex / Edge / VertexProperty
+
+They belong to the Element family in TinkerPop; if 2 elements have the same type and the same ID, they are considered equal.
+
+===== Property
+
+If key and value are the same, 2 properties are equal.
+
+===== PropertyKey
+
+key is String type so Equality for String type applies.
+
+===== PropertyValue
+
+Any type, so Equality for a corresponding type applies.
+
+===== ID
+
+Any type, so Equality for a corresponding type applies.
+
+===== Label
+
+label is String type so Equality for String type applies.
+
+===== Path
+
+2 Paths are equal when their path elements are equal (using equality of List), and the corresponding path labels are also equal. 
+
+===== List
+
+* If one of LHS or RHS is a List and the other isn't, return FALSE
+* When both are List, then
+    ** if their sizes differ, return FALSE
+    ** L(x) denotes the x-th element in list L. 
+        *** For 2 lists L1 and L2 to be equal (L1 eq L2 returns TRUE), L1(x) eq L2(x) must return TRUE for all 0 <= x < n (n is the length of L1 and L2).
+        *** For 2 lists L1 and L2 to be not equal (L1 eq L2 returns FALSE), L1(x) eq L2(x) must return FALSE for at least one 0 <= x < n (n is the length of L1 and L2).
+
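The list rule above is an element-wise check with a pluggable notion of element equality. A rough Java sketch, where `ListEqualitySketch`, `listEq`, and `ELEMENT_EQ` are hypothetical names and the element predicate is a simplified stand-in for the proposal's `eq` (numbers compared after promotion to double, everything else via `Objects.equals`):

```java
import java.util.List;
import java.util.Objects;
import java.util.function.BiPredicate;

final class ListEqualitySketch {
    // Simplified stand-in for the proposal's eq on list elements.
    static final BiPredicate<Object, Object> ELEMENT_EQ = (a, b) -> {
        if (a instanceof Number && b instanceof Number) {
            return ((Number) a).doubleValue() == ((Number) b).doubleValue();
        }
        return Objects.equals(a, b);
    };

    // Lists are equal when sizes match and every position is eq.
    static boolean listEq(List<?> l1, List<?> l2, BiPredicate<Object, Object> eq) {
        if (l1.size() != l2.size()) return false;
        for (int x = 0; x < l1.size(); x++) {
            if (!eq.test(l1.get(x), l2.get(x))) return false; // one mismatch suffices
        }
        return true;
    }
}
```

With this predicate, `[1, "a"]` and `[1.0, "a"]` compare as equal, whereas a list containing NaN is not equal to itself because NaN eq NaN is FALSE.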
+===== Map
+
+* If one of LHS or RHS is a Map and the other isn't, return FALSE
+* For 2 Maps M1 and M2 to be equal,
+    ** All keys in M1 should be within keys in M2
+    ** All keys in M2 should be within keys in M1
+    ** M1 and M2 should have the same number of keys
+    ** For all keys k(1), k(2), ...k(n) in M1, M1[k] eq M2[k] should return TRUE
+    ** In Gremlin key order is not respected when determining equality
+
+=== Equivalence
+
+Equivalence is identical to Equality, except for the cases listed below.
+
+==== Primitive types
+===== Number
+
+* Unlike Equality, we *don't do* type casting for Equivalence.
+    ** If the type is different, they are not equivalent.
+        *** +INF^^double is not equivalent to +INF^^float
+        *** NaN^^double is not equivalent to NaN^^float
+    ** 123 and 123.0 are equal but not equivalent to each other
+* -0.0, 0.0, and +0.0 are not equivalent to each other
+    ** -0.0 is equivalent to -0.0
+    ** 0.0 is equivalent to 0.0
+    ** +0.0 is equivalent to +0.0
+* -INF and +INF are not equivalent to each other
+    ** -INF is equivalent to -INF
+    ** +INF is equivalent to +INF
+    ** They are equivalent to each other irrespective of the underlying type, so in Java, for example, Double.POSITIVE_INFINITY is equivalent to Float.POSITIVE_INFINITY.
+* NaN is not equivalent to any other numbers
+    ** NaN *is equivalent to* NaN irrespective of its underlying type, so in Java, for example, Double.NaN is equivalent to Float.NaN.
+
+===== NULL
+* NULL is not equivalent to any other values
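For comparison, Java's boxed `equals`/`hashCode` already behaves much like the type-sensitive equivalence described above, which is presumably why hash-based grouping shows this behavior today. A minimal sketch using only standard collections (`EquivalenceSketch` and `dedup` are hypothetical names):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

final class EquivalenceSketch {
    // De-duplicates with Object.equals/hashCode, keeping first occurrences
    // in encounter order. Boxed numbers of different classes never match.
    static List<Object> dedup(Object... values) {
        return new ArrayList<>(new LinkedHashSet<>(Arrays.asList(values)));
    }
}
```

Calling `dedup(1, 1.0, 1, Double.NaN, Double.NaN, 0.0, -0.0)` keeps `1`, `1.0`, a single `NaN`, `0.0`, and `-0.0`. Note that Java's boxed equality also keeps `Float.NaN` and `Double.NaN` apart, so it matches the "no type casting" bullets but not the bullets that treat NaN and INF as equivalent irrespective of the underlying type.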

Review comment:
       Yes, essentially that. If you are expecting a value of type `Optional<T>` and you get a `null` in Java, you might as well consider it as an `Optional.empty()` value instead. If you are expecting `T` (where `T` is not itself an optional type) and you get `null` in Java, then you have an integrity violation. So in a future iteration, if we have a `VertexType` which specifies a `String` value for the "name" property key, we expect that there will in fact be a "name" string for every vertex of that type; a vertex without a "name" is not valid. On the other hand, if the type specifies `Optional<String>` for "name", and we find an `Optional.empty()`, that's fine, and so is `null`, and so is a "missing" value for "name". This is another thing we have had long discussions about in the PGSWG (whether an absent property value is the same as a void/unit/null-valued property), but I feel that unifying "missing" with "null/void" keeps things simple.
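   A minimal Java sketch of that distinction, with hypothetical helper names (`normalize`, `require`) that are not part of TinkerPop: a `null` where an `Optional<T>` is expected collapses to `Optional.empty()`, while a `null` where a plain `T` is required is reported as an integrity violation.

```java
import java.util.Optional;

final class OptionalPropertySketch {
    // Optional<T> expected: null, "missing", and empty are all the same thing.
    static <T> Optional<T> normalize(Optional<T> value) {
        return value == null ? Optional.empty() : value;
    }

    // Plain T expected: a null value means the data violates its type.
    static <T> T require(T value, String key) {
        if (value == null) {
            throw new IllegalStateException("missing required property: " + key);
        }
        return value;
    }
}
```

   Under this sketch, `normalize(null)` is an empty Optional, `require("marko", "name")` returns the string, and `require(null, "name")` throws.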
   
   
   
   
   







[GitHub] [tinkerpop] rdtr commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
rdtr commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r737884875



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different contexts (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers — who need to be able to reason about the outcome of their queries — as well as custom implementations of the TinkerPop API, who would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a
  query language. 
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by cap-Step. But the comparison is Java dependent and we don’t have a clear definition of how the comparison works for types of this kind.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an Exception.
+
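The failures above come straight from Java's natural ordering, which casts each element to `Comparable` and lets the element's own `compareTo` reject foreign types. A small self-contained illustration using only the JDK (`MixedSortSketch` is a hypothetical name):

```java
import java.util.Arrays;

final class MixedSortSketch {
    // Returns true when sorting by natural ordering fails on the given values.
    static boolean naturalSortFails(Object... values) {
        try {
            Arrays.sort(values); // natural ordering: elements must be mutually comparable
            return false;
        } catch (ClassCastException e) {
            return true; // e.g. Integer.compareTo cannot accept a String
        }
    }
}
```

Here `naturalSortFails(1, "a")` is true while `naturalSortFails(2, 1)` is false, mirroring the Integer/String failure shown in the console output.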
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot with certainty be identified as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations. 
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast into exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but equivalent to each other
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is being used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps, by providing a stable sort order over mixed type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** comparison should follow the same type promotion rules as equality, e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether the sort is stable) depends on the graph provider's implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, a dedicated order applies, as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+Right now TinkerPop has no formal semantics defining the characteristics introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK 11 seems to produce a different error from JDK 8 for BigDecimal comparisons that hit NaN and similar values. JDK 8 seems to produce a NumberFormatException, whereas JDK 11 produces messages like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this query always throws today. With the proposed change it would not throw; because NaN is not equal to any number, it returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today uses the hash value of the original values for grouping, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop uses comparability for orderability, so ordering fails for non-comparable and mixed-type values. The proposed change makes it possible to order values of any type.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value should be filtered out; we would need to define a priority over the Number types.
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN as well as ±Double.INF and ±Float.INF. Not adhering to type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept such as a key-value tuple?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave it to graph providers how TP deals with BigDecimal?
+* Which is more reasonable: NULL eq NULL being true or false?
+  ** If it is true, it may be respected in JOIN operations.
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined.
+* Should UUID be treated as a distinct type?

Review comment:
       OK, UUID should be treated as an independent type.
   - Equality and comparability checks depend on its String representation.
   - `UUID("123e4567-e89b-12d3-a456-426614174000")` and the String `"123e4567-e89b-12d3-a456-426614174000"` are not equivalent.
   - UUID needs its own position in the type hierarchy when ordered, separate from String (I will put it after String).
   
   Let me know if you would like to suggest another approach.
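
The three UUID points above can be illustrated with a small Java sketch. This is not TinkerPop code: the class `UuidSemanticsSketch` and its methods `eq`, `typeRank` and `sorted` are hypothetical names introduced only for illustration, and the sketch assumes that only String and UUID values are involved. Note that it compares UUIDs by their canonical string form rather than with `UUID.compareTo`, which compares the two underlying long values.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

// Hypothetical illustration only; none of these names exist in TinkerPop.
final class UuidSemanticsSketch {

    // Equality restricted to String and UUID: values of different types are never equal.
    static boolean eq(Object lhs, Object rhs) {
        if (lhs instanceof UUID && rhs instanceof UUID) {
            // UUID equality is decided on the canonical string representation.
            return lhs.toString().equals(rhs.toString());
        }
        if (lhs instanceof String && rhs instanceof String) {
            return lhs.equals(rhs);
        }
        return false;
    }

    // Assumed cross-type order for this sketch: String sorts before UUID.
    static int typeRank(Object value) {
        if (value instanceof String) return 0;
        if (value instanceof UUID) return 1;
        throw new IllegalArgumentException("unsupported value in this sketch: " + value);
    }

    // Order by type first, then by string representation within a type.
    static final Comparator<Object> ORDER = (a, b) -> {
        int byType = Integer.compare(typeRank(a), typeRank(b));
        if (byType != 0) return byType;
        return a.toString().compareTo(b.toString());
    };

    // Returns a sorted copy; the input list is left untouched.
    static List<Object> sorted(List<Object> values) {
        List<Object> copy = new ArrayList<>(values);
        copy.sort(ORDER);
        return copy;
    }
}
```

With this sketch, `eq(uuid, uuid.toString())` is false even though both print identically, and sorting a mixed list places every String before every UUID.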







[GitHub] [tinkerpop] rdtr commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
rdtr commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r738758210



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. However, the comparison is Java dependent, and we don’t have a clear definition of how comparison works for this kind of type.
+The same holds for Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which then tries to promote Infinity to BigDecimal as well,
+// and the type cast fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different concepts.  We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For example, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers (a frequent use case for NaN) cannot be identified with certainty as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common supertype.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not in effect, two numbers of different types are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of semantics, TinkerPop currently has no formal definition of the characteristics introduced in this proposal. Therefore, these semantics should be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, it always throws. With the proposed change, it would not throw; because NaN is not equal to any number, the query returns an empty result.
+
+* BigDecimal +
+Equality checks involving BigDecimal and special values that cannot be parsed as Integer, such as NaN and INF, should not produce exceptions and should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty is compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL.
+
+==== Equivalence
+
+TinkerPop today uses the hash value of the original values for grouping, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop uses comparability for orderability, so ordering fails for non-comparable and mixed-type values. The proposed change makes it possible to order values of any type.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value should be filtered out; we would need to define a priority over the Number types.
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN as well as ±Double.INF and ±Float.INF. Not adhering to type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept such as a key-value tuple?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave it to graph providers how TP deals with BigDecimal?
+* Which is more reasonable: NULL eq NULL being true or false?
+  ** If it is true, it may be respected in JOIN operations.
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined.
+* Should UUID be treated as a distinct type?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Because it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We perform type casting (a.k.a. type promotion) for Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. Types are promoted according to the following rules:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to floating type of highest common bit denomination
+  ** If another value is Long or Double, we need 64bit so convert both to Double 
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported depending on the language and Storage, therefore the behavior of type casting for these 2 types can vary depending on a Graph provider. 
+
+=== Equality
+
+==== Primitive types
+===== Number
+
+Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
+
+* If either one of LHS or RHS is Number and another isn't, eq returns FALSE.
+* If both LHS and RHS are Number, it follows the type casting described above and then check the equality.
+* Example for edge cases:
+    ** -0.0 eq 0.0  = TRUE
+    ** +0.0 eq 0.0 = TRUE
+    **  -0.0 eq +0.0 = TRUE
+    ** NaN eq NaN  = FALSE
+    ** +INF eq +INF = TRUE
+    **  -INF eq -INF = TRUE
+    **  -INF eq +INF = FALSE
+* TinkerPop is JVM based so there can be ±INF^^float and ±INF^^double, NaN^^float and NaN^^double. They also adhere to the type promotion rules.
+
+===== Boolean
+
+* If either one of LHS or RHS is Boolean and another isn't, return FALSE
+* True != False, True == True, False == False
+
+===== String
+
+* If either one of LHS or RHS is String and another isn't, return FALSE
+* We assume the common lexicographical order over unicode strings.
+* LHS and RHS need to be lexicographically equal for LHS eq RHS == TRUE for String.
+
+===== Date
+
+* If either one of LHS or RHS is Date and another isn't, return FALSE
+* LHS eq RHS == TRUE when both LHS and RHS value are numerically identical in Unix Epoch time.
+
+===== NULL
+
+* If either one of LHS or RHS is null and another isn't, return FALSE
+* If both LHS and RHS are null, return TRUE 
+
+==== Composite types
+
+For all of them, if LHS and RHS are not of the same data type, equality returns FALSE. The following semantics apply when both LHS and RHS have the same data type.
+
+===== Vertex / Edge / VertexProperty
+
+They belong to the Element family in TinkerPop; if 2 elements have the same type and the same ID, they are considered equal.
+
+===== Property
+
+2 properties are equal if their keys and values are the same.
+
+===== PropertyKey
+
+key is String type so Equality for String type applies.
+
+===== PropertyValue
+
+Any type, so Equality for a corresponding type applies.
+
+===== ID
+
+Any type, so Equality for a corresponding type applies.
+
+===== Label
+
+label is String type so Equality for String type applies.
+
+===== Path
+
+2 Paths are equal when their path elements are equal (using equality of List), and the corresponding path labels are also equal. 
+
+===== List
+
+* If either one of LHS or RHS is List and another isn't, return FALSE

Review comment:
       I updated the description as follows:
   > Two lists are equal if they contain the same (equal to each other) elements in the same order.
   
   I propose to keep the type definition of `List` as simply `List` in these semantics. We can extend this definition when we introduce a type system, as you suggest in multiple comments. I don't think that changes the behavior of equality / comparison etc.
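
The updated list rule can be sketched in Java as a pairwise, order-sensitive check that delegates to element equality. This is only an illustration under stated assumptions: `ListEqualitySketch` and its methods are hypothetical names, not TinkerPop API, and the numeric branch is a simplification of the proposal's promotion table (any comparison involving Float or Double is done as double, everything else as BigDecimal). It also reads "equal to each other" as the proposal's equality, including type promotion for numeric elements.

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical illustration only; none of these names exist in TinkerPop.
final class ListEqualitySketch {

    // Equality that always yields true or false and never throws for the types handled here.
    static boolean eq(Object lhs, Object rhs) {
        if (lhs == null || rhs == null) {
            return lhs == rhs; // NULL eq NULL is TRUE, NULL eq anything else is FALSE
        }
        if (lhs instanceof Number && rhs instanceof Number) {
            return numberEq((Number) lhs, (Number) rhs);
        }
        if (lhs instanceof List && rhs instanceof List) {
            return listEq((List<?>) lhs, (List<?>) rhs);
        }
        // Values of different (non-numeric) types are never equal.
        return lhs.getClass() == rhs.getClass() && lhs.equals(rhs);
    }

    // Simplified promotion: floating point involved -> compare as double, else as BigDecimal.
    static boolean numberEq(Number l, Number r) {
        boolean floating = l instanceof Float || l instanceof Double
                || r instanceof Float || r instanceof Double;
        if (floating) {
            // Primitive == gives NaN != NaN and -0.0 == +0.0, matching the edge cases listed.
            return l.doubleValue() == r.doubleValue();
        }
        return new BigDecimal(l.toString()).compareTo(new BigDecimal(r.toString())) == 0;
    }

    // Two lists are equal if they have the same size and equal elements at each position.
    static boolean listEq(List<?> l, List<?> r) {
        if (l.size() != r.size()) {
            return false;
        }
        for (int i = 0; i < l.size(); i++) {
            if (!eq(l.get(i), r.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumptions `[1, 2L, 3.0]` equals `[1L, 2, 3]`, while `[1, 2]` does not equal `[2, 1]`, and a list containing NaN is not equal to itself because NaN eq NaN is FALSE.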







[GitHub] [tinkerpop] joshsh commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
joshsh commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r734900132



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. However, the comparison is Java dependent, and we don't have a clear definition of how the comparison works for these kinds of types.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast into exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is being used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of semantics, TinkerPop currently has no formal semantics defining the characteristics introduced in this proposal. Therefore, these semantics should be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, it always throws. With the proposed change, it would not throw; because NaN is not equal to any number, the query returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer, such as NaN and INF, should not produce exceptions and should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but based on the proposal, it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty  is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, thus non-comparable and mixed-type values will fail in ordering. The proposed change is to be able to order any types.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural: that we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, there is another question: which value should we filter out? We would need to define a priority over the Number types.
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept, such as a key-value tuple?

Review comment:
       Yes, that is what I would suggest: that a map is treated as an ordered list of pairs of type k x v, where k is the key type of the map, and v is the value type. Maps themselves should be comparable and orderable to the extent that their key and value types are.
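
       As a rough illustration of this suggestion (a sketch only, not TinkerPop code): the class `MapOrderSketch` and the helpers `toSortedPairs` and `compareMaps` below are hypothetical names. The sketch assumes that keys and values are each mutually `Comparable`, that the pairs are ordered by key, and that a map which is a strict prefix of another sorts first. None of these choices is fixed by the proposal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: compare two maps as ordered lists of (key, value) pairs.
final class MapOrderSketch {

    // View a map as a list of its entries, sorted by key (assumption: keys are Comparable).
    public static <K extends Comparable<? super K>, V> List<Map.Entry<K, V>> toSortedPairs(Map<K, V> map) {
        List<Map.Entry<K, V>> pairs = new ArrayList<>(map.entrySet());
        pairs.sort(Map.Entry.<K, V>comparingByKey());
        return pairs;
    }

    // Lexicographic comparison over the sorted pairs: key first, then value;
    // if one list is a prefix of the other, the shorter one sorts first (an assumption).
    public static <K extends Comparable<? super K>, V extends Comparable<? super V>>
            int compareMaps(Map<K, V> left, Map<K, V> right) {
        List<Map.Entry<K, V>> a = toSortedPairs(left);
        List<Map.Entry<K, V>> b = toSortedPairs(right);
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int byKey = a.get(i).getKey().compareTo(b.get(i).getKey());
            if (byKey != 0) {
                return byKey;
            }
            int byValue = a.get(i).getValue().compareTo(b.get(i).getValue());
            if (byValue != 0) {
                return byValue;
            }
        }
        return Integer.compare(a.size(), b.size());
    }

    public static void main(String[] args) {
        Map<String, Integer> m1 = new HashMap<>();
        m1.put("a", 1);
        m1.put("b", 2);
        Map<String, Integer> m2 = new HashMap<>();
        m2.put("a", 1);
        m2.put("b", 3);
        // Same keys, but m1 has the smaller value for "b", so the result is negative.
        System.out.println(compareMaps(m1, m2));
    }
}
```

       With this reading, the insertion order of the underlying Java map does not influence the result, and the Java-specific `Map.Entry` type only appears as an implementation detail of the pair list.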







[GitHub] [tinkerpop] joshsh commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
joshsh commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r736712369



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers, who need to be able to reason about the outcome of their queries, and custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Benefiting customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. However, the comparison is Java dependent, and we don't have a clear definition of how the comparison works for these kinds of types.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast into exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but NaN is equivalent to NaN
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is being used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine whether 2 values are at the same position.
+* When the position is identical, which value comes first (in other words, whether a stable sort is performed) depends on the graph provider's implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, a dedicated order applies, as described in the section below.
+
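A total order of this kind can be sketched as a Java `Comparator` that ranks types first and only then compares values within a type. The sketch below rests on loudly stated assumptions: the class name `OrderabilitySketch`, the type ranks (null, Boolean, Number, String, everything else) and the placement of NaN after +INF are all hypothetical choices made for illustration; the actual cross-type order is the one defined in the proposal's appendix.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class OrderabilitySketch {

    // Hypothetical cross-type ranks, chosen only for this illustration.
    private static int typeRank(Object o) {
        if (o == null) return 0;
        if (o instanceof Boolean) return 1;
        if (o instanceof Number) return 2;
        if (o instanceof String) return 3;
        return 4; // everything else is ordered by its string form
    }

    // Assumed placement of special values: -INF < finite numbers < +INF < NaN.
    private static int specialRank(Number n) {
        if (n instanceof Double || n instanceof Float) {
            double d = n.doubleValue();
            if (Double.isNaN(d)) return 2;
            if (d == Double.POSITIVE_INFINITY) return 1;
            if (d == Double.NEGATIVE_INFINITY) return -1;
        }
        return 0;
    }

    private static BigDecimal toBigDecimal(Number n) {
        if (n instanceof BigDecimal) return (BigDecimal) n;
        if (n instanceof BigInteger) return new BigDecimal((BigInteger) n);
        if (n instanceof Double || n instanceof Float) return new BigDecimal(n.doubleValue());
        return BigDecimal.valueOf(n.longValue());
    }

    private static int compareNumbers(Number a, Number b) {
        int ra = specialRank(a);
        int rb = specialRank(b);
        if (ra != 0 || rb != 0) return Integer.compare(ra, rb);
        return toBigDecimal(a).compareTo(toBigDecimal(b));
    }

    /** Never returns NULL and never throws for the types handled here. */
    static final Comparator<Object> ORDER = (a, b) -> {
        int ra = typeRank(a);
        int rb = typeRank(b);
        if (ra != rb) return Integer.compare(ra, rb);
        switch (ra) {
            case 0: return 0;
            case 1: return Boolean.compare((Boolean) a, (Boolean) b);
            case 2: return compareNumbers((Number) a, (Number) b);
            case 3: return ((String) a).compareTo((String) b);
            default: return String.valueOf(a).compareTo(String.valueOf(b));
        }
    };

    static List<Object> sorted(Object... values) {
        List<Object> out = new ArrayList<>(Arrays.asList(values));
        out.sort(ORDER);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sorted("b", 3L, Double.NaN, 1.5, null, true, "a", 2));
    }
}
```

Because the comparator consults the type rank before the value, a mixed list such as the one in `main` sorts without a ClassCastException, which is the property the Orderability bullets ask for.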
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+TinkerPop currently has no formal semantics for the characteristics introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK 11 seems to produce a different error from JDK 8 for BigDecimal comparisons that involve NaN and similar special values. JDK 8 seems to produce a NumberFormatException, whereas JDK 11 produces messages such as:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float value is stored, this query always throws. With the proposed change it would not throw; because NaN is not equal to any number, it would return an empty result.
+
+* BigDecimal +
+Equality checks that involve BigDecimal and special values which cannot be parsed as Integer, such as NaN and INF, should not produce exceptions and should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
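The exceptions in the session above are consistent with a JDK constraint: BigDecimal has no representation for NaN or infinity, and `new BigDecimal(double)` is documented to throw NumberFormatException for such input. The following Java sketch first demonstrates that constraint and then shows one way an equality check could filter instead of throwing. The class `NanSafeEqualitySketch` and the helper `numericEq` are hypothetical names for this illustration and do not reflect TinkerPop's actual code.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class NanSafeEqualitySketch {

    /** True if the JDK refuses to build a BigDecimal from the given double. */
    static boolean bigDecimalRejects(double value) {
        try {
            new BigDecimal(value);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    /** Numeric equality that filters (returns false) instead of throwing. */
    static boolean numericEq(Number lhs, Number rhs) {
        if (isNonFinite(lhs) || isNonFinite(rhs)) {
            // Primitive comparison: NaN equals nothing, INF equals only INF of the same sign.
            return lhs.doubleValue() == rhs.doubleValue();
        }
        return toBigDecimal(lhs).compareTo(toBigDecimal(rhs)) == 0;
    }

    private static boolean isNonFinite(Number n) {
        return (n instanceof Double || n instanceof Float) && !Double.isFinite(n.doubleValue());
    }

    // Only reached for finite values, so the BigDecimal constructors cannot throw here.
    private static BigDecimal toBigDecimal(Number n) {
        if (n instanceof BigDecimal) return (BigDecimal) n;
        if (n instanceof BigInteger) return new BigDecimal((BigInteger) n);
        if (n instanceof Double || n instanceof Float) return new BigDecimal(n.doubleValue());
        return BigDecimal.valueOf(n.longValue());
    }

    public static void main(String[] args) {
        System.out.println(bigDecimalRejects(Double.NaN));
        System.out.println(numericEq(Float.NaN, new BigDecimal("1.0")));
        System.out.println(numericEq(1.0f, new BigDecimal("1.0")));
    }
}
```

Checking for NaN and INF before any conversion to BigDecimal is the design choice that keeps the check total; the conversion itself is never attempted on a value the JDK would reject.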
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL.
+
+==== Equivalence
+
+TinkerPop today groups values by the hash value of the original values, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop uses comparability for orderability, so ordering fails for non-comparable and mixed-type values. The proposed change makes values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value should be filtered out; we would need to define a priority over the Number types.
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not applying type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a generalized concept such as a key-value tuple?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to Graph providers?
+* Which is more reasonable: should NULL eq NULL be true or false?
+  ** If it is true, it may be respected in JOIN operations.
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. It is JVM based, so we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex
+* Edge
+* VertexProperty
+* Property
+    ** Edge property
+    ** Vertex meta property
+* PropertyKey
+* PropertyValue
+* Label
+* ID
+* Path
+* List
+* Map
+* Set / BulkSet
+* Map.Entry (obtained from unfolding a Map)
+
+=== Type Casting
+
+We perform type casting, a.k.a. type promotion, for Numbers. Numbers are Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal. Types are promoted according to the following rules:
+
+* If at least one is BigDecimal then compare as BigDecimal
+* If at least one is BigInteger then compare as BigInteger
+* If at least one is Double then compare as Double
+* If one of them is a Float, then convert both to the floating-point type of the highest common bit width
+  ** If the other value is Long or Double, 64 bits are needed, so convert both to Double
+  ** Otherwise convert both to Float
+* If at least one is Long then compare as Long
+* If at least one is Integer then compare as Integer
+* If at least one is Short then compare as Short
+* If at least one is Byte then compare as Byte
+
+BigDecimal and BigInteger may not be supported depending on the language and Storage, therefore the behavior of type casting for these 2 types can vary depending on a Graph provider. 
+
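The rule list above reads as an ordered sequence of checks, which can be sketched directly in Java. This is a minimal illustration that applies the rules strictly in the order they are listed; the class `TypePromotionSketch` and the method `commonType` are hypothetical names, and the sketch only determines the common type, it does not convert the values.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class TypePromotionSketch {

    private static boolean either(Number a, Number b, Class<? extends Number> type) {
        return type.isInstance(a) || type.isInstance(b);
    }

    /** Returns the type both operands are compared as, checking the rules in listed order. */
    static Class<? extends Number> commonType(Number a, Number b) {
        if (either(a, b, BigDecimal.class)) return BigDecimal.class;
        if (either(a, b, BigInteger.class)) return BigInteger.class;
        if (either(a, b, Double.class)) return Double.class;
        if (either(a, b, Float.class)) {
            // A Long needs 64 bits, so widen to Double; otherwise Float suffices.
            return either(a, b, Long.class) ? Double.class : Float.class;
        }
        if (either(a, b, Long.class)) return Long.class;
        if (either(a, b, Integer.class)) return Integer.class;
        if (either(a, b, Short.class)) return Short.class;
        if (either(a, b, Byte.class)) return Byte.class;
        // Assumption: any other Number subclass is outside the scope of the rules.
        throw new IllegalArgumentException("Unsupported number type");
    }

    public static void main(String[] args) {
        System.out.println(commonType(1, 2L).getSimpleName());
        System.out.println(commonType(1.0f, 2L).getSimpleName());
        System.out.println(commonType(1.0f, 2).getSimpleName());
    }
}
```

A provider that does not support BigDecimal or BigInteger could drop the first two checks, in line with the note that behavior for these two types may vary.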
+=== Equality
+
+==== Primitive types
+===== Number
+
+Number consists of Byte, Short, Integer, Long, Float, Double, BigInteger, and BigDecimal.
+
+* If either one of LHS or RHS is a Number and the other isn't, eq returns FALSE.
+* If both LHS and RHS are Numbers, eq follows the type casting described above and then checks equality.
+* Example for edge cases:
+    ** -0.0 eq 0.0  = TRUE
+    ** +0.0 eq 0.0 = TRUE
+    **  -0.0 eq +0.0 = TRUE
+    ** NaN eq NaN  = FALSE
+    ** +INF eq +INF = TRUE
+    **  -INF eq -INF = TRUE
+    **  -INF eq +INF = FALSE
+* TinkerPop is JVM based, so there can be ±INF^^float and ±INF^^double, NaN^^float and NaN^^double. They also adhere to the type promotion rules.
+
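The edge cases listed above coincide with what Java's primitive `==` operator yields for `double` operands, whereas the boxed `Double.equals` method treats NaN and signed zero differently. The short Java sketch below contrasts the two; the class and method names are hypothetical and serve only this illustration.

```java
final class FloatEdgeCaseSketch {

    /** IEEE 754 comparison, as performed by the primitive == operator. */
    static boolean primitiveEq(double lhs, double rhs) {
        return lhs == rhs;
    }

    /** Boxed comparison, as performed by Double.equals. */
    static boolean boxedEquals(double lhs, double rhs) {
        return Double.valueOf(lhs).equals(Double.valueOf(rhs));
    }

    public static void main(String[] args) {
        System.out.println(primitiveEq(-0.0, 0.0));              // signed zeros
        System.out.println(primitiveEq(Double.NaN, Double.NaN)); // NaN against itself
        System.out.println(boxedEquals(Double.NaN, Double.NaN)); // boxed NaN against itself
        System.out.println(boxedEquals(-0.0, 0.0));              // boxed signed zeros
    }
}
```

This difference is one reason why results can vary depending on which Java data structure or method happens to perform the check.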
+===== Boolean
+
+* If either one of LHS or RHS is a Boolean and the other isn't, return FALSE
+* True != False, True == True, False == False
+
+===== String
+
+* If either one of LHS or RHS is a String and the other isn't, return FALSE
+* We assume the common lexicographical order over Unicode strings.
+* For String, LHS eq RHS == TRUE requires LHS and RHS to be lexicographically equal.
+
+===== Date
+
+* If either one of LHS or RHS is a Date and the other isn't, return FALSE
+* LHS eq RHS == TRUE when the LHS and RHS values are numerically identical in Unix Epoch time.
+
+===== NULL
+
+* If either one of LHS or RHS is null and the other isn't, return FALSE
+* If both LHS and RHS are null, return TRUE 
+
+==== Composite types
+
+For all of them, if LHS and RHS are not of the same data type, equality returns FALSE. The following semantics apply when both LHS and RHS have the same data type.
+
+===== Vertex / Edge / VertexProperty
+
+They belong to the Element family in TinkerPop; if 2 elements have the same type and the same ID, they are considered equal.

Review comment:
       If I can say "element reference" and you can say "id" with no conflict, then there is no conflict. One thing to keep in mind is that if there is only one id type, and ids are references, then that means there is only one element type (and not many types as in APG). It might be worthwhile to distinguish between a VertexId and an EdgeId type for the first pass, so we can have distinct Vertex and Edge types (just one of each). In the future, we can then distinguish between different vertex types and different edge types, with different reference (id) types.







[GitHub] [tinkerpop] rdtr commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
rdtr commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r737890362



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical for both TinkerPop customers (who need to be able to reason about the outcome of their queries) and custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite types.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. The comparison is Java dependent, however, and we don't have a clear definition of how comparison works for this kind of type.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+We even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy, which tries to promote Infinity to BigDecimal as well;
+// the type casting then fails. This is observed when using Java 11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified as equal with certainty in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion does not apply, two numbers of different types are never equivalent.
+** NaN is not equal to NaN, but NaN is equivalent to NaN.
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
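The Equivalence bullets can be read as a predicate that skips type promotion and treats NaN as equivalent to itself. The Java sketch below illustrates that reading with a naive pairwise `dedup`; the class `EquivalenceSketch` and both method names are hypothetical, and the treatment of `-0.0` versus `0.0` as equivalent is an assumption derived from the statement that equivalence otherwise matches equality.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class EquivalenceSketch {

    /** Always returns true or false; never null and never throws. */
    static boolean equivalent(Object a, Object b) {
        if (a == null || b == null) return a == b;
        if (a.getClass() != b.getClass()) return false; // no type promotion
        if (a instanceof Double || a instanceof Float) {
            double x = ((Number) a).doubleValue();
            double y = ((Number) b).doubleValue();
            // NaN is equivalent to NaN even though it is not equal to it.
            return x == y || (Double.isNaN(x) && Double.isNaN(y));
        }
        if (a instanceof BigDecimal) {
            // Compare by numeric value so that 1.0 and 1.00 are not split by scale.
            return ((BigDecimal) a).compareTo((BigDecimal) b) == 0;
        }
        return a.equals(b);
    }

    /** Keeps the first value of every equivalence class (quadratic, for illustration only). */
    static List<Object> dedup(List<?> values) {
        List<Object> kept = new ArrayList<>();
        for (Object candidate : values) {
            boolean seen = false;
            for (Object k : kept) {
                if (equivalent(k, candidate)) {
                    seen = true;
                    break;
                }
            }
            if (!seen) kept.add(candidate);
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(dedup(Arrays.asList(1, 1.0, Double.NaN, Double.NaN, 1)));
    }
}
```

In `main`, the Integer `1` and the Double `1.0` both survive because their types differ, while the two NaN values collapse into one.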
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every ordering between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it never returns NULL or throws an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** comparability is conceptually aligned with equality as far as type promotion is concerned, e.g. `1.0 < 2 < 3L`.
+* Comparison should not result in undefined behavior, but it returns NULL if and only if the two data types are incomparable. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine whether 2 values are at the same position.
+* When the position is identical, which value comes first (in other words, whether a stable sort is performed) depends on the graph provider's implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN in Number. For values of different types, a dedicated order applies, as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+TinkerPop currently has no formal semantics for the characteristics introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK 11 seems to produce a different error from JDK 8 for BigDecimal comparisons that involve NaN and similar special values. JDK 8 seems to produce a NumberFormatException, whereas JDK 11 produces messages such as:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float value is stored, this query always throws. With the proposed change it would not throw; because NaN is not equal to any number, it would return an empty result.
+
+* BigDecimal +
+Equality checks that involve BigDecimal and special values which cannot be parsed as Integer, such as NaN and INF, should not produce exceptions and should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL.
+
+==== Equivalence
+
+TinkerPop today groups values by the hash value of the original values, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop uses comparability for orderability, so ordering fails for non-comparable and mixed-type values. The proposed change makes values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value should be filtered out; we would need to define a priority over the Number types.
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not applying type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a general concept such as a key-value tuple ?
+* Today we have Date type but don’t we need timezone aware DateTime type as well ?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to Graph providers ?
+* Which should be more reasonable, NULL eq NULL is true or false ?
+  ** If it is true, it may be respected in JOIN operation
+* There are a number of situations where the Gremlin grammar won’t support some of the examples - to what extent do these sorts of constructs need to exist in the grammar? Not having them would impact the ability to supply tests that enforce the behaviors that we’ve outlined. 
+* Should UUID be a different type to be taken into account ?
+
+== Technical Appendix
+
+=== Types
+First we need to define which data types the TinkerPop query execution runtime needs to handle. Because it is JVM based, we use the following primitive types:
+
+* Byte: 8-bit signed two's complement integer
+* Boolean: true or false
+* Short: 16-bit signed two's complement integer
+* Integer: 32-bit signed two's complement integer.
+* Long: 64-bit signed two's complement integer.
+* Float: https://en.wikipedia.org/wiki/Single-precision_floating-point_format[single-precision 32-bit IEEE 754 floating point]
+* Double: https://en.wikipedia.org/wiki/Double-precision_floating-point_format[double-precision 64-bit IEEE 754 floating point]
+* BigInteger
+* BigDecimal
+* String / Char
+* UUID (String based equality / comparison, so identical to String)
+* Date
+
+Note that in Double or Float, we have a concept of INFINITY / https://en.wikipedia.org/wiki/Signed_zero[signed-zero], and NaN.
+In addition to these, there are composite types as follows:
+
+* Vertex

Review comment:
       It is an interesting idea.
   However, defining the type of a Vertex based on its id / properties may affect Graph providers' architecture if they already have their own query layer / storage implementation built on the simplest approach you are referring to here.
   
   I think we should leave Graph providers the option to stick with the simplest approach, where there is a single `Vertex` type, a single `Edge` type, etc. for all vertices and edges respectively.
   
   Can we keep the current "simplest" types in this doc? I feel it would be a big leap to introduce a new type system together with these semantics. As @divijvaidya said, we can start with a small change and evolve the semantics later, alongside the new type system.
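
To make the orderability part of the quoted proposal concrete while keeping one simple type per element kind, here is a minimal Java sketch of a total order over mixed-type values. It is an illustration under stated assumptions, not TinkerPop code: `SimpleVertex`, `typeRank`, `ORDER` and `sorted` are hypothetical names, and the type priority (Number before vertex before String) is assumed only because it agrees with the two "proposed" console examples in the quoted hunk; the actual priority is left to the proposal's appendix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class OrderabilitySketch {

    // Hypothetical stand-in for a provider's single Vertex type (not TinkerPop's Vertex).
    public static final class SimpleVertex {
        final long id;
        public SimpleVertex(long id) { this.id = id; }
        @Override public String toString() { return "v[" + id + "]"; }
    }

    // Assumed type priority, for illustration only: Number < SimpleVertex < String.
    static int typeRank(Object o) {
        if (o instanceof Number) return 0;
        if (o instanceof SimpleVertex) return 1;
        if (o instanceof String) return 2;
        return 3; // unknown types sort last in this sketch
    }

    // Total order: compare by type rank first, then within the type.
    public static final Comparator<Object> ORDER = (a, b) -> {
        int ra = typeRank(a);
        int rb = typeRank(b);
        if (ra != rb) return Integer.compare(ra, rb);
        switch (ra) {
            case 0:
                // Simplification: promote to double; Double.compare is a total order (NaN sorts last).
                return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
            case 1:
                return Long.compare(((SimpleVertex) a).id, ((SimpleVertex) b).id);
            case 2:
                return ((String) a).compareTo((String) b);
            default:
                return 0; // unknown types are treated as tied
        }
    };

    // Returns a sorted copy; never throws for mixed-type input.
    public static List<Object> sorted(Object... values) {
        List<Object> copy = new ArrayList<>(Arrays.asList(values));
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sorted("marko", new SimpleVertex(2))); // prints [v[2], marko]
        System.out.println(sorted("100000", 100));                // prints [100, 100000]
    }
}
```

The comparator never needs to know how a vertex is typed beyond "it is a vertex", which is why this kind of ordering can coexist with the simplest type model. Promoting every Number to double loses precision for large Long, BigInteger and BigDecimal values, so a real implementation would need a more careful numeric comparison.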







[GitHub] [tinkerpop] joshsh commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
joshsh commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r737668670



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite type.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. However, the comparison is Java dependent, and we don’t have a clear definition of how comparison works for these kinds of types.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, where we distinguish between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, which is a frequent use case for NaN, cannot be identified with certainty as equal in a comparison, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, 2 values of different type are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (There is an open question whether equivalence takes type promotion into account). +
+
+* For Number, 
+** Because type promotion is not effective, if the types are different then two numbers are never equivalent
+** NaN is not equal to NaN, but equivalent to each other
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For 2 values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is being used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps, by providing a stable sort order over mixed type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers,
+** it should be aligned to equality conceptually as far as type promotion is concerned. e.g. `1.0 < 2 < 3L`
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise Comparison does return TRUE or FALSE
+
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first except for NaN in Number. For a different type, we have a dedicated order as described in the section below.
+
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+In terms of Semantics, right now TinkerPop does not have formal semantics to define these characteristics introduced in this proposal. Therefore this semantics should be published on the official TinkerPop doc.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 when it comes to BigDecimal comparisons that hit NaN and such. For JDK8 they seem to produce NumberFormatException but for JDK11 you get stuff like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float Number is stored, this always throws. With the proposed change it would not throw; because NaN is not equal to any number, it returns an empty result.
+
+* BigDecimal +
+Equality around BigDecimal and special values which cannot be parsed as Integer such as NaN, INF should not produce exceptions and should filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When Vertex / Edge / VertexProperty  is compared, today it throws but it should return NULL.
+  ** When NULL is compared, today it throws an exception but it should return NULL. 
+
+==== Equivalence
+
+TinkerPop today uses a hash value for original values for grouping and the behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop relies on comparability for ordering, so the order step fails on non-comparable and mixed-type values. The proposed change makes values of any type orderable.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should we take type-promotion into account in terms of equivalence ? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If de-duping, there is another question: which value should we filter out? We would need to define a priority over the Number types.
+Also note that TinkerPop is Java based, so we have Double.NaN and Float.NaN as well as ±Double.INF and ±Float.INF. Not applying type casting means that, for example, Double.NaN and Float.NaN are not de-duplicated / grouped together under these semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a general concept such as a key-value tuple ?
+* Today we have Date type but don’t we need timezone aware DateTime type as well ?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to Graph providers ?
+* Which should be more reasonable, NULL eq NULL is true or false ?

Review comment:
       Lost formatting. Should have been something like `Nothing :: optional<t>`. I put up the following generated API last night as an illustration: [TinkerPop3 transitional API](https://categoricaldata.github.io/hydra/hydra-java/javadoc/hydra/ext/tinkerpop/v3/package-summary.html). See also the Java sources [here](https://github.com/CategoricalData/hydra/tree/main/hydra-java/src/gen-main/java/hydra/ext/tinkerpop/v3). If you look at the [Type](https://categoricaldata.github.io/hydra/hydra-java/javadoc/hydra/ext/tinkerpop/v3/Type.html) class, it has three concrete subclasses: one for atomic values, one for collections (including optionals), and one for element references. See the corresponding [Value](https://categoricaldata.github.io/hydra/hydra-java/javadoc/hydra/ext/tinkerpop/v3/Value.html) class. At both the logical level and the (Java) implementation level, two Values are equal if and only if they are both strings, both optional integers, etc., and you really shouldn't even try to compare them if they aren't. If they are both `Optional<T>`, then an `Optional.of()` will always compare as different from an `Optional.empty()`, etc. More later.
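
The `Optional` behavior described above can be checked with plain `java.util.Optional`. The following is a minimal sketch, not code from the linked API: `typedEquals` is a hypothetical helper that only treats two values as equal when their runtime classes match, and it assumes absent values are modeled as `Optional.empty()` rather than `null`.

```java
import java.util.Objects;
import java.util.Optional;

public class TypedEqualitySketch {

    // Hypothetical helper: two values can only be equal when their runtime classes match.
    // Absent values are modeled as Optional.empty() rather than null, so null is rejected.
    public static boolean typedEquals(Object a, Object b) {
        Objects.requireNonNull(a, "a");
        Objects.requireNonNull(b, "b");
        return a.getClass() == b.getClass() && a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(typedEquals("x", "x"));                           // true
        System.out.println(typedEquals(1, 1L));                              // false: Integer vs. Long
        System.out.println(typedEquals(Optional.of(1), Optional.empty()));   // false
        System.out.println(typedEquals(Optional.empty(), Optional.empty())); // true
        System.out.println(typedEquals(Optional.of(1), Optional.of(1)));     // true
    }
}
```

Because of type erasure, `Optional<Integer>` and `Optional<Long>` share one runtime class, so the class check alone cannot tell them apart; `Optional.equals` then compares the wrapped values, and `Integer.equals(Long)` is false. Note that this strict rule differs from the proposal's equality, which applies type promotion across numeric types.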







[GitHub] [tinkerpop] spmallette commented on pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
spmallette commented on pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#issuecomment-953939823


   I've restructured the "future" doc to a form that matches what I described in my last comment:
   
   https://github.com/apache/tinkerpop/blob/master/docs/src/dev/future/index.asciidoc
   
   After this PR is merged it can be formatted into that space and the provider documentation.





[GitHub] [tinkerpop] mschmidt00 commented on a change in pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
mschmidt00 commented on a change in pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#discussion_r736958544



##########
File path: docs/src/dev/future/equality_proposal.asciidoc
##########
@@ -0,0 +1,682 @@
+= Proposal for equality, equivalence, comparability and orderability semantics for TinkerPop
+
+== Motivation
+
+How values compare to each other is crucial to the behavior of a query language. While comparison semantics may sound like a trivial question at first, when looking under the surface many interesting questions arise, including aspects around equality and comparability in the context of type casting (e.g., over numerics of different types), slightly different variants of equality being used in different context (e.g. predicates vs. deduplication), questions around comparability and ordering across different logical types, as well as around the identity of elements (such as vertex and edge properties). 
+
+TinkerPop / Gremlin is written in and (partially) relies upon Java / JVM, and there is no clear semantics defined and published for the different types of equality and comparison operations as of today. Rather, what equals what and how values compare is often implicitly defined by the semantics of the underlying Java data structures that are being used, and hence may vary from context to context. We believe that a concise definition of these concepts is critical both for TinkerPop customers, who need to be able to reason about the outcome of their queries, and for custom implementations of the TinkerPop API, which would benefit from a concise definition to follow. Therefore, TinkerPop should provide a complete and cohesive semantics for equality / comparison such that all Graph providers can easily ensure that their query processing approach aligns with the TinkerPop implementation. Helping customers and implementers alike, this will help increase the adoption of Gremlin as a query language.
+
+This documentation is a proposal that shall serve as a basis for a community discussion on how TinkerPop should handle equality / comparison in different contexts. Motivated by different examples of the status today, we formalize different notions of equality and comparability and describe the contexts in which they apply. While the semantics that we propose is largely aligned with the semantics that is implemented in TinkerPop today, this proposal aims to fill in some existing gaps (such as providing a complete, cross-datatype ordering instead of throwing exceptions) and proposes modifications for a few edge cases, as to make the overall semantics more predictable, coherent, and documentable.    
+
+=== Examples
+
+Below are a couple of example scenarios where defining semantics can help clarify and mitigate inconsistent / undefined behavior in TinkerPop today:
+
+==== The underspecified/undocumented behavior
+
+Consider an equality check such as 
+
+[source]
+----
+gremlin> g.V().has(id, 19)
+----
+
+Without a precise definition, both users and Graph providers don't know whether this query matches only nodes with an ID that is exactly equal to the integer value 19 or, for instance, all numerical values that cast to an Integer value 19. To see that, right now, they need to dig into the TinkerPop code base. While, in the above example, type casting rules apply, in other cases such as
+
+[source]
+----
+gremlin> g.V().property(numericValue).dedup()
+----
+
+the two values above would always be treated as different entities.
+
+==== The behavior that is inherently driven by Java:
+
+Another example is equality over composite type.
+
+[source]
+----
+gremlin> g.V().aggregate("a").out().aggregate("b").cap("a").where(eq("b"))
+----
+
+This query compares two BulkSet objects produced by the cap step. However, the comparison is Java dependent, and we don’t have a clear definition of how comparison works for these kinds of types.
+The same applies to Map, e.g.
+
+[source]
+----
+gremlin> g.V().group().unfold().as("a").V().group().unfold().as("b").where(eq("a", "b"))
+----
+
+and we even have comparison over Map.Entry, which is a Java-dependent type.
+
+[source]
+----
+gremlin> g.V().group().unfold().order() 
+class java.util.HashMap$Node cannot be cast to class java.lang.Comparable (java.util.HashMap$Node and java.lang.Comparable are in module java.base of loader 'bootstrap')
+----
+
+==== Potentially unexpected results due to incompleteness
+
+A query which tries to determine the order across multiple types fails today. 
+
+[source]
+----
+// Properties have values of Integer and String.
+gremlin> g.V().values("some property").order()
+
+class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
+
+// This query aims to order a heterogeneous result set
+gremlin> g.V().union(out(), outE()).order()
+class org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to class java.lang.Comparable (org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex is in unnamed module of loader 'app'; java.lang.Comparable is in module java.base of loader 'bootstrap')
+----
+
+It would be more helpful for users if TinkerPop defined a complete order across types and returned a result instead of throwing an Exception.
+
+==== Inconsistent results
+
+Handling for `NaN`, `NULL`, `+0.0`, `-0.0`, `+INF`, `-INF` is tricky, and TinkerPop does not cover all cases consistently at this moment.
+
+[source]
+----
+gremlin> g.V("1").properties("key")
+==>vp[key→0]
+
+// NaN == 0 holds true for this equality check.  
+gremlin> g.V("1").has("key", Double.NaN)
+==>v[1]
+
+gremlin> g.V("1").properties()
+==>vp[key→Infinity]
+
+// 0.0 is interpreted as BigDecimal in Groovy and it tries to promote Infinity to BigDecimal as well,
+// then the type casting fails. This is observed when using Java11.
+gremlin> g.V("1").has("key", gt(0.0))
+Character I is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+
+In the next section, we provide a conceptual proposal to define concepts around how values compare and are ordered, which aims to provide an answer to these and other questions. We seek the feedback from the community to discuss and reach a consensus around the proposal and are open to all other ideas around how these concepts should be defined in TinkerPop / Gremlin.
+
+== Conceptualization of Equality and Comparison
+
+In the above section we used the notions of "equality" and "comparison" in a generalized way. Inspired by the formalization in https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf[the openCypher specification], we now refine these two notions into four, distinguishing between equality vs. equivalence and comparability vs. orderability, which constitute two flavors of these concepts tailored to their usage in different contexts. We summarize and contrast these concepts in the following subsections; more technical details and discussion of edge cases can be found in the technical appendix.
+
+=== Proposed semantics
+
+==== Equality vs. Equivalence
+
+Equality defines when two values are considered equal in the context of database lookups and predicates, while equivalence defines value collation semantics in the context of, for instance, deduplication. For instance, equivalence over two values `a := Double.NaN` and `b := Double.NaN` is true, but equality would (in our proposal) be defined as false; the rationale here (which is commonly found in query and programming languages) is that two "unknown" numbers, a frequent use case for NaN, cannot be identified as equal with certainty, but it typically makes sense to group them together in, for instance, aggregations.
+
+Both equality and equivalence can be understood as complete, i.e. the result of equality and equivalence checks is always either TRUE or FALSE (in particular, it never returns NULL or throws an exception). The details on equality and equivalence are sketched in the following two subsections, respectively.
+
+===== Equality 
+
+* Used by equality and membership predicates (such as https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L130[P.eq], https://github.com/apache/tinkerpop/blob/734f4a8745e797f794c4860962912b04313f312a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/P.java#L139[P.neq], and the list membership https://github.com/apache/tinkerpop/blob/72be3549a5e4f99115e9d491e0fc051fff77998a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Contains.java#L52[P.within]) in Gremlin. When this eq operator returns TRUE for 2 values (LHS and RHS), by definition LHS and RHS are equal to each other.
+
+* If graph providers need join semantics in query execution, equality should be used to join data over join keys. +
+Example:
+
+[code]
+----
+// equality over 2 ids
+gremlin> g.V().has(id, "some id")
+// equality over vertices
+gremlin> g.V().as("v").out().out().where(eq("v"))
+----
+
+* Equality adheres to type promotion semantics for numerical values, i.e. equality holds for values of different numerical types if they cast to exactly the same value of the lowest common super type.
+* Other than the type promotion between Numbers, two values of different types are always regarded as not equal.
+* Equality checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error. Detailed behavior is described in
+
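One way to picture the type-promotion rule for numerical equality is the Java sketch below. It is an illustration under stated assumptions, not TinkerPop's implementation: the class `NumericEqualitySketch` and its helpers `widen`, `isNonFinite`, and `numericEquals` are hypothetical names, and `BigDecimal` is assumed to play the role of the lowest common super type. NaN and the infinities are handled separately because they have no `BigDecimal` representation.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical sketch of equality with numeric type promotion; not TinkerPop code.
class NumericEqualitySketch {

    // Assumption: BigDecimal serves as the common super type for all finite numbers.
    static BigDecimal widen(Number n) {
        if (n instanceof BigDecimal) return (BigDecimal) n;
        if (n instanceof BigInteger) return new BigDecimal((BigInteger) n);
        if (n instanceof Double || n instanceof Float) return new BigDecimal(n.doubleValue());
        return BigDecimal.valueOf(n.longValue());
    }

    // NaN and +/-Infinity only exist for Float and Double.
    static boolean isNonFinite(Number n) {
        if (n instanceof Double || n instanceof Float) {
            double d = n.doubleValue();
            return Double.isNaN(d) || Double.isInfinite(d);
        }
        return false;
    }

    // Always returns true or false; never null, never throws for non-null input.
    static boolean numericEquals(Number a, Number b) {
        boolean aSpecial = isNonFinite(a);
        boolean bSpecial = isNonFinite(b);
        if (aSpecial || bSpecial) {
            // Primitive == is false whenever NaN is involved and true for same-signed infinities.
            return aSpecial && bSpecial && a.doubleValue() == b.doubleValue();
        }
        // compareTo ignores scale, so 1.00 and 1 are treated as the same value.
        return widen(a).compareTo(widen(b)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(numericEquals(1, 1.0));
        System.out.println(numericEquals(Double.NaN, Double.NaN));
        System.out.println(numericEquals(Double.NaN, 0));
    }
}
```

In this sketch a lookup such as `has("key", Double.NaN)` against a stored `0` would simply not match, rather than matching or throwing.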
+===== Equivalence
+
+* Equivalence defines how TinkerPop deals with 2 values to be grouped or de-duplicated. Specifically it is necessary for the dedup and group steps in Gremlin. +
+Example:
+
+[code]
+----
+// deduplication needs equivalence over 2 property values
+gremlin> g.V().dedup().by("name")
+// grouping by equivalence over 2 property values
+gremlin> g.V().group().by("age") 
+----
+
+* Equivalence ignores type promotion semantics, i.e. two values of different types (e.g. 2^^int vs. 2.0^^float) are always considered to be non-equivalent. (Whether equivalence should take type promotion into account is an open question.) +
+
+* For Number,
+** Because type promotion is not applied, two numbers of different types are never equivalent.
+** NaN is not equal to NaN, but NaN is equivalent to NaN.
+
+* Other than the edge case around NaN (and, as of today, Numbers), equivalence in TinkerPop is identical to equality.
+* Like equality, equivalence checks always return TRUE or FALSE. They never result in NULL output, undefined behavior, nor do they ever throw an error.
+
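The difference between equality and equivalence for NaN can be observed directly in Java, which is one reason the distinction matters for a JVM-based implementation. The sketch below is illustrative only: `NaNEquivalenceSketch`, `equal`, `equivalent`, and `dedup` are hypothetical names, inputs are assumed to be non-null, and equivalence is approximated by "same runtime class and `Object.equals`". Note that under this approximation `0.0` and `-0.0` are not equivalent, a case the text above does not settle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch contrasting equality and equivalence; not TinkerPop code.
class NaNEquivalenceSketch {

    // Equality for two doubles: primitive comparison, so NaN is never equal to NaN.
    static boolean equal(double a, double b) {
        return a == b;
    }

    // Equivalence approximation: no type promotion, boxed equals (NaN equals NaN).
    // Assumption: both arguments are non-null.
    static boolean equivalent(Object a, Object b) {
        return a.getClass() == b.getClass() && a.equals(b);
    }

    // Deduplication via a hash-based set relies on equals/hashCode, i.e. equivalence.
    static <T> List<T> dedup(List<T> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    public static void main(String[] args) {
        System.out.println(equal(Double.NaN, Double.NaN));
        System.out.println(equivalent(Double.NaN, Double.NaN));
        System.out.println(dedup(Arrays.asList(Double.NaN, Double.NaN, 1.0)));
    }
}
```

Here the two NaN values collapse into one entry under `dedup`, even though `equal` reports them as different.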
+==== Comparability vs. Orderability
+
+Comparability and orderability can be understood as the "dual" concepts of equality and equivalence for range comparisons (rather than exact comparison). For two values of the same type (except for NaN), comparability is stronger than orderability in the sense that every order between two values that holds TRUE w.r.t. comparability also holds TRUE w.r.t. orderability, but not vice versa. Comparability is what is used in range predicates. It is restricted to comparison within the same type or, for numerics, class of types; comparability is complete within a given type, but returns NULL if the two types are considered incomparable (e.g., an integer cannot be compared to a string). Orderability fills these gaps by providing a stable sort order over mixed-type results; it is consistent with comparability within a type, and complete both within and across types, i.e. it will never return NULL or throw an exception. +
+More details on comparability and orderability are sketched in the following two subsections, respectively.
+
+===== Comparability
+
+* Used by the comparison operators (https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L88[P.gt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L138[P.lt], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L117[P.gte], https://github.com/apache/tinkerpop/blob/050f66a956ae36ceede55613097cc86e19b8a737/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Compare.java#L168[P.lte]) in Gremlin and defines how to compare 2 values. +
+Example:
+
+[code]
+----
+// comparison over 2 property values
+gremlin> g.E().has("weight", gt(1))  
+----
+
+* For numbers, comparability should be conceptually aligned with equality as far as type promotion is concerned, e.g. `1.0 < 2 < 3L`.
+* Comparison should not result in undefined behavior, but can return NULL if and only if we are comparing incomparable data types. How this NULL result is handled is Graph provider dependent.
+* Otherwise, comparison returns TRUE or FALSE.
+
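The three possible outcomes of a comparability check (TRUE, FALSE, NULL) can be sketched in Java as follows. Everything here is an assumption made for illustration: the class `ComparabilitySketch`, the `Ternary` enum, and `lessThan` are hypothetical names, only Number and String are covered, and numbers are promoted to `double` for brevity, which loses precision for very large `long` or `BigDecimal` values.

```java
// Hypothetical sketch of a three-valued "less than"; not TinkerPop code.
class ComparabilitySketch {

    enum Ternary { TRUE, FALSE, NULL }

    static Ternary lessThan(Object a, Object b) {
        if (a instanceof Number && b instanceof Number) {
            // Simplification: promote both sides to double.
            double x = ((Number) a).doubleValue();
            double y = ((Number) b).doubleValue();
            // Primitive < is false whenever either side is NaN.
            return x < y ? Ternary.TRUE : Ternary.FALSE;
        }
        if (a instanceof String && b instanceof String) {
            return ((String) a).compareTo((String) b) < 0 ? Ternary.TRUE : Ternary.FALSE;
        }
        // Incomparable types (including null) yield NULL instead of an exception.
        return Ternary.NULL;
    }

    public static void main(String[] args) {
        System.out.println(lessThan(1.0, 2));
        System.out.println(lessThan(5, Double.NaN));
        System.out.println(lessThan(1, "a"));
    }
}
```

How a caller treats `Ternary.NULL`, for example by filtering the traverser out, is left open here, in line with the statement that NULL handling is Graph provider dependent.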
+===== Orderability
+
+* Used to determine the order. In TinkerPop, the order step follows the notion of orderability.
+* Orderability must not result in NULL / undefined behavior.
+* Orderability must not throw an error. In other words, even if 2 values are incomparable we should still be able to determine the order of those two. This inevitably leads to the requirement to define the order across different data types. For the detailed order across types, see appendix.
+* Orderability determines if 2 values are ordered at the same position or one value is positioned earlier than another.
+* The concept of equivalence is used to determine if the 2 values are at the same position
+* When the position is identical, which value comes first (in other words, whether it should perform stable sort) depends on graph providers' implementation.
+* For values of the same type, comparability can be used to determine which comes first, except for NaN among Numbers. For values of different types, we have a dedicated order as described in the section below.
+
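A total order across types can be sketched as a Java `Comparator` that first ranks values by type and only then compares within a type. This is a minimal illustration, not the order defined by the proposal: the class `OrderabilitySketch`, the `rank` function, and the rank values are hypothetical, with numbers placed before strings only to mirror the `100` / `100000` example in this document. All values that are neither Number nor String share a single position in this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a cross-type total order; not TinkerPop code.
class OrderabilitySketch {

    // Assumed type priority; the real cross-type order is defined elsewhere in the proposal.
    static int rank(Object o) {
        if (o instanceof Number) return 0;
        if (o instanceof String) return 1;
        return 2;
    }

    static final Comparator<Object> ORDER = (a, b) -> {
        int ra = rank(a);
        int rb = rank(b);
        if (ra != rb) return Integer.compare(ra, rb);
        if (ra == 0) {
            // Double.compare is total: NaN sorts after +Infinity and ties with NaN.
            return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
        }
        if (ra == 1) return ((String) a).compareTo((String) b);
        return 0; // simplification: all remaining values share one position
    };

    // Never throws for mixed-type input; List.sort is a stable sort.
    static List<Object> sorted(List<Object> values) {
        List<Object> copy = new ArrayList<>(values);
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sorted(Arrays.<Object>asList("100000", 100, Double.NaN, 5L)));
    }
}
```

Because `List.sort` is stable, values that land at the same position keep their input order in this sketch; the text above leaves that choice to graph providers.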
+===== Mapping table for TinkerPop operators
+
+The table below shows which of the notions proposed above each TinkerPop construct uses.
+
+[%header]
+|================
+|Construct|Concept                
+|P.eq     |Equality               
+|P.neq    |Equality               
+|P.within |Equality               
+|P.without|Equality               
+|P.lt     |Comparability          
+|P.gt     |Comparability          
+|P.lte    |Equality, Comparability
+|P.gte    |Equality, Comparability
+|P.inside |Comparability          
+|P.outside|Comparability          
+|P.between|Equality, Comparability
+|================
+
+== What would change ?
+
+=== Semantics
+
+TinkerPop currently has no formal semantics defining the characteristics introduced in this proposal. These semantics should therefore be published in the official TinkerPop documentation.
+
+=== Behavioral changes
+==== Equality
+
+* NaN +
+JDK11 seems to produce a different error from JDK8 for BigDecimal comparisons that involve NaN and similar values. JDK8 seems to produce a NumberFormatException, whereas JDK11 produces messages like:
+
+[code]
+----
+gremlin> g.V().has("key", Float.NaN)
+Character N is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
+----
+When a Double / Float number is stored, this always throws. With the proposed change it would not throw; because NaN is not equal to any number, the query returns an empty result.
+
+* BigDecimal +
+Equality checks involving BigDecimal and special values that cannot be parsed as Integer, such as NaN and INF, should not produce exceptions and should simply filter.
+
+[code]
+----
+gremlin> g.addV().property('key',Float.NaN)
+==>v[0]
+gremlin> g.addV().property('key',1.0f)
+==>v[2]
+gremlin> g.V().has('key',Float.NaN)
+==>v[0]
+gremlin> g.V().has('key',1.0f)
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0f)) // 3.5.x
+==>1.0
+gremlin> g.V().has('key',1.0) // 3.5.x - likely due to Groovy going to BigDecimal for "1.0"
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().values("key").is(eq(new BigDecimal(1.0f))) // 3.5.x
+java.lang.NumberFormatException
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().has('key',1.0) // proposed
+==>v[2]
+gremlin> g.V().values("key").is(eq(1.0)) // proposed
+==>1.0
+----
+
+==== Comparability
+
+* NaN +
+Comparing on NaN should return no results.
+
+[code]
+----
+gremlin> g.addV().property('key',-5)
+==>v[0]
+gremlin> g.addV().property('key',0)
+==>v[2]
+gremlin> g.addV().property('key',5)
+==>v[4]
+gremlin> g.addV().property('key',Double.NaN)
+==>v[6]
+gremlin> g.V().values("key").is(lte(Double.NaN)) // 3.5.x
+==>-5
+==>0
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // 3.5.x
+==>0
+==>5
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // 3.5.x
+==>-5
+gremlin> g.V().values("key").is(gt(Double.NaN)) // 3.5.x
+==>5
+gremlin> g.V().values("key").is(lte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(gte(Double.NaN)) // proposed
+==>NaN
+gremlin> g.V().values("key").is(lt(Double.NaN)) // proposed
+gremlin> g.V().values("key").is(gt(Double.NaN)) // proposed
+----
+
+* Comparability throws an exception today, but under the proposal it returns NULL when comparing incompatible types.
+  ** When a Vertex / Edge / VertexProperty is compared, today it throws, but it should return NULL.
+  ** When NULL is compared, today it throws an exception, but it should return NULL.
+
+==== Equivalence
+
+TinkerPop today groups values by the hash value of the original values, and this behavior is unchanged.
+
+==== Orderability
+
+- Currently, TinkerPop follows comparability for orderability, so ordering fails for non-comparable and mixed-type values. The proposed change makes it possible to order values of any type.
+
+[code]
+----
+gremlin> g.V().order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // 3.5.x
+org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerVertex cannot be cast to java.lang.Comparable
+Type ':help' or ':h' for help.
+Display stack trace? [yN]n
+gremlin> g.V().order()  // proposed
+==>v[1]
+==>v[2]
+==>v[3]
+==>v[4]
+==>v[5]
+==>v[6]
+gremlin> g.V(1).values('name').union(identity(),V(2)).order() // proposed
+==>v[2]
+==>marko
+gremlin> g.addV().property("key", 100)
+==>v[0]
+gremlin> g.addV().property("key", "100000")
+==>v[2]
+gremlin> g.V().values('key').order() // 3.5.x
+java.lang.Integer cannot be cast to java.lang.String
+Type ':help' or ':h' for help.
+Display stack trace? [yN]
+gremlin> g.V().values('key').order() // proposed
+==>100
+==>100000
+----
+
+== Open Questions
+
+* Should equivalence take type promotion into account? +
+[code]
+----
+// In this case below,
+gremlin> g.V().property()
+==>[key:1.0]
+==>[key:1]
+
+// which is more natural, whether we don't de-duplicate them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+==>[key:1]
+
+// or de-dup them
+gremlin> g.V().property().dedup()
+==>[key:1.0]
+----
+        
+If we de-duplicate, a further question is which value to filter out; we would need to define a priority over the Number types.
+Also note that TinkerPop is Java based and we have Double.NaN and Float.NaN, ±Double.INF and ±Float.INF. Not adhering to type casting means, for example, that Double.NaN and Float.NaN are not de-duplicated / grouped according to the semantics.
+        
+* Map.Entry is a Java-dependent type. Instead of defining semantics for Map.Entry, should we introduce a more general concept, such as a key-value tuple?
+* Today we have a Date type, but do we also need a timezone-aware DateTime type?
+* Some graph providers may not support BigDecimal. Do we leave how TP deals with BigDecimal to Graph providers?
+* Which is more reasonable: that NULL eq NULL is true, or that it is false?

Review comment:
       Could you clarify what exactly you mean by the syntax "Nothing:optional"? And how it would technically work to not support NULLs in equality comparison? Are you talking about implementation or logical level?







[GitHub] [tinkerpop] spmallette commented on pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
spmallette commented on pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#issuecomment-960632698


   merged to master under f7187958d8318bafab36a1da19517093b5540ad7^...11e8bb5bca33e5ee22a30ac38a7cc9478798bc14





[GitHub] [tinkerpop] spmallette commented on pull request #1487: Equality, Equivalence, Comparability, and Orderability Semantics prop…

Posted by GitBox <gi...@apache.org>.
spmallette commented on pull request #1487:
URL: https://github.com/apache/tinkerpop/pull/1487#issuecomment-951110110


   @rdtr  you're failing the build smoke test because your file doesn't have the Apache license header. Just paste the following at the top of your document. 
   
   ```asciidoc
   ////
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
   
     http://www.apache.org/licenses/LICENSE-2.0
   
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
   ////
   ```

