You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tinkerpop.apache.org by GitBox <gi...@apache.org> on 2021/10/09 06:14:30 UTC

[GitHub] [tinkerpop] amatiushkin opened a new pull request #1485: TINKERPOP-848 draft implementation to support default value

amatiushkin opened a new pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485


   Draft implementation, which adds support for default values in GraphML ([TINKERPOP-848](https://issues.apache.org/jira/browse/TINKERPOP-848)).
   
   Support feature is incubating, due to following capabilities mentioned in GraphML [specification](http://graphml.graphdrawing.org/specification.html).
   
   
   ```
   <xs:complexType name="default.type" final="#all">
   <xs:annotation>
   <xs:documentation source="http://graphml.graphdrawing.org/" xml:lang="en">
   Complex type for the <default> element. default.type is mixed, that is, data may contain #PCDATA. Content type: extension of data-extension.type which is empty per default.
   </xs:documentation>
   </xs:annotation>
   <xs:complexContent mixed="true">
   <xs:extension base="data-extension.type">
   <xs:attributeGroup ref="default.extra.attrib">
   <xs:annotation>
   <xs:documentation source="http://graphml.graphdrawing.org/" xml:lang="en">
   user defined extra attributes for <default> elements
   </xs:documentation>
   </xs:annotation>
   </xs:attributeGroup>
   </xs:extension>
   </xs:complexContent>
   </xs:complexType>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] amatiushkin commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
amatiushkin commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r732331858



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       @spmallette 
   My goal is to deliver practical and useful feature, so my choice is to implement whatever is necessary to import GraphML with default values.
   
   If implementation needs support multiple ways, that's fine. I hope we can prepare list of test cases & requirements which needs to be supported.
   
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] amatiushkin closed pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
amatiushkin closed pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
spmallette commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r726105358



##########
File path: gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReaderTest.java
##########
@@ -0,0 +1,62 @@
+package org.apache.tinkerpop.gremlin.structure.io.graphml;
+
+import junit.framework.TestCase;
+import org.apache.tinkerpop.gremlin.structure.Graph;
+import org.apache.tinkerpop.gremlin.structure.Vertex;
+import static org.apache.tinkerpop.gremlin.structure.T.label;
+import static org.apache.tinkerpop.gremlin.structure.T.id;
+
+
+import java.io.IOException;
+import java.io.InputStream;
+
+
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.when;
+import static org.mockito.Mockito.any;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.atLeastOnce;
+
+public class GraphMLReaderTest extends TestCase {

Review comment:
       Good use of mocking here. Two things:
   
   1. I imagine the rat plugin is failing for this build because you don't have the Apache license header in this new file.
   2. Please prefer JUnit annotations over the use of `TestCase` - You may refer to any of the other unit tests in the code for examples of that style.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] amatiushkin commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
amatiushkin commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r727717241



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       `getElementText()` internally creates a `new StringBuilder()` per every call and appends text (`getText()`) of a given element. Attempt to use non-text elements would generate various exceptions. I have check multiple implementations of this method, and it seems concrete classes are null-safe for this method, e.g. it always returns string. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
spmallette commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r727961817



##########
File path: gremlin-core/src/test/resources/graphml/sample.graphml.xml
##########
@@ -0,0 +1,367 @@
+<?xml version="1.0" encoding="UTF-8"?>

Review comment:
       > Does it make sense to make gremlin-test component to be a test dependency for grenmlin-core? I am not a big fun of duplicating things (including test),
   
   I agree...we have a fair bit of duplication unfortunately and it drives me crazy but there is at least some reason for it. Your instinct is right in that it seems like `gremlin-core` should depend on `gremlin-test` but that would create a circular dependency. `gremlin-test` actually depends on `gremlin-core` currently because it is the technology test kit that providers use to validate their implementations. we technically need a shared base test module for a small handful of things that could be shared with `gremlin-test`, `gremlin-core` (and below that `gremlin-language`). I've resisted adding that because the handful of things is just small enough to make adding a new module unappetizing and as you say, adds a bit more complexity to things.
   
   There are some things happening now with the refactoring of the technology test kit that might make a restructuring more necessary and appealing, but for the immediate term I'd say that we're a bit stuck with some duplication here/there. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] amatiushkin commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
amatiushkin commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r730338605



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       I need to better understand the intention to move default value assignment under `XMLEvent.END_ELEMENT`.
   It would require to make an extra call to `typeCastValue(...)` for default values as well before making an assignment.  I also might need to create new loop or re-use stream step to map default values to a key. It also looks to me that value initialization and coercion is nicely located as part of  `XMLEvent.START_ELEMENT` . DATA case.
   
   What's the benefit of decoupling default value processing from element properties construction?
   
    




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
spmallette commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r732167785



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       After some more thought, it seems that perhaps we might be both right in our respective thinking on how this should work. There is enough ambiguity to warrant either approach it seems and if we use Gephi as our model (which @krlawrence was good enough to test), then we find that they have implemented it both your way and mine. I think we should follow suit and implement it both ways as well.
   
   From your perspective, @amatiushkin you may choose to leave your implementation as it is in this PR and we can look to review/merge where it stands. If you take this approach, we can just create another JIRA to have it work the way I had interpreted the specification. Alternatively, you can implement both approaches on this pull request to match the functionality as Gephi has it.
   
   Please let us know how you intend to proceed. Thanks for your patience on this by the way. I would not have guessed this issue to generate as much commenting as it has. 🙂 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
spmallette commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r730894525



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       > I need to better understand the intention to move default value assignment under XMLEvent.END_ELEMENT.
   
   Maybe i'm missing something but as I understand things, if "there is no default value therefore related events won't be emitted at all" then your code will only add a default value if the key is present with an empty string. If that is correct, that leads to two issues: (1) that means user will be precluded from using empty string as a valid value if they choose to use defaults. (2) I don't think that this is the intention of the defaults in GraphML. Section 2.4.3 of the GraphML Primer explains the expected usage that I tried to describe in my last comment:
   
   http://graphml.graphdrawing.org/primer/graphml-primer.html#AttributesValues
   
   Perhaps their explanation will clarify it for you. 
   
   > It would require to make an extra call to typeCastValue(...) for default values as well before making an assignment.
   
   Is there a reason you couldn't cache these defaults when they are first parsed at `GraphMLTokens.DEFAULT`?  Instead of storing element text, just call `typeCastValue()` there and then in `END_ELEMENT` detect which keys are not present and inject the defaults. wdyt?
   
   
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] amatiushkin commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
amatiushkin commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r731503254



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       The primer says:
   
   > There can be graph elements for which a GraphML-Attribute is defined but no value is declared by a corresponding data element. If a default value is defined for this GraphML-Attribute, then this default value is applied to the graph element.
   
   
   XML specification (3.1[ Start-Tags, End-Tags, and Empty-Element Tags](https://www.w3.org/TR/xml/#sec-starttags)) defines what's considered content and no-content.
   
   No content case is defined as ` The representation of an empty element is either a start-tag immediately followed by an end-tag, or an empty-element tag.` 
   
   Example data tags with self-closing tag:
   ```
   <data key="abc" /> 
   ```
   and empty-tag
   ```
   <data key="abc"></data> 
   ```
   Therefore, XMLStreamReader returns empty string for such tags (no-content). It is empty string and not `null`.
   
   So, yes. 
   **User is precluded from using empty string and default value at the same time, unless default value is empty string.**
   
   I have added more tests to illustrate it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] amatiushkin commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
amatiushkin commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r727719622



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       This does not look right to me, key should be `d_n`. There is no key with id="modification", therefore it should be an invalid XML document to begin with.
   ```xml
   <node id="6315">
     <data key="modification"/>
   </node>
   ```
   
   That said, I will add this use cases as well just to see how it would perform.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] amatiushkin commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
amatiushkin commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r727718369



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       For cases without value like below, there is no `default` value therefore related events won't be emitted at all.
   ```xml
   <data key="d_n"/>
   ```
   ⚠️ My implementation will generate **NPE** in such cases, because of `defaultValues.get(key)` && `defaultValue.length()`. 
   
   I'll add tests for that and fix it.
   
   Btw, another use case with default tag alone without value should give _empty string_ by default.
   ```xml
   <key id="d_n" for="node" attr.name="modification" attr.type="string">
     <default/>
   </key>
   ``` 
   
   I'll add it as well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
spmallette commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r726113446



##########
File path: gremlin-core/src/test/resources/graphml/sample.graphml.xml
##########
@@ -0,0 +1,367 @@
+<?xml version="1.0" encoding="UTF-8"?>

Review comment:
       I can see the need for a new test file as you are adding a new feature here and I don't really think we should change the existing GraphML files. As a suggestion, it would be nice if the new test file simply was a modification of the existing "modern" data set. We try to keep the datasets consistent when we can even for test. It's not always possible but in this case I think it would be easy to copy the tinkerpop-modern.xml file and modify it to include default values. An easy one would be to include defaults for "lang" and/or "age" which aren't shared across all vertex labels. For edge defaults I suppose you would have to fabricate something as all edges have "weight" but at least the base graph data would be the same. if you have some good reason for sticking with this data as you have though, please let me know, otherwise it would be nice to see our more familiar dataset in use here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
spmallette commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r727955226



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       > This does not look right to me, key should be d_n.
   
   sorry - i guess i'm used to making id/attr.name/key the same. didn't mean to be confusing in my comment. My point was more that `<data key="d_n"/>` should not trigger use of the default. It should simply be empty string. 
   
   The case for default usage is when the there is no `<data>` element with `key="d_n"` at all. In that case it should inject the property "modification=add".  You mention that this situation currently leads to:
   
   > there is no default value therefore related events won't be emitted at all....My implementation will generate NPE in such cases
   
   I think the goal here is to determine if the property key is present or not in the GraphML for the particular node/edge and then adding it if its not. In that sense, I think that the bulk of your changes for this should neatly fit into `XMLEvent.END_ELEMENT` where all the property keys have already been collected and you just before the vertex is created. Does that all make sense?

##########
File path: gremlin-core/src/test/resources/graphml/sample.graphml.xml
##########
@@ -0,0 +1,367 @@
+<?xml version="1.0" encoding="UTF-8"?>

Review comment:
       > Does it make sense to make gremlin-test component to be a test dependency for grenmlin-core? I am not a big fun of duplicating things (including test),
   
   I agree...we have a fair bit of duplication unfortunately and it drives me crazy but there is at least some reason for it. Your instinct is right in that it seems like `gremlin-core` should depend on `gremlin-test` but that would create a circular dependency. `gremlin-test` actually depends on `gremlin-core` currently because it is the technology test kit that providers use to validate their implementations. we technically need a shared base test module for a small handful of things that could be shared with `gremlin-test`, `gremlin-core` (and below that `gremlin-language`). I've resisted adding that because the handful of things is just small enough to make adding a new module unappetizing and as you say, adds a bit more complexity to things.
   
   There are some things happening now with the refactoring of the technology test kit that might make a restructuring more necessary and appealing, but for the immediate term I'd say that we're a bit stuck with some duplication here/there. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] krlawrence edited a comment on pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
krlawrence edited a comment on pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#issuecomment-947604873


   I think part of the challenge here is the GraphML spec. There is solid normative guidance for the allowed schema but there seems to be very little in the way of normative guidance regarding the processing model. The Primer itself is not normative and does not cover all possible cases where the schema allows a `<default>` value. The ambiguity comes from a document such as this one:
   ```
   <graphml xmlns='http://graphml.graphdrawing.org/xmlns'>
     <key id='code'    for='node' attr.name='code'    attr.type='string'><default>LHR</default></key>
     <key id='city'    for='node' attr.name='city'    attr.type='string'><default>London></default></key>
     <graph id='routes' edgedefault='directed'>
       <node id='0' label='n1'>
         <data key='city'/>
       </node>
     </graph>
    </graphml>  
   ```
   When `n1` is created should it have both properties for `code` and `city` created using the default values? The spec provides no guidance (that I can find). I tested using Gephi which is a popular tool for working with various graph file formats. The results from Gephi yield a node with both properties. In the absence of normative guidance from the GraphML spec itself I would propose we be consistent with Gephi if we add support for default values to help with interchange between other tools, like Gephi, and TinkerPop.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] amatiushkin commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
amatiushkin commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r731510822



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       > Is there a reason you couldn't cache these defaults when they are first parsed at GraphMLTokens.DEFAULT?
   
    Empty tag and self-closing tag will cause exception for any type except string, see test **noDefaultIntValue**.
    Consider key of integer type: `<key id="age" for="node" attr.name="age" attr.type="int">`
   ```java
     case GraphMLTokens.DATA:
     //...
          String value =  reader.getElementText();  // will return empty string
    //...
          typeCastValue(key, value, keyTypesMaps) // will generate exception, if key type is integer
   ```
    
   So,if you want me to move parsing for default values under `END_ELEMENT`, I have to move parsing of none-default values as well.
   
   I don't mind doing it, but this refactoring would make sense with other change, e.g. to use push-down automata instead of holding individual cached values to preserve parsing context.
   
   So far I am trying to minimize my changes and reduce impact radius.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
spmallette commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r732167785



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       After some more thought, it seems that perhaps we might be both right in our respective thinking on how this should work. There is enough ambiguity to warrant either approach it seems and if we use Gephi as our model (which @krlawrence was good enough to test), then we find that they have implemented it both your way and mine. I think we should follow suit and implement it both ways as well.
   
   From your perspective, @amatiushkin you may choose to leave your implementation as it is in this PR and we can look to review/merge where it stands. If you take this approach, we can just create another JIRA to have it also work the way I had interpreted the specification. Alternatively, you can implement both approaches on this pull request to match the functionality as Gephi has it.
   
   Please let us know how you intend to proceed. Thanks for your patience on this by the way. I would not have guessed this issue to generate as much commenting as it has. 🙂 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette commented on pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
spmallette commented on pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#issuecomment-995837372


   @amatiushkin just thought i'd check in on this PR. any chance you plan on coming back to this? we are planning a release with code freeze aimed at december 31st. just thought i'd let you know in case you'd like to see this change in place for that release.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] amatiushkin commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
amatiushkin commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r727722690



##########
File path: gremlin-core/src/test/resources/graphml/sample.graphml.xml
##########
@@ -0,0 +1,367 @@
+<?xml version="1.0" encoding="UTF-8"?>

Review comment:
       re: tinkerpop-modern.xml
   
   I also plan to add edge case with CDATA/PDATA once I found any good example.
   
   Does it make sense to make `gremlin-test` component to be a test dependency for `grenmlin-core`? 
   
   I am not a big fun of duplicating things (including test), because it makes future contributor to be aware to update modern dataset in few more places. 
   
   On the other hand, test dependency would add a little bit complexity to the build process itself.
   
   WDYT?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] amatiushkin commented on pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
amatiushkin commented on pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#issuecomment-945036608


   @spmallette shall I worry about supporting user extensions in scope of this PR? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
spmallette commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r726101651



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       I don't recall how `getElementText()` works. If you have `<data key="d_n"/>` does that return an empty string or `null`? If it's the former, which your code seems to indicate, doesn't this mean that GraphML can't have empty strings as values anymore (which may be what the user wants)? I think the point of the default specification is to deal with a situation like:
   
   ```xml
   <key id="d_n" for="node" attr.name="modification" attr.type="string">
     <default>add</default>
   </key>
   ...
   <node id="6315"></node>
   ```
   
   where the "modification" is not present at all for the `<node><data>` in which case it would include the default property of "modification" with an "add" value. otoh, if it were the following:
   
   ```xml
   <key id="d_n" for="node" attr.name="modification" attr.type="string">
     <default>add</default>
   </key>
   ...
   <node id="6315">
     <data key="modification"/>
   </node>
   ```
   
   it would just include the "modification" property with an empty string. Does that make sense?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] amatiushkin commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
amatiushkin commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r727722690



##########
File path: gremlin-core/src/test/resources/graphml/sample.graphml.xml
##########
@@ -0,0 +1,367 @@
+<?xml version="1.0" encoding="UTF-8"?>

Review comment:
       re: tinkerpop-modern.xml
   
   I also plan to add edge case with CDATA/PDATA once I find any good example.
   
   Does it make sense to make `gremlin-test` component to be a test dependency for `grenmlin-core`? 
   
   I am not a big fun of duplicating things (including test), because it makes future contributor to be aware to update modern dataset in few more places. 
   
   On the other hand, test dependency would add a little bit complexity to the build process itself.
   
   WDYT?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] amatiushkin commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
amatiushkin commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r727717241



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       `getElementText()` internally creates a `new StringBuilder()` per every call and appends text (`getText()`) of a given element. Attempt to use non-text elements would generate various exceptions. I have check multiple implementations of this method, and it seems concrete classes are null-safe.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] amatiushkin commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
amatiushkin commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r727717241



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       `getElementText()` internally creates a `new StringBuilder()` per every call and appends text (`getText()`) of a given element. Attempt to use non-text elements would throw various exceptions. I have check multiple implementations of this method, and it seems concrete classes are null-safe.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette commented on pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
spmallette commented on pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#issuecomment-940028613


   Thanks for this pull request. I've added some comments for you to consider, but generally speaking it is on the right track. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] amatiushkin commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
amatiushkin commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r727717241



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       `getElementText()` internally creates a `new StringBuilder()` per every call and appends text (`getText()`) of a given element. Attempt to use non-text elements would through various exceptions. I have check multiple implementations of this method, and it seems concrete classes are null-safe.

##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       For cases without value like below, there is no `default` value therefore related events won't be emitted at all.
   ```xml
   <data key="d_n"/>
   ```
   My implementation could generate NPE in such cases, because (`defaultValues.get(key)` && `defaultValue.length()`). 
   
   I'll add tests for that and fix it.
   
   Btw, another use case with default tag alone without value should give _empty string_ by default.
   ```
   <key id="d_n" for="node" attr.name="modification" attr.type="string">
     <default/>
   </key>
   ``` 
   
   I'll add it as well.

##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       For cases without value like below, there is no `default` value therefore related events won't be emitted at all.
   ```xml
   <data key="d_n"/>
   ```
   My implementation could generate NPE in such cases, because (`defaultValues.get(key)` && `defaultValue.length()`). 
   
   I'll add tests for that and fix it.
   
   Btw, another use case with default tag alone without value should give _empty string_ by default.
   ```xml
   <key id="d_n" for="node" attr.name="modification" attr.type="string">
     <default/>
   </key>
   ``` 
   
   I'll add it as well.

##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       This does not look right to me, key should be `d_n`. There is no key with id="modification", therefore it should be an invalid XML document to begin with.
   ```xml
   <node id="6315">
     <data key="modification"/>
   </node>
   ```
   
   That said, I will add this use cases as well just to see how it would perform.

##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       For cases without value like below, there is no `default` value therefore related events won't be emitted at all.
   ```xml
   <data key="d_n"/>
   ```
   My implementation will generate **NPE** in such cases, because of `defaultValues.get(key)` && `defaultValue.length()`. 
   
   I'll add tests for that and fix it.
   
   Btw, another use case with default tag alone without value should give _empty string_ by default.
   ```xml
   <key id="d_n" for="node" attr.name="modification" attr.type="string">
     <default/>
   </key>
   ``` 
   
   I'll add it as well.

##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       For cases without value like below, there is no `default` value therefore related events won't be emitted at all.
   ```xml
   <data key="d_n"/>
   ```
   ⚠️ My implementation will generate **NPE** in such cases, because of `defaultValues.get(key)` && `defaultValue.length()`. 
   
   I'll add tests for that and fix it.
   
   Btw, another use case with default tag alone without value should give _empty string_ by default.
   ```xml
   <key id="d_n" for="node" attr.name="modification" attr.type="string">
     <default/>
   </key>
   ``` 
   
   I'll add it as well.

##########
File path: gremlin-core/src/test/resources/graphml/sample.graphml.xml
##########
@@ -0,0 +1,367 @@
+<?xml version="1.0" encoding="UTF-8"?>

Review comment:
       re: tinkerpop-modern.xml
   
   I also plan to add edge case with CDATA/PDATA once I found any good example.
   
   Does it make sense to make `gremlin-test` component to be a test dependency for `grenmlin-core`? 
   
   I am not a big fun of duplicating things (including test), because it makes future contributor to be aware to update modern dataset in few more places. 
   
   On the other hand, test dependency would add a little bit complexity to the build process itself.
   
   WDYT?

##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       `getElementText()` internally creates a `new StringBuilder()` per every call and appends text (`getText()`) of a given element. Attempt to use non-text elements would throw various exceptions. I have check multiple implementations of this method, and it seems concrete classes are null-safe.

##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       `getElementText()` internally creates a `new StringBuilder()` per every call and appends text (`getText()`) of a given element. Attempt to use non-text elements would generate various exceptions. I have check multiple implementations of this method, and it seems concrete classes are null-safe.

##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       `getElementText()` internally creates a `new StringBuilder()` per every call and appends text (`getText()`) of a given element. Attempt to use non-text elements would generate various exceptions. I have check multiple implementations of this method, and it seems concrete classes are null-safe for this method, e.g. it always returns string. 

##########
File path: gremlin-core/src/test/resources/graphml/sample.graphml.xml
##########
@@ -0,0 +1,367 @@
+<?xml version="1.0" encoding="UTF-8"?>

Review comment:
       re: tinkerpop-modern.xml
   
   I also plan to add edge case with CDATA/PDATA once I find any good example.
   
   Does it make sense to make `gremlin-test` component to be a test dependency for `grenmlin-core`? 
   
   I am not a big fun of duplicating things (including test), because it makes future contributor to be aware to update modern dataset in few more places. 
   
   On the other hand, test dependency would add a little bit complexity to the build process itself.
   
   WDYT?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] amatiushkin commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
amatiushkin commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r727718369



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       For cases without value like below, there is no `default` value therefore related events won't be emitted at all.
   ```xml
   <data key="d_n"/>
   ```
   My implementation could generate NPE in such cases, because (`defaultValues.get(key)` && `defaultValue.length()`). 
   
   I'll add tests for that and fix it.
   
   Btw, another use case with default tag alone without value should give _empty string_ by default.
   ```
   <key id="d_n" for="node" attr.name="modification" attr.type="string">
     <default/>
   </key>
   ``` 
   
   I'll add it as well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] amatiushkin commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
amatiushkin commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r732359730



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       Let me ping you offline, @spmallette. I think I am missing some part which prevents me to look on specs from your point.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
spmallette commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r731761779



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       When I read the sentence you quoted:
   
   > There can be graph elements for which a GraphML-Attribute is defined but no value is declared by a corresponding data element. If a default value is defined for this GraphML-Attribute, then this default value is applied to the graph element.
   
   I can see how you might read it the way that you are, but I sense it was just loosely written. To get started, I'm fairly certain that the "GraphML-Attribute" referred to here is not `<data>`, but instead refers to `<node>` or `<edge>`.  The illustrative example that follows immediately after that sentence clarifies the behavior:
   
   > In the above example no value is defined for the node with identifier n1 and the GraphML-Attribute with name color.
   
   where "n1" refers to:
   
   ```xml
   <key id="d0" for="node" attr.name="color" attr.type="string">
     <default>yellow</default>
   </key>
   ...
   <node id="n1"/>
   ```
   
   and:
   
   > Therefore this GraphML-Attribute has the default value, yellow for this node.
   
   The section finishes by discussing how a property value is left undefined, which fits this model as well. Taken all together, I think the behavior described by the example is the intended one, no?
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] krlawrence commented on pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
krlawrence commented on pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#issuecomment-947604873


   I think part of the challenge here is the GraphML spec. There is solid normative guidance for the allowed schema but there seems to be very little in the way of normative guidance regarding the processing model. The Primer itself is not normative and does not cover all possible cases where the schema allows a `<default>` value. The ambiguity comes from a document such as this one:
   ```
   <graphml xmlns='http://graphml.graphdrawing.org/xmlns'>
     <key id='code'    for='node' attr.name='code'    attr.type='string'><default>LHR</default></key>
     <key id='city'    for='node' attr.name='city'    attr.type='string'><default>London></default></key>
     <graph id='routes' edgedefault='directed'>
       <node id='0' label='n1'>
         <data key='city'/>
       </node>
     </graph>
    </graphml>  
   ```
   When `n1` is created should it have both properties for `code` and `city` created? The spec provides no guidance (that I can find). I tested using Gephi which is a popular tool for working with various graph file formats. The results from Gephi yield a node with both properties. In the absence of normative guidance from the GraphML spec itself I would propose we be consistent with Gephi if we add support for default values to help with interchange between other tools, like Gephi, and TinkerPop.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] amatiushkin commented on pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
amatiushkin commented on pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#issuecomment-950378846


   Ok. I think it make sense to introduce different strategies for importing default values. 
   
   Given the impact of this change, I would propose following strategies:
   - default (interim) — defaults are not imported at all, similar to current implementation. 
   - Gephi-like — default are imported like in Gephi (all defaults are assigned, unless overwritten)
   - explicit — only exact defaults are assigned (empty, self-closing nodes are ignored)
   - soft — empty and self-closing nodes are overwriting default values if possible
   - custom — user defines behavior as combination of above
   
   The list of exact features is not fully know to me, but I can highlight few things:
   - default value has been assigned to a node/edge 
   - node/edge has data element which refers to a key with default value
   - data element in node/edge defines default value
   
   These features needs to be listed in a specification-like fashion in  the documentation. I will propose draft, since I am busy with this anyway, but I absolutely need to collect your inputs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette commented on pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
spmallette commented on pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#issuecomment-950876459


   If you'd like to add different configuration options to the `Builder` to control how defaults are handled, I suppose that you can. Personally, I'd be content with just adding the Gephi-style processing as @krlawrence described and just leave it at that. If someone wants something more complex later, they may request it or add it themselves. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] krlawrence edited a comment on pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
krlawrence edited a comment on pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#issuecomment-947604873


   I think part of the challenge here is the GraphML spec. There is solid normative guidance for the allowed schema but there seems to be very little in the way of normative guidance regarding the processing model. The Primer itself is not normative and does not cover all possible cases where the schema allows a `<default>` value. The ambiguity comes from a document such as this one:
   ```
   <graphml xmlns='http://graphml.graphdrawing.org/xmlns'>
     <key id='code'    for='node' attr.name='code'    attr.type='string'><default>LHR</default></key>
     <key id='city'    for='node' attr.name='city'    attr.type='string'><default>London></default></key>
     <graph id='routes' edgedefault='directed'>
       <node id='0' label='n1'>
         <data key='city'/>
       </node>
     </graph>
    </graphml>  
   ```
   When `n1` is created should it have both properties for `code` and `city` created using the default values? The spec provides no guidance (that I can find). I tested using Gephi which is a popular tool for working with various graph file formats. The results from Gephi yield a node with both properties. In the absence of normative guidance from the GraphML spec itself I would propose we be consistent with Gephi if we add support for default values to help with interchange between other tools, like Gephi, and TinkerPop. Given that a document can override the global default locally should it need to, to really generate a node with an empty string, that would seem to allow for all cases.
   ```
   <graphml xmlns='http://graphml.graphdrawing.org/xmlns'>
     <key id='code'    for='node' attr.name='code'    attr.type='string'><default>LHR</default></key>
     <key id='city'    for='node' attr.name='city'    attr.type='string'><default>London></default></key>
     <graph id='routes' edgedefault='directed'>
       <node id='0' label='n1'>
         <data key='city'> <default></default></data>
       </node>
     </graph>
    </graphml>    
   ``` 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] krlawrence edited a comment on pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
krlawrence edited a comment on pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#issuecomment-947604873


   I think part of the challenge here is the GraphML spec. There is solid normative guidance for the allowed schema but there seems to be very little in the way of normative guidance regarding the processing model. The Primer itself is not normative and does not cover all possible cases where the schema allows a `<default>` value. The ambiguity comes from a document such as this one:
   ```
   <graphml xmlns='http://graphml.graphdrawing.org/xmlns'>
     <key id='code'    for='node' attr.name='code'    attr.type='string'><default>LHR</default></key>
     <key id='city'    for='node' attr.name='city'    attr.type='string'><default>London></default></key>
     <graph id='routes' edgedefault='directed'>
       <node id='0' label='n1'>
         <data key='city'/>
       </node>
     </graph>
    </graphml>  
   ```
   When `n1` is created should it have both properties for `code` and `city` created using the default values? The spec provides no guidance (that I can find). I tested using Gephi which is a popular tool for working with various graph file formats. The results from Gephi yield a node with both properties using the default values. In the absence of normative guidance from the GraphML spec itself I would propose we be consistent with Gephi if we add support for default values to help with interchange between other tools, like Gephi, and TinkerPop. Given that a document can override the global default locally should it need to, to really generate a node with an empty string, that would seem to allow for all cases.
   ```
   <graphml xmlns='http://graphml.graphdrawing.org/xmlns'>
     <key id='code'    for='node' attr.name='code'    attr.type='string'><default>LHR</default></key>
     <key id='city'    for='node' attr.name='city'    attr.type='string'><default>London></default></key>
     <graph id='routes' edgedefault='directed'>
       <node id='0' label='n1'>
         <data key='city'> <default></default></data>
       </node>
     </graph>
    </graphml>    
   ``` 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] amatiushkin commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
amatiushkin commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r727718369



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       For cases without value like below, there is no `default` value therefore related events won't be emitted at all.
   ```xml
   <data key="d_n"/>
   ```
   My implementation will generate **NPE** in such cases, because of `defaultValues.get(key)` && `defaultValue.length()`. 
   
   I'll add tests for that and fix it.
   
   Btw, another use case with default tag alone without value should give _empty string_ by default.
   ```xml
   <key id="d_n" for="node" attr.name="modification" attr.type="string">
     <default/>
   </key>
   ``` 
   
   I'll add it as well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
spmallette commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r727955226



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       > This does not look right to me, key should be d_n.
   
   sorry - i guess i'm used to making id/attr.name/key the same. didn't mean to be confusing in my comment. My point was more that `<data key="d_n"/>` should not trigger use of the default. It should simply be empty string. 
   
   The case for default usage is when the there is no `<data>` element with `key="d_n"` at all. In that case it should inject the property "modification=add".  You mention that this situation currently leads to:
   
   > there is no default value therefore related events won't be emitted at all....My implementation will generate NPE in such cases
   
   I think the goal here is to determine if the property key is present or not in the GraphML for the particular node/edge and then adding it if its not. In that sense, I think that the bulk of your changes for this should neatly fit into `XMLEvent.END_ELEMENT` where all the property keys have already been collected and you just before the vertex is created. Does that all make sense?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] amatiushkin commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
amatiushkin commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r727717241



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       `getElementText()` internally creates a `new StringBuilder()` per every call and appends text (`getText()`) of a given element. Attempt to use non-text elements would through various exceptions. I have check multiple implementations of this method, and it seems concrete classes are null-safe.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] amatiushkin commented on a change in pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
amatiushkin commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r727718369



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
                         case GraphMLTokens.DATA:
                             final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
                             final String dataAttributeName = keyIdMap.get(key);
+                            final String defaultValue = defaultValues.get(key);
 
                             if (dataAttributeName != null) {
-                                final String value = reader.getElementText();
+                                String elementValue =  reader.getElementText();
+                                final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
       For cases without value like below, there is no `default` value therefore related events won't be emitted at all.
   ```xml
   <data key="d_n"/>
   ```
   My implementation could generate NPE in such cases, because (`defaultValues.get(key)` && `defaultValue.length()`). 
   
   I'll add tests for that and fix it.
   
   Btw, another use case with default tag alone without value should give _empty string_ by default.
   ```xml
   <key id="d_n" for="node" attr.name="modification" attr.type="string">
     <default/>
   </key>
   ``` 
   
   I'll add it as well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] amatiushkin commented on pull request #1485: TINKERPOP-848 draft implementation to support default value

Posted by GitBox <gi...@apache.org>.
amatiushkin commented on pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#issuecomment-1001710342


   @spmallette Thanks for the ping. I'll close this PR for now and re-open once it is ready for a review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org