You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by s1ck <gi...@git.apache.org> on 2016/09/21 14:11:23 UTC

[GitHub] flink pull request #2527: [FLINK-4624] Allow for null values in Graph Summar...

GitHub user s1ck opened a pull request:

    https://github.com/apache/flink/pull/2527

    [FLINK-4624] Allow for null values in Graph Summarization

    * Bug was caused by serializers that cannot handle null values (e.g. Long)
    * VertexGroupItem now uses Either<NullValue, VV> instead of VV
    * Generalized test cases
    * Added tests for vertex/edge values of type Long

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/s1ck/flink FLINK-4624

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2527.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2527
    
----
commit e6db894d6b84cf95206905a6f1a6713f78e32988
Author: Martin Junghanns <ma...@gmx.net>
Date:   2016-09-21T11:31:41Z

    [FLINK-4624] Support null values in Graph Summarization
    
    * Bug was caused by serializers that cannot handle null values (e.g. Long)
    * VertexGroupItem now uses Either<NullValue, VV> instead of VV
    * Generalized test cases
    * Added tests for vertex/edge values of type Long

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2527: [FLINK-4624] Allow for null values in Graph Summarization

Posted by s1ck <gi...@git.apache.org>.
Github user s1ck commented on the issue:

    https://github.com/apache/flink/pull/2527
  
    @greghogan since I addressed all your comments, is there anything left to do that prevents this from merging?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2527: [FLINK-4624] Allow for null values in Graph Summarization

Posted by s1ck <gi...@git.apache.org>.
Github user s1ck commented on the issue:

    https://github.com/apache/flink/pull/2527
  
    > I'd also like to add a ToNullValue translator that would accept any type and convert to NullValue.
    
    I don't know if I get this right. Do you mean an additional test case with `NullValue` as vertex/edge?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2527: [FLINK-4624] Allow for null values in Graph Summarization

Posted by greghogan <gi...@git.apache.org>.
Github user greghogan commented on the issue:

    https://github.com/apache/flink/pull/2527
  
    Would a graph translator simplify the conversion from Long to String? You can do `graph.run(new TranslateEdgeValues<...>(new StringToLong())` and write a simple `public class StringToLong implements TranslateFunction<String, Long> { ...`. Or this could be an anonymous class.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #2527: [FLINK-4624] Allow for null values in Graph Summar...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/2527


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2527: [FLINK-4624] Allow for null values in Graph Summarization

Posted by greghogan <gi...@git.apache.org>.
Github user greghogan commented on the issue:

    https://github.com/apache/flink/pull/2527
  
    `ToNullValue` is now committed in a new `translator` package in Gelly. I think that one will be generally very useful.
    
    I created FLINK-4673 so we can look at fixing the type extraction with the Either type. Using the recently added `TypeInfoFactory` it was quite simple and I will create the PR after tests pass (I also removed the explicit code from `TypeExtractor`).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2527: [FLINK-4624] Allow for null values in Graph Summarization

Posted by greghogan <gi...@git.apache.org>.
Github user greghogan commented on the issue:

    https://github.com/apache/flink/pull/2527
  
    Will merge today. I'll add a note that the `TypeInformation` will be removed once we resolve the issues typing `Either`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2527: [FLINK-4624] Allow for null values in Graph Summarization

Posted by greghogan <gi...@git.apache.org>.
Github user greghogan commented on the issue:

    https://github.com/apache/flink/pull/2527
  
    I'd also like to add a `ToNullValue` translator that would accept any type and convert to `NullValue`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2527: [FLINK-4624] Allow for null values in Graph Summarization

Posted by s1ck <gi...@git.apache.org>.
Github user s1ck commented on the issue:

    https://github.com/apache/flink/pull/2527
  
    > I'm not following why specifying the TypeInformation is now required with the change to using Either. Is the type system failing to handle this properly?
    You are right, I was experimenting with return type definitions for the groupReduce and the map call. However, for the map call in `Summarization.java:126` the return type definition is required or the job fails as it cannot determine the type for `VV`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2527: [FLINK-4624] Allow for null values in Graph Summarization

Posted by greghogan <gi...@git.apache.org>.
Github user greghogan commented on the issue:

    https://github.com/apache/flink/pull/2527
  
    I'm not following why specifying the `TypeInformation` is now required with the change to using `Either`. Is the type system failing to handle this properly?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #2527: [FLINK-4624] Allow for null values in Graph Summar...

Posted by s1ck <gi...@git.apache.org>.
Github user s1ck commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2527#discussion_r80186110
  
    --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/Summarization.java ---
    @@ -226,11 +247,15 @@ public void setGroupRepresentativeId(K groupRepresentativeId) {
     		}
     
     		public VGV getVertexGroupValue() {
    -			return f2;
    +			return f2.isLeft() ? f2.left() : null;
     		}
     
     		public void setVertexGroupValue(VGV vertexGroupValue) {
    -			f2 = vertexGroupValue;
    +			if (vertexGroupValue == null) {
    +				f2 = new Either.Right<>(NullValue.getInstance());
    --- End diff --
    
    An instance of `VertexGroupItem` is reused in the `VertexGroupReducer`. Here the setter is implicitly only called once in the open method. So I thought reusing the new Right<>(NullValue.getInstance()) wouldn't be a benefit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #2527: [FLINK-4624] Allow for null values in Graph Summar...

Posted by greghogan <gi...@git.apache.org>.
Github user greghogan commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2527#discussion_r80109825
  
    --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/Summarization.java ---
    @@ -226,11 +247,15 @@ public void setGroupRepresentativeId(K groupRepresentativeId) {
     		}
     
     		public VGV getVertexGroupValue() {
    -			return f2;
    +			return f2.isLeft() ? f2.left() : null;
     		}
     
     		public void setVertexGroupValue(VGV vertexGroupValue) {
    -			f2 = vertexGroupValue;
    +			if (vertexGroupValue == null) {
    +				f2 = new Either.Right<>(NullValue.getInstance());
    --- End diff --
    
    Can these objects be reused?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #2527: [FLINK-4624] Allow for null values in Graph Summar...

Posted by greghogan <gi...@git.apache.org>.
Github user greghogan commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2527#discussion_r80309626
  
    --- Diff: flink-libraries/flink-gelly-examples/src/test/java/org/apache/flink/graph/library/SummarizationITCase.java ---
    @@ -100,23 +102,41 @@ public void testWithVertexAndAbsentEdgeValues() throws Exception {
     		validateEdges(SummarizationData.EXPECTED_EDGES_ABSENT_VALUES, summarizedEdges);
     	}
     
    -	private void validateVertices(String[] expectedVertices,
    -																List<Vertex<Long, Summarization.VertexValue<String>>> actualVertices) {
    +	@Test
    +	public void testWithVertexAndEdgeLongValues() throws Exception {
    +		ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    +
    +		Graph<Long, Long, Long> input = Graph.fromDataSet(
    +				SummarizationData.getVertices(env),
    +				SummarizationData.getEdges(env),
    +				env)
    +			.run(new TranslateVertexValues<Long, String, Long, String>(new StringToLong()))
    +			.run(new TranslateEdgeValues<Long, Long, String, Long>(new StringToLong()));
    +
    +		List<Vertex<Long, Summarization.VertexValue<Long>>> summarizedVertices = Lists.newArrayList();
    --- End diff --
    
    Can replace Guava's `Lists.newArrayList()` with `new ArrayList<>()`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2527: [FLINK-4624] Allow for null values in Graph Summarization

Posted by s1ck <gi...@git.apache.org>.
Github user s1ck commented on the issue:

    https://github.com/apache/flink/pull/2527
  
    > Would a graph translator simplify the conversion from Long to String? You can do graph.run(new TranslateEdgeValues<...>(new StringToLong()) and write a simple public class StringToLong implements TranslateFunction<String, Long> { .... Or this could be an anonymous class.
    
    I was not aware of this function / interface. I just needed it to use the same data for String and Long tests. I will update the conversion to your suggested method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #2527: [FLINK-4624] Allow for null values in Graph Summar...

Posted by greghogan <gi...@git.apache.org>.
Github user greghogan commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2527#discussion_r80311778
  
    --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/Summarization.java ---
    @@ -226,11 +247,15 @@ public void setGroupRepresentativeId(K groupRepresentativeId) {
     		}
     
     		public VGV getVertexGroupValue() {
    -			return f2;
    +			return f2.isLeft() ? f2.left() : null;
     		}
     
     		public void setVertexGroupValue(VGV vertexGroupValue) {
    -			f2 = vertexGroupValue;
    +			if (vertexGroupValue == null) {
    +				f2 = new Either.Right<>(NullValue.getInstance());
    --- End diff --
    
    I am tracing `VertexGroupReducer.reduce` calling `createGroupRepresentativeTuple` which calls `setVertexGroupValue` which creates a new `Either.Left` or `Either.Right`. Then `VertexGroupReducer.reduce` calls `reset` which creates a new `Either.Right`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---