Posted to issues@flink.apache.org by ramkrish86 <gi...@git.apache.org> on 2017/05/12 08:21:20 UTC

[GitHub] flink pull request #3881: FLINK-6284 Incorrect sorting of completed checkpoi...

GitHub user ramkrish86 opened a pull request:

    https://github.com/apache/flink/pull/3881

    FLINK-6284 Incorrect sorting of completed checkpoints in ZooKeeperCompletedCheckpointStore

    ZooKeeperCompletedCheckpointStore
    
    This change uses ZooKeeper's getChildren() API directly, so that the list is simply built in sequence order. If we stay with the ZKPaths API, we need to sort by converting the List<String> to a List<Long>.
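
The ordering problem described above can be illustrated with a small, self-contained sketch. It assumes the child node names are plain decimal checkpoint IDs; the class and method names are hypothetical and are not taken from the Flink code base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, for illustration only.
class CheckpointNameOrdering {

    /** Sorts child node names as plain strings (lexicographic order). */
    static List<String> sortLexicographically(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        Collections.sort(sorted);
        return sorted;
    }

    /** Sorts child node names by numeric value; assumes every name parses as a long. */
    static List<String> sortNumerically(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingLong((String s) -> Long.parseLong(s)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("99", "101", "100");
        System.out.println(sortLexicographically(names)); // prints [100, 101, 99]
        System.out.println(sortNumerically(names));       // prints [99, 100, 101]
    }
}
```

With a string sort, "99" ends up after "101", which is the kind of mis-ordering the issue title refers to; the numeric sort only works as long as every name is a valid long.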

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ramkrish86/flink FLINK-6284

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3881.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3881
    
----
commit 33bf37a2d706af6c8eb6cbe9d58aa3ac9d1f03e0
Author: Ramkrishna <ra...@intel.com>
Date:   2017-05-12T08:18:16Z

    FLINK-6284 Incorrect sorting of completed checkpoints in
    ZooKeeperCompletedCheckpointStore

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #3881: FLINK-6284 Incorrect sorting of completed checkpoi...

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 closed the pull request at:

    https://github.com/apache/flink/pull/3881


---

[GitHub] flink pull request #3881: FLINK-6284 Incorrect sorting of completed checkpoi...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3881#discussion_r116185935
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/zookeeper/ZooKeeperStateHandleStore.java ---
    @@ -346,11 +346,7 @@ public int exists(String pathInZooKeeper) throws Exception {
     			} else {
     				// Initial cVersion (number of changes to the children of this node)
     				int initialCVersion = stat.getCversion();
    -
    -				List<String> children = ZKPaths.getSortedChildren(
    -						client.getZookeeperClient().getZooKeeper(),
    -						ZKPaths.fixForNamespace(client.getNamespace(), "/"));
    -
    +				List<String> children = client.getZookeeperClient().getZooKeeper().getChildren(ZKPaths.fixForNamespace(client.getNamespace(), "/"), false);
    --- End diff --
    
    I think this alone does not work: The JavaDocs of `ZooKeeper#getChildren` say
    
    > The list of children returned is not sorted and no guarantee is provided as to its natural or lexical order.
    
    Thus, I assume that it is not safe to simply return the list of children without any further processing.


---

[GitHub] flink pull request #3881: FLINK-6284 Incorrect sorting of completed checkpoi...

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3881#discussion_r116211132
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/zookeeper/ZooKeeperStateHandleStore.java ---
    @@ -346,17 +346,20 @@ public int exists(String pathInZooKeeper) throws Exception {
     			} else {
     				// Initial cVersion (number of changes to the children of this node)
     				int initialCVersion = stat.getCversion();
    -
    -				List<String> children = ZKPaths.getSortedChildren(
    -						client.getZookeeperClient().getZooKeeper(),
    -						ZKPaths.fixForNamespace(client.getNamespace(), "/"));
    -
    -				for (String path : children) {
    -					path = "/" + path;
    +				List<String> childrenInStr =
    +					client.getZookeeperClient().getZooKeeper().
    +						getChildren(ZKPaths.fixForNamespace(client.getNamespace(), "/"), false);
    +				List<Long> children = new ArrayList<Long>(childrenInStr.size());
    +				for(String childNode : childrenInStr) {
    +					children.add(new Long(childNode));
    +				}
    --- End diff --
    
    Same here, my mistake. I lost my previous changes because of the compile issue, so the sorting was lost as well. I have already pushed a new commit that adds the sort.


---

[GitHub] flink issue #3881: FLINK-6284 Incorrect sorting of completed checkpoints in ...

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on the issue:

    https://github.com/apache/flink/pull/3881
  
    I won't be available for the next 2 to 3 hours, so feel free to decide at your convenience in case you need to cut the release candidate for 1.3. I am sorry that my initial commit did not take care of things properly; I should have been more careful. Thanks for the opportunity.


---

[GitHub] flink pull request #3881: FLINK-6284 Incorrect sorting of completed checkpoi...

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3881#discussion_r116188234
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/zookeeper/ZooKeeperStateHandleStore.java ---
    @@ -346,11 +346,7 @@ public int exists(String pathInZooKeeper) throws Exception {
     			} else {
     				// Initial cVersion (number of changes to the children of this node)
     				int initialCVersion = stat.getCversion();
    -
    -				List<String> children = ZKPaths.getSortedChildren(
    -						client.getZookeeperClient().getZooKeeper(),
    -						ZKPaths.fixForNamespace(client.getNamespace(), "/"));
    -
    +				List<String> children = client.getZookeeperClient().getZooKeeper().getChildren(ZKPaths.fixForNamespace(client.getNamespace(), "/"), false);
    --- End diff --
    
    Let me go back to my older approach. I had a patch for it, but I thought this one was better; I had only checked the JavaDoc of ZKPaths. I will push my initial version of the patch, which converts the List<String> to a List<Long> and then uses that as the sorted list.


---

[GitHub] flink pull request #3881: FLINK-6284 Incorrect sorting of completed checkpoi...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3881#discussion_r116207072
  
    --- Diff: pom.xml ---
    @@ -101,7 +101,8 @@ under the License.
     		<chill.version>0.7.4</chill.version>
     		<asm.version>5.0.4</asm.version>
     		<zookeeper.version>3.4.6</zookeeper.version>
    -		<curator.version>2.12.0</curator.version>
    +		<curator.version>2.11.0</curator.version>
    --- End diff --
    
    Why are you downgrading the curator version?


---

[GitHub] flink issue #3881: FLINK-6284 Incorrect sorting of completed checkpoints in ...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on the issue:

    https://github.com/apache/flink/pull/3881
  
    Hi @ramkrish86, I might have found an easy way to solve the problem. Take a look at https://github.com/tillrohrmann/flink/commit/5bd499329d68c6f3236b4e89ba25fdb9acb7e422. If this solves the problem, then I would open a PR with it.


---

[GitHub] flink pull request #3881: FLINK-6284 Incorrect sorting of completed checkpoi...

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3881#discussion_r116211254
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/zookeeper/ZooKeeperStateHandleStore.java ---
    @@ -346,17 +346,20 @@ public int exists(String pathInZooKeeper) throws Exception {
     			} else {
     				// Initial cVersion (number of changes to the children of this node)
     				int initialCVersion = stat.getCversion();
    -
    -				List<String> children = ZKPaths.getSortedChildren(
    -						client.getZookeeperClient().getZooKeeper(),
    -						ZKPaths.fixForNamespace(client.getNamespace(), "/"));
    -
    -				for (String path : children) {
    -					path = "/" + path;
    +				List<String> childrenInStr =
    +					client.getZookeeperClient().getZooKeeper().
    +						getChildren(ZKPaths.fixForNamespace(client.getNamespace(), "/"), false);
    +				List<Long> children = new ArrayList<Long>(childrenInStr.size());
    +				for(String childNode : childrenInStr) {
    +					children.add(new Long(childNode));
    --- End diff --
    
    OK, I see. I am not sure about this MesosWorker case. I am also not sure whether there is an API for using the czxid; if there is, we can use it directly. Will be back.


---

[GitHub] flink pull request #3881: FLINK-6284 Incorrect sorting of completed checkpoi...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3881#discussion_r116208844
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/zookeeper/ZooKeeperStateHandleStore.java ---
    @@ -346,17 +346,20 @@ public int exists(String pathInZooKeeper) throws Exception {
     			} else {
     				// Initial cVersion (number of changes to the children of this node)
     				int initialCVersion = stat.getCversion();
    -
    -				List<String> children = ZKPaths.getSortedChildren(
    -						client.getZookeeperClient().getZooKeeper(),
    -						ZKPaths.fixForNamespace(client.getNamespace(), "/"));
    -
    -				for (String path : children) {
    -					path = "/" + path;
    +				List<String> childrenInStr =
    +					client.getZookeeperClient().getZooKeeper().
    +						getChildren(ZKPaths.fixForNamespace(client.getNamespace(), "/"), false);
    +				List<Long> children = new ArrayList<Long>(childrenInStr.size());
    +				for(String childNode : childrenInStr) {
    +					children.add(new Long(childNode));
    --- End diff --
    
    I'm not sure whether we can assume that the children paths are always longs. In the general case this is not true (see `ZooKeeperMesosWorkerStore`).
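
One way to cope with children that are not always longs is a comparator that falls back to string comparison. The following is only a sketch under that assumption; `TolerantNameComparator` is a hypothetical name and this is not the approach taken in the pull request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical comparator, for illustration only.
class TolerantNameComparator implements Comparator<String> {

    /** Returns the numeric value of the name, or null if it is not a valid long. */
    static Long tryParse(String name) {
        try {
            return Long.valueOf(name);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    @Override
    public int compare(String a, String b) {
        Long x = tryParse(a);
        Long y = tryParse(b);
        if (x != null && y != null) {
            // Both names are numeric: order by value.
            return Long.compare(x, y);
        }
        if (x != null) {
            return -1; // numeric names come before non-numeric ones
        }
        if (y != null) {
            return 1;
        }
        // Neither name is numeric: plain string order.
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        List<String> names =
            new ArrayList<>(Arrays.asList("worker-b", "99", "101", "worker-a", "100"));
        names.sort(new TolerantNameComparator());
        System.out.println(names); // prints [99, 100, 101, worker-a, worker-b]
    }
}
```

Placing all numeric names before the non-numeric ones keeps the comparator a consistent total order, so it is safe to pass to `List.sort`.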


---

[GitHub] flink issue #3881: FLINK-6284 Incorrect sorting of completed checkpoints in ...

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on the issue:

    https://github.com/apache/flink/pull/3881
  
    @tillrohrmann 
    Thanks for the new PR. I just ran your change with 101, 99, 100 as the checkpoint order. In this case 100 should be the latest checkpoint even though the actual IDs are not sorted, but with your change and with my earlier commit the result is always sorted as 99, 100, 101.
    Can you take a look at my latest commit? It is based on czxid (as per your suggestion), and I think that makes sense: whatever the actual ID is, the node created most recently in ZooKeeper is the latest checkpoint. However, I am not sure whether checkpoint IDs will really be added in a non-sorted way, so that 100 can be the latest one even though 101 is also there.
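
The czxid-based ordering described above can be sketched without a ZooKeeper dependency. The `Child` class below is a hypothetical stand-in for a node name paired with the creation zxid that ZooKeeper exposes through `Stat#getCzxid()`, and the czxid values are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, for illustration only.
class CzxidOrdering {

    /** Stand-in for a child node: its name plus the zxid of the transaction that created it. */
    static final class Child {
        final String name;
        final long czxid;

        Child(String name, long czxid) {
            this.name = name;
            this.czxid = czxid;
        }
    }

    /** Returns the child names ordered from oldest to most recently created. */
    static List<String> namesByCreationOrder(List<Child> children) {
        List<Child> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingLong((Child c) -> c.czxid));
        List<String> names = new ArrayList<>(sorted.size());
        for (Child c : sorted) {
            names.add(c.name);
        }
        return names;
    }

    public static void main(String[] args) {
        // Checkpoints created in the order 101, 99, 100 (czxid values are invented).
        List<Child> children = Arrays.asList(
            new Child("100", 12L),
            new Child("101", 10L),
            new Child("99", 11L));
        System.out.println(namesByCreationOrder(children)); // prints [101, 99, 100]
    }
}
```

Under this ordering the last element, 100, is treated as the latest checkpoint regardless of its numeric ID, which matches the scenario in the comment.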



---

[GitHub] flink pull request #3881: FLINK-6284 Incorrect sorting of completed checkpoi...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3881#discussion_r116207779
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/zookeeper/ZooKeeperStateHandleStore.java ---
    @@ -346,17 +346,20 @@ public int exists(String pathInZooKeeper) throws Exception {
     			} else {
     				// Initial cVersion (number of changes to the children of this node)
     				int initialCVersion = stat.getCversion();
    -
    -				List<String> children = ZKPaths.getSortedChildren(
    -						client.getZookeeperClient().getZooKeeper(),
    -						ZKPaths.fixForNamespace(client.getNamespace(), "/"));
    -
    -				for (String path : children) {
    -					path = "/" + path;
    +				List<String> childrenInStr =
    +					client.getZookeeperClient().getZooKeeper().
    +						getChildren(ZKPaths.fixForNamespace(client.getNamespace(), "/"), false);
    +				List<Long> children = new ArrayList<Long>(childrenInStr.size());
    +				for(String childNode : childrenInStr) {
    +					children.add(new Long(childNode));
    +				}
    --- End diff --
    
    Where are the children sorted? Again, I think this only works because `ZooKeeper#getChildren` returns the nodes in the right order in the test case.


---

[GitHub] flink pull request #3881: FLINK-6284 Incorrect sorting of completed checkpoi...

Posted by ramkrish86 <gi...@git.apache.org>.
Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3881#discussion_r116211032
  
    --- Diff: pom.xml ---
    @@ -101,7 +101,8 @@ under the License.
     		<chill.version>0.7.4</chill.version>
     		<asm.version>5.0.4</asm.version>
     		<zookeeper.version>3.4.6</zookeeper.version>
    -		<curator.version>2.12.0</curator.version>
    +		<curator.version>2.11.0</curator.version>
    --- End diff --
    
    Oh, my environment was not able to fetch 2.12.0, so I included this change to make things compile. I will revert it.


---