You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by omaralvarez <gi...@git.apache.org> on 2016/06/02 09:18:20 UTC

[GitHub] flink pull request #2063: [FLINK-4002] [py] Improve testing infraestructure

GitHub user omaralvarez opened a pull request:

    https://github.com/apache/flink/pull/2063

    [FLINK-4002] [py] Improve testing infraestructure

    The Verify() test function now does not error out when array elements are missing:
    
    ```python
    env.generate_sequence(1, 5)\
             .map(Id()).map_partition(Verify([1,2,3,4], "Sequence")).output()
    ```
    
    I have also documented test functions.
    
    While documenting, two questions arise. First, Verify2 function has no use as is, performing a `if value in self.expected:` before:
    
    ```python
    try:
        self.expected.remove(value)
    except Exception:
        raise Exception()
    ```
    
    Makes this function useless, since it will never raise and exception, if I am not mistaken. 
    
    Also, I am not sure why there are two test scripts, `main_test.py` and `main_test2.py`. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/omaralvarez/flink py_testing

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2063.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2063
    
----
commit 784a602167f396cdcd1201509d3c122a5a85248f
Author: omaralvarez <om...@udc.es>
Date:   2016-06-02T09:09:08Z

    [FLINK-4002] [py] Improve testing infraestructure

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2063: [FLINK-4002] [py] Improve testing infraestructure

Posted by omaralvarez <gi...@git.apache.org>.
Github user omaralvarez commented on the issue:

    https://github.com/apache/flink/pull/2063
  
    Yes, I know that how python scripts are executed for the test is different. Let me elaborate:
    
    Since running the tests are quite costly in my laptop, I normally test my changes executing them in a local instance of Flink 1.0.3, since this is less taxing. Once I complete the changes, I run `mvn verify`. The problem is that when I call `pyflink2.sh test_main.py utils.py, the module that I pass to the test script, is ignored unless I use HDFS, in which case, everything works fine.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2063: [FLINK-4002] [py] Improve testing infraestructure

Posted by omaralvarez <gi...@git.apache.org>.
Github user omaralvarez commented on the issue:

    https://github.com/apache/flink/pull/2063
  
    Ok, I will modify it and commit the corrected version.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2063: [FLINK-4002] [py] Improve testing infraestructure

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/2063
  
    you are correct, Verify2 should be modified.
    
    there are 2 test scripts since most operations are running at the same time, eating up a lot of memory due to the memory-mapped files. putting them all in one can lead to OOM errors.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2063: [FLINK-4002] [py] Improve testing infraestructure

Posted by omaralvarez <gi...@git.apache.org>.
Github user omaralvarez commented on the issue:

    https://github.com/apache/flink/pull/2063
  
    Sorry, I said it wrong, it's the opposite. The case that fails in Verify() and Verify2(), is when we have more values in expected than in the resulting DataSet.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2063: [FLINK-4002] [py] Improve testing infraestructure

Posted by omaralvarez <gi...@git.apache.org>.
Github user omaralvarez commented on the issue:

    https://github.com/apache/flink/pull/2063
  
    Is everything ok? Should I look into something else?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2063: [FLINK-4002] [py] Improve testing infraestructure

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/2063
  
    you can add it to this PR as a separate commit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2063: [FLINK-4002] [py] Improve testing infraestructure

Posted by omaralvarez <gi...@git.apache.org>.
Github user omaralvarez commented on the issue:

    https://github.com/apache/flink/pull/2063
  
    I can confirm that there is a bug in `PythonPlanBinder.java`. When you pass packages to the `runPlan()` function, they are not copied to the temp directory in which `plan.py` file is generated. I manually changed the `prepareFiles()` call to copy the package files, and everything worked fine. 
    
    Should I open another issue, or just fix it here? This is not only a bug related to the testing infrastructure, but the Python API in general. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2063: [FLINK-4002] [py] Improve testing infraestructure

Posted by omaralvarez <gi...@git.apache.org>.
Github user omaralvarez commented on the issue:

    https://github.com/apache/flink/pull/2063
  
    Thanks! 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2063: [FLINK-4002] [py] Improve testing infraestructure

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/2063
  
    looking at the code right now; i may have figured out why the files aren't copied, but i find it odd that it supposedly works with hdfs. it actually should never copy additional files if no parameters are given.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2063: [FLINK-4002] [py] Improve testing infraestructure

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/2063
  
    you're right, that falls through. we should add additional checks:
    - Verify: index is equal to the size of expected values
    - Verify2: expected is empty


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2063: [FLINK-4002] [py] Improve testing infraestructure

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/2063
  
    Looks good, I'll merge it later on.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2063: [FLINK-4002] [py] Improve testing infraestructure

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/2063
  
    test files are never copied to the tmp folder regardless of whether HDFS is used or not. This is not a bug, but intended behaviour.
    
    You can easily add the utils.py to the test invokation by modifying the call to the PythonPlanBinder.main in PythonPlanBinderTest. it should be as simple as appending "<relative-path>/utils.py" to the call.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2063: [FLINK-4002] [py] Improve testing infraestructure

Posted by omaralvarez <gi...@git.apache.org>.
Github user omaralvarez commented on the issue:

    https://github.com/apache/flink/pull/2063
  
    I think everything should be ready now. I was not able to pinpoint why HDFS worked, I assume `distributeFiles()` copied the complete directory and that is why it worked, but I'm not 100% sure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #2063: [FLINK-4002] [py] Improve testing infraestructure

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/2063


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2063: [FLINK-4002] [py] Improve testing infraestructure

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/2063
  
    when we have more data than expected, remove() will be called on an empty list and should throw an exception, no?
    
    if you want to execute the python tests you only have to call mvn verify on the flink-python package.
    ```
    cd flink-libraries/flink-python
    mvn verify
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2063: [FLINK-4002] [py] Improve testing infraestructure

Posted by omaralvarez <gi...@git.apache.org>.
Github user omaralvarez commented on the issue:

    https://github.com/apache/flink/pull/2063
  
    I have corrected Verify2(), but there is another case that is not checked, when the resulting datasets have more elements than expected, right now the error will go unnoticed.
    
    I also wanted to ask, is there a way to execute only the python tests, since I want to unify the utilities in a file, but without knowing what is the execution path, I cannot make sure if the module will be imported correctly.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2063: [FLINK-4002] [py] Improve testing infraestructure

Posted by omaralvarez <gi...@git.apache.org>.
Github user omaralvarez commented on the issue:

    https://github.com/apache/flink/pull/2063
  
    Yes, I had the same thought looking at the code, I could not figure out why it worked with HDFS...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2063: [FLINK-4002] [py] Improve testing infraestructure

Posted by omaralvarez <gi...@git.apache.org>.
Github user omaralvarez commented on the issue:

    https://github.com/apache/flink/pull/2063
  
    I have fixed the last errors in the test functions. But, while trying to refactor the utility code, that now is repeated in both test files, I think I found another bug.
    
    The thing is that, in order to be able to have another `utils.py` file, we need to execute the tests as:
    
    `pyflink2.sh test_main.py utlis.py`
    
    Right now, if HDFS is not used, our case with ` env.execute(local=True)`, the packages are not copied to the temp folder along with the script file, and the runner fails not being able to locate the module that has been imported. If we add this module to the `PYTHONPATH`everything works fine, but I believe this should happen. This is probably a matter for another JIRA issue altogether.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---