Posted to dev@pig.apache.org by "Andrey Klochkov (JIRA)" <ji...@apache.org> on 2012/08/30 20:43:07 UTC

[jira] [Created] (PIG-2898) Multithreaded execution of e2e tests

Andrey Klochkov created PIG-2898:
------------------------------------

             Summary: Multithreaded execution of e2e tests
                 Key: PIG-2898
                 URL: https://issues.apache.org/jira/browse/PIG-2898
             Project: Pig
          Issue Type: Improvement
          Components: e2e harness
            Reporter: Andrey Klochkov
            Assignee: Andrey Klochkov


Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck is on the client side, and in our observations it would help a lot if the e2e harness could run tests in parallel threads.

We prototyped changes to the e2e harness that allow tests to run in a configurable number of threads. Preliminary results show a more than 6x reduction in execution time on a small 3-node M/R cluster with a modest configuration. We are going to share a patch shortly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Reopened] (PIG-2898) Multithreaded execution of e2e tests

Posted by "Rohini Palaniswamy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy reopened PIG-2898:
-------------------------------------


Ivan,
  Re-opening the issue. Until the patch is committed, it cannot be marked resolved. It needs to be left in the Patch Available state.
                
> Multithreaded execution of e2e tests
> ------------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>         Attachments: PIG-2898-against-trunk-2.patch, xmlReport-fixed-duration.pl

[jira] [Updated] (PIG-2898) Parallel execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Status: Open  (was: Patch Available)

Cancelling the previous patch to test the latest one.
                
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-branch-0.10-6-final.patch, PIG-2898-trunk-3.patch, PIG-2898-trunk-6-final.patch

[jira] [Work started] (PIG-2898) Multithreaded execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on PIG-2898 started by Ivan A. Veselovsky.

> Multithreaded execution of e2e tests
> ------------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>         Attachments: PIG-2898-against-trunk.patch

[jira] [Commented] (PIG-2898) Parallel execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465715#comment-13465715 ] 

Ivan A. Veselovsky commented on PIG-2898:
-----------------------------------------

Performance measurement results with patch #5 in e2e local mode:
1) the test results are the same;
2) the parallelized local mode ran 2.7 times faster than the sequential one (91 minutes parallel vs. 250 minutes sequential).
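
As a quick check of the figure above, the speedup can be recomputed from the two durations. This is only an arithmetic sketch; it assumes 250 minutes is the sequential run and 91 minutes is the parallel run, which is the only reading consistent with the reported 2.7x.

```python
# Durations reported in the comment above (minutes).
sequential_minutes = 250
parallel_minutes = 91

# Speedup is the ratio of the old duration to the new one.
speedup = sequential_minutes / parallel_minutes

# Fraction of wall-clock time saved by the parallel run.
saved_fraction = 1 - parallel_minutes / sequential_minutes

print(f"speedup: {speedup:.2f}x, time saved: {saved_fraction:.0%}")
```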
                
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-against-trunk-2.patch, PIG-2898-fix-sub-prototypes.patch, PIG-2898-trunk-3.patch, PIG-2898-trunk-5.patch

[jira] [Updated] (PIG-2898) Multithreaded execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Attachment:     (was: pig-2898-for-svn-branch-0.9.patch)
    
> Multithreaded execution of e2e tests
> ------------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>         Attachments: PIG-2898-against-trunk.patch

[jira] [Commented] (PIG-2898) Parallel execution of e2e tests

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481528#comment-13481528 ] 

Daniel Dai commented on PIG-2898:
---------------------------------

One other question: is the test log in sequence in the case of parallel execution?
                
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-branch-0.10-6-final.patch, PIG-2898-trunk-3.patch, PIG-2898-trunk-6-final.patch

[jira] [Commented] (PIG-2898) Parallel execution of e2e tests

Posted by "Rohini Palaniswamy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485246#comment-13485246 ] 

Rohini Palaniswamy commented on PIG-2898:
-----------------------------------------

+1. Thanks, Ivan. I tested the new patch. It is now faster even when running just two or three tests. It would have been nicer if the new methods had more meaningful names, like onGroupRunSetup()/onGroupRunCleanup(), instead of globalSetup2() or cleanupSetup2(). But I guess it is OK since it is only the e2e test script.
                
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-branch-0.10-6-final.patch, PIG-2898-branch-0.10-7.patch, PIG-2898-trunk-3.patch, PIG-2898-trunk-6-final.patch, PIG-2898-trunk-7.patch

[jira] [Updated] (PIG-2898) Multithreaded execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Attachment: PIG-2898-against-trunk-2.patch

Slightly improved version of the patch: it also resolves collisions caused by identical 'tmpPath' values in parallel tests.
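
To illustrate the kind of collision described above, here is a minimal sketch of giving each parallel worker its own temporary path. It is not the code from the patch (the harness itself is written in Perl); the function name `unique_tmp_path` and the naming scheme (group name plus process id) are assumptions made for this example.

```python
import os
import tempfile


def unique_tmp_path(base_dir: str, group_name: str) -> str:
    """Create and return a temp directory unique to this process and call.

    Hypothetical helper: the prefix combines the group name with the
    process id, and mkdtemp appends a random suffix, so two workers
    (or two calls in one worker) never share a directory.
    """
    os.makedirs(base_dir, exist_ok=True)
    prefix = f"{group_name}-{os.getpid()}-"
    return tempfile.mkdtemp(prefix=prefix, dir=base_dir)
```

With a single shared 'tmpPath', two groups running at the same time could overwrite or delete each other's intermediate files; deriving the path per worker avoids that without any locking.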
                
> Multithreaded execution of e2e tests
> ------------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>         Attachments: PIG-2898-against-trunk-2.patch, xmlReport-fixed-duration.pl

[jira] [Updated] (PIG-2898) Parallel execution of e2e tests

Posted by "Rohini Palaniswamy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-2898:
------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch committed to 0.11 and trunk. Thanks Ivan.
                
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>             Fix For: 0.11, 0.12
>
>         Attachments: PIG-2898-trunk-3.patch, PIG-2898-trunk-8.patch

[jira] [Commented] (PIG-2898) Parallel execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465562#comment-13465562 ] 

Ivan A. Veselovsky commented on PIG-2898:
-----------------------------------------

Please also see the comments on the review board: https://reviews.apache.org/r/7053/ (patch #5 is uploaded there).
                
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-against-trunk-2.patch, PIG-2898-fix-sub-prototypes.patch, PIG-2898-trunk-3.patch, PIG-2898-trunk-5.patch

[jira] [Updated] (PIG-2898) Parallel execution of e2e tests

Posted by "Andrey Klochkov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrey Klochkov updated PIG-2898:
---------------------------------

    Summary: Parallel execution of e2e tests  (was: Multithreaded execution of e2e tests)

Changing the name of the JIRA from "multithreaded" to "parallel", as we implemented it with forks instead of threads.
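
For readers unfamiliar with the fork-based approach, the following is a rough sketch of running test groups in forked child processes with a bounded number of workers. It is not the harness code (which is Perl); `run_groups_in_forks`, `run_group`, and `max_workers` are hypothetical names, and the sketch assumes a Unix-like system where `os.fork` is available.

```python
import os


def run_groups_in_forks(groups, run_group, max_workers=2):
    """Run each test group in its own forked child process.

    run_group(group) is a caller-supplied callable returning True on
    success. Returns a dict mapping group name to child exit code.
    """
    pending = list(groups)
    running = {}  # child pid -> group name
    results = {}
    while pending or running:
        # Start children until the concurrency limit is reached.
        while pending and len(running) < max_workers:
            group = pending.pop(0)
            pid = os.fork()
            if pid == 0:
                # Child: run one group, then exit immediately so it
                # never falls back into the parent's scheduling loop.
                try:
                    code = 0 if run_group(group) else 1
                except Exception:
                    code = 2
                os._exit(code)
            running[pid] = group
        # Parent: block until any child finishes, then record its result.
        pid, status = os.wait()
        if pid not in running:
            continue
        group = running.pop(pid)
        results[group] = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    return results
```

Because each child is a separate process, the groups share no in-memory state, which is one reason forks can be simpler to retrofit onto an existing sequential script than threads.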
                
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-against-trunk-2.patch, PIG-2898-fix-sub-prototypes.patch

[jira] [Updated] (PIG-2898) Parallel execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Attachment:     (was: PIG-2898-branch-0.10-6-final.patch)
    
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>             Fix For: 0.11, 0.12
>
>         Attachments: PIG-2898-trunk-3.patch, PIG-2898-trunk-8.patch

[jira] [Updated] (PIG-2898) Multithreaded execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Attachment: xmlReport-fixed-duration.pl

I'm also attaching a version of the xmlReport.pl script with a corrected calculation of the test duration: xmlReport-fixed-duration.pl
                
> Multithreaded execution of e2e tests
> ------------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>         Attachments: PIG-2898-against-trunk.patch, xmlReport-fixed-duration.pl

[jira] [Commented] (PIG-2898) Parallel execution of e2e tests

Posted by "Rohini Palaniswamy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487221#comment-13487221 ] 

Rohini Palaniswamy commented on PIG-2898:
-----------------------------------------

Thanks, Ivan. Will commit soon. Just FYI: it is not required to clean up afterwards by deleting older patches. We usually leave them in the JIRA so that the history of the progress is preserved.
                
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>             Fix For: 0.11, 0.12
>
>         Attachments: PIG-2898-trunk-3.patch, PIG-2898-trunk-8.patch

[jira] [Updated] (PIG-2898) Parallel execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Attachment: PIG-2898-trunk-5.patch

Hi, Rohini,
all the mentioned suggestions were addressed in patch #5. This patch is cumulative: it aggregates all the changes made in previous patches.

Notes:

* In parallelized mode, the context (like "[myfile.conf-MyGroup]") is printed *after* the results due to formatting issues (some contexts are too long).

* In the trunk branch, the test streaming_local.conf/StreamingLocal_11 hangs in local mode (observed in both sequential and parallel execution modes). So I recommend commenting it out to get full results.

* The local dir is parameterized with 'hadoop.mapred.local.dir' in ant, or 'HADOOP_MAPRED_LOCAL_DIR' in the environment.

* Debug output is parameterized with 'e2e.debug' in ant, or 'E2E_DEBUG' in the environment.
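
As an illustration of the last two notes, a script could pick up these settings from the environment roughly as follows. Only the variable names 'HADOOP_MAPRED_LOCAL_DIR' and 'E2E_DEBUG' come from the notes above; the function name `read_harness_settings`, the set of values treated as "debug on", and the fallback of `None` are assumptions for this sketch, not the harness's actual behavior.

```python
import os


def read_harness_settings(env=None):
    """Read the two settings named in the notes from an environment mapping.

    Hypothetical helper: how the real harness interprets the values
    is not stated in this thread.
    """
    env = os.environ if env is None else env
    debug_raw = env.get("E2E_DEBUG", "")
    return {
        # Assumption: "1", "true", or "yes" (any case) enables debug output.
        "debug": debug_raw.strip().lower() in ("1", "true", "yes"),
        # None means the variable was not set; no default path is assumed.
        "local_dir": env.get("HADOOP_MAPRED_LOCAL_DIR"),
    }
```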
                
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-against-trunk-2.patch, PIG-2898-fix-sub-prototypes.patch, PIG-2898-trunk-3.patch, PIG-2898-trunk-5.patch

[jira] [Commented] (PIG-2898) Parallel execution of e2e tests

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486352#comment-13486352 ] 

Daniel Dai commented on PIG-2898:
---------------------------------

+1. Rohini, can you commit once you get permission?
                
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>             Fix For: 0.11, 0.12
>
>         Attachments: PIG-2898-branch-0.10-6-final.patch, PIG-2898-branch-0.10-7.patch, PIG-2898-trunk-3.patch, PIG-2898-trunk-6-final.patch, PIG-2898-trunk-7.patch

[jira] [Updated] (PIG-2898) Parallel execution of e2e tests

Posted by "Ilya Katsov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ilya Katsov updated PIG-2898:
-----------------------------

    Attachment: PIG-2898-trunk-3.patch
    
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-against-trunk-2.patch, PIG-2898-fix-sub-prototypes.patch, PIG-2898-trunk-3.patch

[jira] [Updated] (PIG-2898) Multithreaded execution of e2e tests

Posted by "Ilya Katsov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ilya Katsov updated PIG-2898:
-----------------------------

               Labels: test  (was: )
    Affects Version/s: 0.10.0
               Status: Patch Available  (was: Reopened)
    
> Multithreaded execution of e2e tests
> ------------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-against-trunk-2.patch

[jira] [Updated] (PIG-2898) Parallel execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Status: Patch Available  (was: Open)

Testing the latest patch (-6-final) against trunk.
                
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-branch-0.10-6-final.patch, PIG-2898-trunk-3.patch, PIG-2898-trunk-6-final.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.


[jira] [Updated] (PIG-2898) Parallel execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Attachment: PIG-2898-branch-0.10-7.patch

I'm also attaching a new version of the patch against "branch-0.10": PIG-2898-branch-0.10-7.patch.

The new version of the patch against trunk is also uploaded to the review board: https://reviews.apache.org/r/7053/
                
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-branch-0.10-6-final.patch, PIG-2898-branch-0.10-7.patch, PIG-2898-trunk-3.patch, PIG-2898-trunk-6-final.patch, PIG-2898-trunk-7.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.


[jira] [Commented] (PIG-2898) Parallel execution of e2e tests

Posted by "Ilya Katsov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461711#comment-13461711 ] 

Ilya Katsov commented on PIG-2898:
----------------------------------

Updating a patch with fixes for e2e local mode.
                
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-against-trunk-2.patch, PIG-2898-fix-sub-prototypes.patch, PIG-2898-trunk-3.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.


[jira] [Commented] (PIG-2898) Parallel execution of e2e tests

Posted by "Rohini Palaniswamy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481532#comment-13481532 ] 

Rohini Palaniswamy commented on PIG-2898:
-----------------------------------------

A log file is created per test group. Once execution of a test group completes, its log is appended to the main log file.
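
The per-group logging described above can be sketched roughly as follows. This is an illustration in Python, not the harness's Perl code; the function names, file names, and log lines are all hypothetical, and the only behaviour taken from the comment is "one log per group, appended to the main log when the group finishes".

```python
import os
import tempfile


def append_group_log(group_log_path, main_log_path):
    """Append a finished group's log to the main log file."""
    with open(group_log_path, "r") as src, open(main_log_path, "a") as dst:
        dst.write(src.read())


def demo():
    """Write two hypothetical group logs and merge them in completion order."""
    with tempfile.TemporaryDirectory() as workdir:
        main_log = os.path.join(workdir, "main.log")
        # Groups are merged in the order they finish, not in name order.
        for group in ("GroupB", "GroupA"):
            group_log = os.path.join(workdir, group + ".log")
            with open(group_log, "w") as f:
                f.write(group + ": test 1 ok\n")
            append_group_log(group_log, main_log)
        with open(main_log) as f:
            return f.read()
```

Because each group writes to its own file while it runs, output from concurrently running groups is not interleaved in the main log; the main log simply reflects the order in which groups completed.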
                
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-branch-0.10-6-final.patch, PIG-2898-trunk-3.patch, PIG-2898-trunk-6-final.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.


[jira] [Commented] (PIG-2898) Parallel execution of e2e tests

Posted by "Rohini Palaniswamy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463537#comment-13463537 ] 

Rohini Palaniswamy commented on PIG-2898:
-----------------------------------------

Ilya,
   Thanks, it worked for local mode. But there were more test failures than usual in H23. H20 tests are still running and I will update their status tomorrow.

A few comments:
1) Can you change yarn.nodemanager.local-dirs to mapreduce.cluster.local.dir? yarn.nodemanager.local-dirs was a wrong suggestion from me. I tested with mapreduce.cluster.local.dir and it works.
2) Can you rename hadoop.mapred.dir to hadoop.mapred.local.dir, as the current name is slightly confusing, and make it configurable through the command line? In many cases /tmp gets full, and it would be good to have the ability to point to some other directory.
3) I had some comments on the review board. Can you incorporate them too and post an updated patch there?
                
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-against-trunk-2.patch, PIG-2898-fix-sub-prototypes.patch, PIG-2898-trunk-3.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.


[jira] [Updated] (PIG-2898) Parallel execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Attachment:     (was: PIG-2898-trunk-6-final.patch)
    
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>             Fix For: 0.11, 0.12
>
>         Attachments: PIG-2898-trunk-3.patch, PIG-2898-trunk-8.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.


[jira] [Commented] (PIG-2898) Multithreaded execution of e2e tests

Posted by "Rohini Palaniswamy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450854#comment-13450854 ] 

Rohini Palaniswamy commented on PIG-2898:
-----------------------------------------

Ivan,
  All the changes to the e2e framework that got it working with H23, as well as the benchmark caching, are removed by this patch. I think this is because you started before https://issues.apache.org/jira/browse/PIG-2484 went into the 0.9 branch. You will have to update the patch to include them.
                
> Multithreaded execution of e2e tests
> ------------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>            Reporter: Andrey Klochkov
>            Assignee: Andrey Klochkov
>         Attachments: pig-2898-for-svn-branch-0.9.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.


[jira] [Commented] (PIG-2898) Multithreaded execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452991#comment-13452991 ] 

Ivan A. Veselovsky commented on PIG-2898:
-----------------------------------------

My testing shows that the same patch applies okay to "branch-0.10".
                
> Multithreaded execution of e2e tests
> ------------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>         Attachments: PIG-2898-against-trunk.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.


[jira] [Updated] (PIG-2898) Parallel execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Attachment:     (was: PIG-2898-trunk-5.patch)
    
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-trunk-3.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.


[jira] [Updated] (PIG-2898) Multithreaded execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Attachment: PIG-2898-fix-sub-prototypes.patch

"PIG-2898-fix-sub-prototypes.patch" (attached): an additional patch that fixes subroutine prototypes. The original patch worked with Perl 5.14.2 but did not work with Perl 5.8.8; this patch resolves that issue. It is to be applied after "PIG-2898-against-trunk.patch".
                
> Multithreaded execution of e2e tests
> ------------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-against-trunk-2.patch, PIG-2898-fix-sub-prototypes.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.


[jira] [Updated] (PIG-2898) Multithreaded execution of e2e tests

Posted by "Rohini Palaniswamy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-2898:
------------------------------------

    Assignee: Ivan A. Veselovsky  (was: Andrey Klochkov)
    
> Multithreaded execution of e2e tests
> ------------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>         Attachments: pig-2898-for-svn-branch-0.9.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.


[jira] [Updated] (PIG-2898) Parallel execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Attachment: PIG-2898-trunk-8.patch

Hi Rohini,
in the attached "PIG-2898-trunk-8.patch" I renamed the methods globalSetup2()/globalCleanup2() to globalSetupConditional()/globalCleanupConditional() respectively.
These differ from the onGroupRunXxx() names you suggested for 2 reasons:
1) globalSetup2() is executed once per test config file in sequential mode, and once per test group in parallel mode, so "onGroupRun" is not quite exact.
2) The globalXxx2() methods complement the globalXxx() methods (they are their conditional parts), so their names should be similar to each other.
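
The difference in call frequency described in point 1 can be sketched as below. This is a simplified Python illustration, not the harness's Perl code: run_conf, the call log, and the group names are hypothetical, and the parallel branch is simulated in-process rather than with real subprocesses. Only the method names and the "once per config file vs. once per group" rule come from the comment above.

```python
def run_conf(conf_name, groups, parallel):
    """Return the ordered list of hook and test calls for one config file."""
    calls = []
    if not parallel:
        # Sequential mode: the conditional hooks wrap the whole config file.
        calls.append(("globalSetupConditional", conf_name))
        for group in groups:
            calls.append(("run", group))
        calls.append(("globalCleanupConditional", conf_name))
    else:
        # Parallel mode: each group (one subprocess in the real harness)
        # gets its own conditional setup and cleanup.
        for group in groups:
            calls.append(("globalSetupConditional", group))
            calls.append(("run", group))
            calls.append(("globalCleanupConditional", group))
    return calls


def count_setups(calls):
    """Count how many times the conditional setup hook was invoked."""
    return sum(1 for name, _ in calls if name == "globalSetupConditional")
```

For a config file with two groups, count_setups(run_conf("a.conf", ["g1", "g2"], False)) gives 1 and the parallel variant gives 2, which is why a name tied to "group run" would not fit the sequential case.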
                
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>             Fix For: 0.11, 0.12
>
>         Attachments: PIG-2898-branch-0.10-6-final.patch, PIG-2898-branch-0.10-7.patch, PIG-2898-trunk-3.patch, PIG-2898-trunk-6-final.patch, PIG-2898-trunk-7.patch, PIG-2898-trunk-8.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.


[jira] [Updated] (PIG-2898) Multithreaded execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Attachment: PIG-2898-against-trunk.patch

Thanks, Rohini,
I fixed some issues in the code and prepared the patch against "trunk". Later I will also attach the patch for 0.10.

Some more comments on this change:
1) The --startat (-st) option of test_harness.pl is not supported if the group fork factor is greater than 1, because several test groups are executed simultaneously.
2) In forked mode the "Results so far, ..." lines on the console may show results that are less than the actual results achieved so far. This is because each subprocess prints only its own data and does not know about the progress of other subprocesses.
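
Point 2 can be illustrated with a small simulation. The sketch below is in Python rather than the harness's Perl, and the worker ids and event list are made up; it only shows why a counter kept per subprocess is never larger than the true overall count.

```python
from collections import Counter


def running_totals(events):
    """events: worker ids in the order their tests finish.

    Returns, after each finished test, a pair of
    (count that worker itself would print, actual count over all workers).
    """
    local = Counter()
    total = 0
    pairs = []
    for worker in events:
        local[worker] += 1  # what this subprocess knows about
        total += 1          # what has really finished so far
        pairs.append((local[worker], total))
    return pairs
```

With events ["a", "b", "a"] the pairs are (1, 1), (1, 2), (2, 3): once a second subprocess has finished a test, every printed figure lags behind the real total.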
                
> Multithreaded execution of e2e tests
> ------------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>         Attachments: PIG-2898-against-trunk.patch, pig-2898-for-svn-branch-0.9.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.


[jira] [Updated] (PIG-2898) Multithreaded execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Patch Info: Patch Available

We provided a parallelized mode of e2e test execution using Parallel::ForkManager.
Two parameters affect the behavior:
1) file.fork.factor -- max number of subprocesses when running test configuration files (.conf);
2) fork.factor -- max number of subprocesses when running tests within one .conf file.
The total max number of subprocesses cannot exceed the product of the 2 values.
A value <= 1 means no parallelization.
Example: ant -Dfork.factor=3 -Dfile.fork.factor=3 ... test-e2e
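
As a quick check of the bound stated above, here is a minimal Python sketch (not part of the patch). The function name is hypothetical, and treating any value <= 1 as a factor of 1 is an assumed reading of "no parallelizing" at that level.

```python
def max_subprocesses(file_fork_factor, fork_factor):
    """Upper bound on concurrent subprocesses: the product of both factors.

    A value <= 1 disables forking at that level, so it is counted as 1.
    """
    return max(1, file_fork_factor) * max(1, fork_factor)
```

For the ant example above (both factors set to 3), the bound is 9 subprocesses; with either factor left at 1 or below, the bound is just the other factor.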

The attached patch is to be applied to http://svn.apache.org/repos/asf/pig/branches/branch-0.9/ branch.

The patch testing procedure gives the following results for the patch:
     [exec] -1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 24 new or modified tests.
     [exec] 
     [exec]     -1 javadoc.  The javadoc tool appears to have generated 1 warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
     [exec] 
                
> Multithreaded execution of e2e tests
> ------------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>            Reporter: Andrey Klochkov
>            Assignee: Andrey Klochkov
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.


[jira] [Resolved] (PIG-2898) Multithreaded execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky resolved PIG-2898.
-------------------------------------

    Resolution: Implemented

Marking as Implemented since the patch is available.
                
> Multithreaded execution of e2e tests
> ------------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>         Attachments: PIG-2898-against-trunk.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.


[jira] [Commented] (PIG-2898) Multithreaded execution of e2e tests

Posted by "Julien Le Dem (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446244#comment-13446244 ] 

Julien Le Dem commented on PIG-2898:
------------------------------------

Hi Andrey, this sounds interesting.
Do you have a patch available?
Julien
                
> Multithreaded execution of e2e tests
> ------------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>            Reporter: Andrey Klochkov
>            Assignee: Andrey Klochkov
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.


[jira] [Commented] (PIG-2898) Parallel execution of e2e tests

Posted by "Rohini Palaniswamy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470469#comment-13470469 ] 

Rohini Palaniswamy commented on PIG-2898:
-----------------------------------------

+1 nonbinding. 

https://cwiki.apache.org/confluence/display/PIG/HowToTest#HowToTest-HowtoRune2eTests needs to be updated with the following information after the patch goes in:
   * the options fork.factor.conf.file (number of parallel forks for the test files) and fork.factor.group (number of forks for groups within a test file)
   * installation of the Parallel::ForkManager Perl module
                
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-branch-0.10-6-final.patch, PIG-2898-trunk-3.patch, PIG-2898-trunk-6-final.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.


[jira] [Updated] (PIG-2898) Parallel execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Attachment:     (was: PIG-2898-fix-sub-prototypes.patch)
    
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-trunk-3.patch, PIG-2898-trunk-5.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.

[jira] [Updated] (PIG-2898) Parallel execution of e2e tests

Posted by "Ilya Katsov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ilya Katsov updated PIG-2898:
-----------------------------

    Attachment:     (was: PIG-2898-trunk-3.patch)
    
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-against-trunk-2.patch, PIG-2898-fix-sub-prototypes.patch, PIG-2898-trunk-3.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.

[jira] [Commented] (PIG-2898) Parallel execution of e2e tests

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481240#comment-13481240 ] 

Daniel Dai commented on PIG-2898:
---------------------------------

One thing I noticed is that the initialization cost is high when running in parallel mode. The tests try to create a temp directory for every section even if we only run a few tests. Would it be easy to create the temp directories on demand rather than up front?
                
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-branch-0.10-6-final.patch, PIG-2898-trunk-3.patch, PIG-2898-trunk-6-final.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.

[jira] [Updated] (PIG-2898) Multithreaded execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Attachment:     (was: PIG-2898-against-trunk.patch)
    
> Multithreaded execution of e2e tests
> ------------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>         Attachments: PIG-2898-against-trunk-2.patch, xmlReport-fixed-duration.pl
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.

[jira] [Updated] (PIG-2898) Parallel execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Attachment: PIG-2898-trunk-7.patch

Hi, Daniel, Rohini,
I implemented the required optimization which ensures that the local and HDFS directories are created only when needed (on demand).
These changes are in the newly attached "PIG-2898-trunk-7.patch".

The idea of the fix is to split the methods #globalSetup() and #globalCleanup() into two parts each: new methods #globalSetup2() and #globalCleanup2() are introduced. #globalSetup2() is invoked only if there is at least one test to execute, and #globalCleanup2() is invoked only if #globalSetup2() was invoked.
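
A minimal sketch of this two-stage setup and cleanup, written in Python purely for illustration (the harness itself is Perl). The class name, the log list, and the order of the two cleanup calls are assumptions and are not taken from the patch.

```python
class LazyGroupHarness:
    """Hypothetical stand-in for one test group in the harness."""

    def __init__(self, tests):
        self.tests = list(tests)
        self.heavy_setup_done = False
        self.log = []

    def global_setup(self):
        # Cheap part: always runs and creates no directories.
        self.log.append("globalSetup")

    def global_setup2(self):
        # Expensive part: stands in for creating local and HDFS directories.
        self.log.append("globalSetup2")
        self.heavy_setup_done = True

    def global_cleanup2(self):
        # Undoes the expensive part; only meaningful after global_setup2().
        self.log.append("globalCleanup2")

    def global_cleanup(self):
        self.log.append("globalCleanup")

    def run(self, selected):
        """Run only the tests named in `selected`, setting up lazily."""
        self.global_setup()
        try:
            for test in self.tests:
                if test not in selected:
                    continue
                if not self.heavy_setup_done:
                    # The first test that actually runs triggers the heavy setup.
                    self.global_setup2()
                self.log.append(f"run {test}")
        finally:
            if self.heavy_setup_done:
                self.global_cleanup2()
            self.global_cleanup()
        return self.log
```

With no tests selected, only the cheap setup and cleanup run, which is what removes the up-front directory creation for groups that have nothing to execute.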

Also, in this patch I reverted one of the previous changes, which replaced IPC::Run::run('mkdir' ...) with the Perl "mkpath" call, because "mkpath" appears to have (at least on my Perl 5.14.2) a rather strange feature: it returns a non-zero exit status with a "No such file or directory" message if the directory we're attempting to create already exists. This behavior is unexpected and confusing because it contradicts the behavior of native "mkdir -p" and java.io.File#mkdirs(). So, despite the fact that IPC::Run::run is slower, I prefer to use it to spare developers the trouble.
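
For comparison, the idempotent behavior expected of "mkdir -p" can be sketched in Python (used here only for illustration; the harness itself is Perl). os.makedirs with exist_ok=True succeeds whether or not the directory already exists. The helper names ensure_dir and demo, and the directory names, are hypothetical.

```python
import os
import tempfile


def ensure_dir(path):
    """Create path and any missing parents; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return os.path.isdir(path)


def demo():
    # Work inside a throwaway directory so the sketch leaves nothing behind.
    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, "out", "nested")
        first = ensure_dir(target)   # directory does not exist yet
        second = ensure_dir(target)  # directory already exists, still succeeds
        return first, second
```

Calling ensure_dir twice on the same path raises no error on the second call, which is the semantics the comment above expects from a "create if missing" helper.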
                
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-branch-0.10-6-final.patch, PIG-2898-trunk-3.patch, PIG-2898-trunk-6-final.patch, PIG-2898-trunk-7.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.

[jira] [Updated] (PIG-2898) Parallel execution of e2e tests

Posted by "Ilya Katsov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ilya Katsov updated PIG-2898:
-----------------------------

    Attachment: PIG-2898-trunk-3.patch
    
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-against-trunk-2.patch, PIG-2898-fix-sub-prototypes.patch, PIG-2898-trunk-3.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.

[jira] [Updated] (PIG-2898) Parallel execution of e2e tests

Posted by "Rohini Palaniswamy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-2898:
------------------------------------

    Fix Version/s: 0.12
                   0.11
    
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>             Fix For: 0.11, 0.12
>
>         Attachments: PIG-2898-branch-0.10-6-final.patch, PIG-2898-branch-0.10-7.patch, PIG-2898-trunk-3.patch, PIG-2898-trunk-6-final.patch, PIG-2898-trunk-7.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.

[jira] [Updated] (PIG-2898) Parallel execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Attachment: PIG-2898-trunk-6-final.patch
                PIG-2898-branch-0.10-6-final.patch

Attached reviewed and tested versions of the patches (6-final): against "trunk" and "branch-0.10" respectively.
                
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-branch-0.10-6-final.patch, PIG-2898-trunk-3.patch, PIG-2898-trunk-6-final.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.

[jira] [Updated] (PIG-2898) Multithreaded execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Attachment:     (was: xmlReport-fixed-duration.pl)
    
> Multithreaded execution of e2e tests
> ------------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>         Attachments: PIG-2898-against-trunk-2.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.

[jira] [Updated] (PIG-2898) Parallel execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Attachment:     (was: PIG-2898-branch-0.10-7.patch)
    
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>             Fix For: 0.11, 0.12
>
>         Attachments: PIG-2898-trunk-3.patch, PIG-2898-trunk-8.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.

[jira] [Updated] (PIG-2898) Parallel execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Attachment:     (was: PIG-2898-against-trunk-2.patch)
    
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>         Attachments: PIG-2898-trunk-3.patch, PIG-2898-trunk-5.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.

[jira] [Updated] (PIG-2898) Multithreaded execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Attachment: pig-2898-for-svn-branch-0.9.patch

The patch pig-2898-for-svn-branch-0.9.patch is attached.
                
> Multithreaded execution of e2e tests
> ------------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>            Reporter: Andrey Klochkov
>            Assignee: Andrey Klochkov
>         Attachments: pig-2898-for-svn-branch-0.9.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.

[jira] [Updated] (PIG-2898) Parallel execution of e2e tests

Posted by "Ivan A. Veselovsky (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan A. Veselovsky updated PIG-2898:
------------------------------------

    Attachment:     (was: PIG-2898-trunk-7.patch)
    
> Parallel execution of e2e tests
> -------------------------------
>
>                 Key: PIG-2898
>                 URL: https://issues.apache.org/jira/browse/PIG-2898
>             Project: Pig
>          Issue Type: Improvement
>          Components: e2e harness
>    Affects Versions: 0.10.0
>            Reporter: Andrey Klochkov
>            Assignee: Ivan A. Veselovsky
>              Labels: test
>             Fix For: 0.11, 0.12
>
>         Attachments: PIG-2898-trunk-3.patch, PIG-2898-trunk-8.patch
>
>
> Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The bottleneck here is the client side, and per our observations it can help a lot if the e2e harness would be able to run tests in parallel threads.
> We prototyped changes in e2e harness allowing to run tests in a configurable number of threads. Preliminary results show more than 6x reduction in execution time when using a small 3-nodes M/R cluster with modest configuration. Going to share a patch shortly.
