Posted to dev@pig.apache.org by "Richard Ding (JIRA)" <ji...@apache.org> on 2011/03/01 22:40:36 UTC

[jira] Created: (PIG-1874) Make PigServer work in a multithreading environment

Make PigServer work in a multithreading environment
---------------------------------------------------

                 Key: PIG-1874
                 URL: https://issues.apache.org/jira/browse/PIG-1874
             Project: Pig
          Issue Type: Improvement
          Components: impl
    Affects Versions: 0.8.0
            Reporter: Richard Ding
            Assignee: Richard Ding
             Fix For: 0.9.0


This means that PigServers should work if one creates separate PigServer instances for each thread (PigServers are not synchronized). 





-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-1874) Make PigServer work in a multithreading environment

Posted by "Vincent BARAT (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053923#comment-13053923 ] 

Vincent BARAT commented on PIG-1874:
------------------------------------

Thanks, guys! You saved my life with this patch!



        

[jira] [Commented] (PIG-1874) Make PigServer work in a multithreading environment

Posted by "Thomas Memenga (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055422#comment-13055422 ] 

Thomas Memenga commented on PIG-1874:
-------------------------------------

Be aware that the current implementation seems to have a memory leak if you reuse the threads.

I executed thousands of (very small) Pig jobs in parallel using a java.util.concurrent.ExecutorService (fixed-size thread pool)
and ran into memory problems after 3-4 hours. (Possibly statistics related?)

My workaround: spawn a new thread for each PigServer and let the garbage collector do the cleanup.
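
The cause of the leak is not identified in this thread. As a hedged illustration only, the sketch below shows one general way that per-thread state can accumulate when pool threads are reused, and why a fresh thread per job avoids it. It uses no Pig classes: ThreadReuseSketch, PER_THREAD and runJob are hypothetical stand-ins, and the assumption that thread-local state is involved is exactly that, an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these names come from Pig.
final class ThreadReuseSketch {
    // Assumed per-thread state that every job appends to and never clears.
    private static final ThreadLocal<List<String>> PER_THREAD =
            ThreadLocal.withInitial(ArrayList::new);

    // Stand-in for one small job. Returns how many entries the
    // current thread holds after the job has run.
    static int runJob(String name) {
        List<String> state = PER_THREAD.get();
        state.add(name);
        return state.size();
    }

    // Runs all jobs on one reused pool thread and returns the largest
    // per-thread state size observed.
    static int maxStateWithPool(int jobs) {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        AtomicInteger max = new AtomicInteger();
        for (int i = 0; i < jobs; i++) {
            String name = "job-" + i;
            pool.execute(() -> { max.accumulateAndGet(runJob(name), Math::max); });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not terminate in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return max.get();
    }

    // Runs each job on its own short-lived thread (the workaround above).
    // Each thread's state becomes unreachable once the thread ends.
    static int maxStateWithFreshThreads(int jobs) {
        AtomicInteger max = new AtomicInteger();
        for (int i = 0; i < jobs; i++) {
            String name = "job-" + i;
            Thread t = new Thread(() -> { max.accumulateAndGet(runJob(name), Math::max); });
            t.start();
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return max.get();
    }

    public static void main(String[] args) {
        System.out.println(maxStateWithPool(5));         // prints 5
        System.out.println(maxStateWithFreshThreads(5)); // prints 1
    }
}
```

With the reused thread the state grows with every job; with one thread per job it never exceeds a single entry. The trade-off is the cost of creating a thread for each job, which is small next to a Pig job but not free.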








        

[jira] Resolved: (PIG-1874) Make PigServer work in a multithreading environment

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding resolved PIG-1874.
-------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

Patch committed to trunk.



[jira] Updated: (PIG-1874) Make PigServer work in a multithreading environment

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1874:
------------------------------

    Attachment: PIG-1874_1.patch

Attaching a patch that adds a unit test for UDFContext. There are also existing unit tests for parallel execution of bound scripts in embedded Pig.

Test-patch output:

{code}
     [exec] -1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 6 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     -1 release audit.  The applied patch generated 541 release audit warnings (more than the trunk's current 540 warnings).
{code}

The release audit warning is HTML related.



[jira] Updated: (PIG-1874) Make PigServer work in a multithreading environment

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1874:
------------------------------

    Attachment: PIG-1874.patch

Attaching patch for review.

This patch removes the static variables from the PigServer and PigContext classes. It also makes the UDFContext instance thread-local.

To avoid sharing a PigContext object, users should use one of the following constructors to create a PigServer instance in each thread:

{code}
public PigServer(ExecType execType) throws ExecException;

public PigServer(ExecType execType, Properties properties) throws ExecException;
{code} 
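
For readers unfamiliar with the thread-local pattern mentioned above, here is a minimal sketch of the general idea in plain Java. It is not the code from the patch: ThreadLocalContext and its methods are hypothetical stand-ins for a per-thread context object such as UDFContext.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-thread context; not Pig's UDFContext.
final class ThreadLocalContext {
    // Each thread lazily receives its own instance on first access,
    // replacing a single shared static instance.
    private static final ThreadLocal<ThreadLocalContext> HOLDER =
            ThreadLocal.withInitial(ThreadLocalContext::new);

    private final Map<String, String> properties = new HashMap<>();

    private ThreadLocalContext() {}

    // Returns the instance that belongs to the calling thread.
    static ThreadLocalContext current() {
        return HOLDER.get();
    }

    void put(String key, String value) {
        properties.put(key, value);
    }

    String lookup(String key) {
        return properties.get(key);
    }

    // Reads a key from a brand-new thread, whose context starts empty
    // regardless of what the calling thread has stored.
    static String lookupFromNewThread(String key) {
        String[] seen = new String[1];
        Thread worker = new Thread(() -> seen[0] = current().lookup(key));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        current().put("job", "main");
        System.out.println(lookupFromNewThread("job")); // prints null
        System.out.println(current().lookup("job"));    // prints main
    }
}
```

Because each thread sees only its own instance, values set by one thread cannot clash with those of another, which is the property the issue asks for when each thread owns a separate PigServer.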



[jira] Commented: (PIG-1874) Make PigServer work in a multithreading environment

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005183#comment-13005183 ] 

Alan Gates commented on PIG-1874:
---------------------------------

Changes look good. What kind of testing are we doing to make sure we can have PigServers running in multiple threads with no clashes?



[jira] Commented: (PIG-1874) Make PigServer work in a multithreading environment

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001257#comment-13001257 ] 

Santhosh Srinivasan commented on PIG-1874:
------------------------------------------

+1

