Posted to dev@pig.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2011/02/02 02:00:38 UTC

[jira] Created: (PIG-1838) On a large farm, some pigs die of /tmp starvation

On a large farm, some pigs die of /tmp starvation
-------------------------------------------------

                 Key: PIG-1838
                 URL: https://issues.apache.org/jira/browse/PIG-1838
             Project: Pig
          Issue Type: Wish
          Components: impl
    Affects Versions: 0.8.0
            Reporter: Allen Wittenauer


We're starting to see issues where interactive/command line pig users blow up due to so many large jar creations in /tmp. (In other words, pig execution prior to the java.io.tmpdir fix that Hadoop makes can kick in.)  Pig should probably not depend upon users being savvy enough to override java.io.tmpdir on their own in these situations, and/or it should be a better steward of the space it does use.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (PIG-1838) On a large farm, some pigs die of /tmp starvation

Posted by "Allen Wittenauer (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved PIG-1838.
-----------------------------------

    Resolution: Won't Fix

Pig team has no interest in fixing this horrible bug.
                

        

[jira] [Commented] (PIG-1838) On a large farm, some pigs die of /tmp starvation

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011341#comment-13011341 ] 

Daniel Dai commented on PIG-1838:
---------------------------------

What we do is put the following lines into pig-cluster-hadoop-site.xml:
{code}
<property>
        <name>mapred.child.java.opts</name>
        <value> -Djava.io.tmpdir=xxxx</value>
</property>
{code}

Then put the directory containing pig-cluster-hadoop-site.xml on the classpath. Pig will pick it up.
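
A minimal sketch of that setup as a script, in case it helps. The helper name write_tmpdir_conf, the generated directory, and the /mnt/tmp value are illustrative and not from the comment above; using PIG_CLASSPATH to get the directory onto the classpath is an assumption about the pig launcher script.

```shell
#!/usr/bin/env bash
# Sketch only: write_tmpdir_conf is a hypothetical helper.

# Write a pig-cluster-hadoop-site.xml into conf_dir that points
# java.io.tmpdir for child JVMs at tmp_dir.
write_tmpdir_conf() {
    local conf_dir=$1 tmp_dir=$2
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_dir/pig-cluster-hadoop-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Djava.io.tmpdir=$tmp_dir</value>
  </property>
</configuration>
EOF
}

# Example values; both paths are placeholders.
conf_dir=$(mktemp -d)
write_tmpdir_conf "$conf_dir" /mnt/tmp

# Assumption: the pig launcher adds PIG_CLASSPATH to the Java classpath.
export PIG_CLASSPATH="$conf_dir${PIG_CLASSPATH:+:$PIG_CLASSPATH}"
```

The temporary directory is passed in as an argument so the same helper can be reused for whatever volume has free space on a given machine.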


[jira] [Commented] (PIG-1838) On a large farm, some pigs die of /tmp starvation

Posted by "Michael Brauwerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011465#comment-13011465 ] 

Michael Brauwerman commented on PIG-1838:
-----------------------------------------

Thanks, Daniel.

I found the reason why my first attempt (setting HADOOP_OPTS before calling hadoop) failed, and in the process found an alternate solution (which I find convenient because I haven't yet looked into classpath management).

The version of hadoop I am running (Amazon EMR's version) resets HADOOP_OPTS in "conf/hadoop-env.sh", which clears out any previously set value. That script then sources "conf/hadoop-user.env.sh".


So, I added 
{code}
 export HADOOP_OPTS="$HADOOP_OPTS -Djava.io.tmpdir=/mnt/tmp"
{code}

to conf/hadoop-user.env.sh
and now pig scripts use /mnt/tmp as desired.
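
To make the ordering issue concrete, here is a small self-contained sketch. The two functions are hypothetical stand-ins for conf/hadoop-env.sh and conf/hadoop-user.env.sh (the real files are not reproduced here), and /caller/tmp is a made-up value used only to tell the two settings apart.

```shell
#!/usr/bin/env bash
# Sketch only: these functions imitate the behaviour described above;
# they are not the contents of the real EMR scripts.

simulate_user_env() {
    # Stand-in for hadoop-user.env.sh: it runs after the reset, so an
    # append made here survives.
    HADOOP_OPTS="$HADOOP_OPTS -Djava.io.tmpdir=/mnt/tmp"
}

simulate_hadoop_env() {
    # Stand-in for hadoop-env.sh: HADOOP_OPTS is reset, so anything the
    # caller set beforehand is discarded.
    HADOOP_OPTS=""
    simulate_user_env
}

# Set by the caller (for example in pig.sh) before hadoop starts.
HADOOP_OPTS="-Djava.io.tmpdir=/caller/tmp"
simulate_hadoop_env

# Only the value appended after the reset is left.
echo "effective HADOOP_OPTS:$HADOOP_OPTS"
```

The value set by the caller is gone by the time the JVM options are assembled, which matches the failed first attempt; the append made after the reset is the one that takes effect.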




[jira] [Commented] (PIG-1838) On a large farm, some pigs die of /tmp starvation

Posted by "Dmitriy V. Ryaboy (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201866#comment-13201866 ] 

Dmitriy V. Ryaboy commented on PIG-1838:
----------------------------------------

Allen, it's fixed by not bundling extra jars. Those tmp dirs are tiny now.
You're welcome.
                

        

[jira] [Commented] (PIG-1838) On a large farm, some pigs die of /tmp starvation

Posted by "Michael Brauwerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011334#comment-13011334 ] 

Michael Brauwerman commented on PIG-1838:
-----------------------------------------

With apologies for self-followup.

I went ahead and edited the hadoop launcher script to set 
 
HADOOP_OPTS="$HADOOP_OPTS -Djava.io.tmpdir=/mnt/tmp"

before calling 

 exec "$JAVA" ... $HADOOP_OPTS ...

and that worked. It's obviously not the proper way to set java.io.tmpdir (it would be better to set an env var in the calling environment), but it succeeds as a temporary workaround.

I hope this info helps someone until a more complete solution is available.



[jira] [Commented] (PIG-1838) On a large farm, some pigs die of /tmp starvation

Posted by "Michael Brauwerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011328#comment-13011328 ] 

Michael Brauwerman commented on PIG-1838:
-----------------------------------------

I see this problem as well.
In my case, I run commands basically like this to run a bunch of pig jobs in parallel:
 for date in `list-of-dates`; do nohup pig -p DATE=$date my-script.pig & done

Each pig job that runs will create a /tmp/pigNNNN dir with jar files, until /tmp is exhausted.
Meanwhile, /mnt/tmp is empty and would be a better place for these files to go.

What is the workaround?

I tried editing pig.sh to add
  HADOOP_OPTS="-Djava.io.tmpdir=/mnt/tmp"
before calling hadoop, but that did not seem to work.

Is my failed workaround a typo, or is there a different way I should set java.io.tmpdir when launching pig?





[jira] Commented: (PIG-1838) On a large farm, some pigs die of /tmp starvation

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989481#comment-12989481 ] 

Allen Wittenauer commented on PIG-1838:
---------------------------------------

I have a few thoughts on a better/programmatic way for Pig to be better behaved without depending on users doing the right thing. One or more of these would probably work:

a) Redefine java.io.tmpdir itself after it gets the Hadoop property files loaded
b) In the pig wrapper script, parse mapred-site.xml and pull out the mapred tmp space
c) Override Java's createTempFile method to use Hadoop's tmp location/$TEMPDIR/$TMPDIR/$TEMP/some other value
d) Change the jar assembly such that it goes into a create->submit->delete->repeat pattern.  (From a casual glance, it appears to create all the jars at once rather than just when needed.)
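
As a rough illustration of how a wrapper script could choose a location along the lines of (b) and (c), here is a minimal sketch. pick_tmpdir is a hypothetical helper, not part of Pig, and the candidate order is an assumption; a real wrapper would presumably put the Hadoop tmp location first.

```shell
#!/usr/bin/env bash
# Sketch only: pick_tmpdir is a hypothetical helper, not part of Pig.

# Print the first candidate that is an existing, writable directory.
# Falls back to /tmp when none of the candidates qualifies.
pick_tmpdir() {
    local candidate
    for candidate in "$@"; do
        if [ -n "$candidate" ] && [ -d "$candidate" ] && [ -w "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    printf '%s\n' /tmp
}

# Candidate order is an assumption; unset variables expand to empty
# strings and are skipped by the helper.
chosen=$(pick_tmpdir "${TEMPDIR:-}" "${TMPDIR:-}" "${TEMP:-}")
echo "-Djava.io.tmpdir=$chosen"
```

The printed option could then be added to the JVM options before Pig starts, so users would not have to override java.io.tmpdir themselves.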




        

[jira] Updated: (PIG-1838) On a large farm, some pigs die of /tmp starvation

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated PIG-1838:
----------------------------------

    Description: We're starting to see issues where interactive/command line pig users blow up due to so many large jar creations in /tmp. (In other words, pig execution prior to the java.io.tmpdir fix that Hadoop makes can kick in.)  Pig should probably not depend upon users being savvy enough to override java.io.tmpdir on their own in these situations and/or a better steward of the space it does use.    (was: We're starting to issues where interactive/command line pig users blow up due to so many large jar creations in /tmp. (In other words, pig execution prior to the java.io.tmpdir fix that Hadoop makes can kick in.)  Pig should probably not depend upon users being savvy enough to override java.io.tmpdir on their own in these situations and/or a better steward of the space it does use.  )
