Posted to user@pig.apache.org by David Vrensk <da...@icehouse.se> on 2010/12/16 17:18:03 UTC

/tmp full, my google-fu weak

Hello fellow pig users,

I have told pig to use a separate disk for its temp files by setting
PIG_OPTS=-Dhadoop.tmp.dir=/mnt/hadoop-tmp but it still keeps a lot of its
files in /tmp:

/tmp/temp-1035677529$ find . -type f -exec ls -lh '{}' \;
-rw-r--r-- 1 pig pig 308K 2010-12-16 14:13 ./tmp82247880/.part-00000.crc
-rwxrwxrwx 1 pig pig 39M 2010-12-16 14:13 ./tmp82247880/part-00000
-rw-r--r-- 1 pig pig 8 2010-12-16 14:13 ./tmp-1431528563/.part-00000.crc
-rwxrwxrwx 1 pig pig 0 2010-12-16 14:04 ./tmp-1431528563/part-00000
-rw-r--r-- 1 pig pig 3.0M 2010-12-16 14:01 ./tmp1746442640/.part-00000.crc
-rwxrwxrwx 1 pig pig 381M 2010-12-16 14:01 ./tmp1746442640/part-00000
-rw-r--r-- 1 pig pig 8.8M 2010-12-16 16:05 ./tmp-1936719424/_temporary/_attempt_local_0003_r_000000_0/.part-00000.crc
-rwxrwxrwx 1 pig pig 1.1G 2010-12-16 16:05 ./tmp-1936719424/_temporary/_attempt_local_0003_r_000000_0/part-00000
-rw-r--r-- 1 pig pig 38M 2010-12-16 14:13 ./tmp1280814018/.part-00000.crc
-rwxrwxrwx 1 pig pig 4.8G 2010-12-16 14:13 ./tmp1280814018/part-00000
-rw-r--r-- 1 pig pig 308K 2010-12-16 14:13 ./tmp1738480876/.part-00000.crc
-rwxrwxrwx 1 pig pig 39M 2010-12-16 14:13 ./tmp1738480876/part-00000

I don't know what these files are and my google-fu is too weak to find
anything.
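
A per-directory summary can make a listing like the one above easier to read. The following is only a sketch: temp_usage is a hypothetical helper name, not part of Pig or Hadoop, and it assumes a POSIX shell with du, sort and head available.

```shell
# Summarize disk usage per subdirectory, largest first (sizes in KiB).
# temp_usage is a hypothetical helper, not something Pig or Hadoop ships.
temp_usage() {
  dir=${1:-/tmp}
  # du -s: one total per argument; -k: report in 1 KiB blocks.
  # Errors (unreadable or missing directories) are silenced.
  du -sk "$dir"/*/ 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Example call for the directory from the post (left commented out):
# temp_usage /tmp/temp-1035677529
```

The second argument limits how many directories are shown; it defaults to 10.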

FWIW, the command line I currently use to run pig is

pig-0.6.0/bin/pig -param input=batch-20101216-130003/* scripts/the_script.pig

I'm looking for a way to make pig put all its files on /mnt/hadoop-tmp.
Preferably, it should be a command line argument or an environment variable
rather than a tweak to an xml file.  Not only would that make my scripts more
transparent, but the xml file I've heard about so far (hadoop-site.xml)
resides within the pre-built hadoop jar, and I'd rather avoid
cracking it open in order to modify its contents.  Preferred solution aside,
I'm glad for any help!

Thanks in advance,

David

-- 
David Vrensk
Systems developer, ICE House AB
Mobile: +46 703 74 69 00

Re: /tmp full, my google-fu weak

Posted by David Vrensk <da...@icehouse.se>.
On Thu, Dec 16, 2010 at 22:41, Kris Coward <kr...@melon.org> wrote:

>
> Replace /tmp with a symlink into the disk you're trying to use?
>

I have been looking into mounting (loopback/bind) the disk there, but /tmp
is a busy directory and it makes me nervous.  I should have mentioned that from
the start.  But still, if the alternative is to upgrade to 0.8, I will
probably go the mount route.

Thanks for answering!

/David






Re: /tmp full, my google-fu weak

Posted by Kris Coward <kr...@melon.org>.
Replace /tmp with a symlink into the disk you're trying to use?

-Kris

On Thu, Dec 16, 2010 at 10:27:19PM +0100, David Vrensk wrote:
> Thanks, that's good to know.  Not to sound ungrateful, but is there any way
> to do this without changing pig versions?  It's not something I'm opposed
> to, but it's rather a bigger procedure than I was hoping for.
> 
> /David

-- 
Kris Coward					http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3
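
The symlink idea can be tried on a scratch directory before touching the real /tmp. This is a minimal sketch, not a tested procedure for a live system: the names big-disk and temp-123 are made up, with big-disk standing in for /mnt/hadoop-tmp and the link standing in for /tmp.

```shell
# Scratch demo: nothing here modifies the real /tmp.
scratch=$(mktemp -d)

mkdir "$scratch/big-disk"                  # stands in for /mnt/hadoop-tmp
ln -s "$scratch/big-disk" "$scratch/tmp"   # stands in for /tmp as a symlink

# A program writing "into /tmp" follows the link...
mkdir "$scratch/tmp/temp-123"
echo data > "$scratch/tmp/temp-123/part-00000"

# ...so the file physically lives on the stand-in for the big disk.
ls -l "$scratch/big-disk/temp-123"

# rm -rf "$scratch"   # clean up when done
```

On a real system the target directory would also need the usual /tmp permissions (mode 1777), and running processes that already hold files open in the old /tmp would not follow the switch, which is one reason to be careful with a busy /tmp.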

Re: /tmp full, my google-fu weak

Posted by David Vrensk <da...@icehouse.se>.
Thanks, that's good to know.  Not to sound ungrateful, but is there any way
to do this without changing pig versions?  It's not something I'm opposed
to, but it's rather a bigger procedure than I was hoping for.

/David

On Thu, Dec 16, 2010 at 20:04, Richard Ding <rd...@yahoo-inc.com> wrote:

>  Pig 0.8 allows you to specify its temp directory with -Dpig.temp.dir=<dir
> path> command (PIG-103).



Re: /tmp full, my google-fu weak

Posted by Richard Ding <rd...@yahoo-inc.com>.
Pig 0.8 allows you to specify its temp directory with the -Dpig.temp.dir=<dir path> option (PIG-103).
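
Combined with the PIG_OPTS approach from the original post, this might look like the sketch below. It assumes Pig 0.8 or later and that bin/pig hands PIG_OPTS to the JVM, as the post already relies on for -Dhadoop.tmp.dir; the pig-0.8.0 install path is hypothetical.

```shell
# Assumption: Pig 0.8+, and bin/pig passing PIG_OPTS on to the JVM.
PIG_TEMP_DIR=/mnt/hadoop-tmp

# Append to any existing PIG_OPTS instead of overwriting it.
PIG_OPTS="${PIG_OPTS:+$PIG_OPTS }-Dpig.temp.dir=$PIG_TEMP_DIR"
export PIG_OPTS
echo "PIG_OPTS=$PIG_OPTS"

# Then run the script as before (commented out; pig-0.8.0/ is a hypothetical path):
# pig-0.8.0/bin/pig -param input='batch-20101216-130003/*' scripts/the_script.pig
```

Keeping the setting in an environment variable leaves the pig command line itself unchanged and avoids editing any xml file.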


