Posted to user@pig.apache.org by Johannes Schwenk <jo...@adition.com> on 2012/10/15 12:04:28 UTC

Set block size of output

Hi,

I would like to set the HDFS block size of my Pig script's output files.
How do I do that? I tried to use

PIG_OPTS="-Dpig.path.block.size=1048576";

which seemed to me the only appropriate option I could find.

Thanks for any hints!
Johannes Schwenk

-- 
Software Developer (Reporting)
________________________________________________________

ADITION technologies AG
Schwarzwaldstraße 78b
79117 Freiburg

http://www.adition.com

T +49 / (0)761 / 88147 - 30
F +49 / (0)761 / 88147 - 77
SUPPORT +49  / (0)1805 - ADITION

(Landline rate 14 ct/min; mobile rates at most 42 ct/min)

Registered at the Düsseldorf District Court (Amtsgericht Düsseldorf) under HRB 54076
Executive Board: Andreas Kleiser, Jörg Klekamp, Tihomir Perkovic, Marcus Schlüter
Chairman of the Supervisory Board: Rechtsanwalt (attorney) Daniel Raimer
VAT ID No.: DE 218 858 434


Re: Set block size of output

Posted by Johannes Schwenk <jo...@adition.com>.
On 22.10.2012 04:01, Joe Crobak wrote:
> Hi Johannes,
> 
> HDFS block size is controlled by the property 'dfs.blocksize'. You should
> be able to use `set` to control this within your pig script:
> http://pig.apache.org/docs/r0.10.0/cmds.html#set I think that it should
> also work to pass that in via PIG_OPTS, e.g.
> PIG_OPTS='-Ddfs.blocksize=1048576'

Hi Joe,

Thanks, this works well. By the way, it's dfs.block.size.

Now, is it possible to set this on a per-STORE-statement basis? If I
have two STORE statements and want the first to use the default block
size and the second a very small one, I would expect something like
this to work:


[...]
STORE a INTO '/user/schwenk/out/a';
SET dfs.block.size 2048;
STORE b INTO '/user/schwenk/out/b';


To my surprise, the files in out/a also had a block size of only 2 KB!

What can I do? Do I have to write my own storage function for this?

Thanks,
Johannes
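
Note on the per-STORE attempt: a plausible explanation for the result above is that Pig reads the whole script before it launches any job, so a SET placed between two STORE statements ends up in the configuration shared by both. One workaround that avoids a custom storage function is to run one Pig invocation per block size. The sketch below only illustrates that idea: store_a.pig and store_b.pig are hypothetical scripts, each assumed to end in a single STORE, and any computation shared by a and b would then be repeated or would have to be written out once and reloaded. It prints the commands by default and runs them only when RUN=1 is set.

```shell
#!/bin/sh
# Sketch: one Pig invocation per block size, so each run gets its own
# job configuration. store_a.pig and store_b.pig are hypothetical names.

# Print the command line for one invocation.
# An empty size means "use the cluster default block size".
pig_cmd() {
    script=$1
    bytes=$2
    if [ -n "$bytes" ]; then
        echo "PIG_OPTS=-Ddfs.block.size=$bytes pig $script"
    else
        echo "pig $script"
    fi
}

# Each entry is "script:block-size"; the size after the colon may be empty.
for spec in "store_a.pig:" "store_b.pig:2048"; do
    cmd=$(pig_cmd "${spec%%:*}" "${spec#*:}")
    echo "$cmd"
    # Dry run by default; set RUN=1 to actually execute the command.
    if [ "${RUN:-0}" = 1 ]; then
        sh -c "$cmd"
    fi
done
```

The property name dfs.block.size is the one reported as working earlier in this message; on other Hadoop versions it may need to be dfs.blocksize instead.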

> HTH,
> Joe
> 
> On Mon, Oct 15, 2012 at 6:04 AM, Johannes Schwenk <
> johannes.schwenk@adition.com> wrote:
> 
>> Hi,
>>
>> I would like to set the HDFS block size of my Pig script's output files.
>> How do I do that? I tried to use
>>
>> PIG_OPTS="-Dpig.path.block.size=1048576";
>>
>> which seemed to me the only appropriate option I could find.
>>
>> Thanks for any hints!
>> Johannes Schwenk
> 




Re: Set block size of output

Posted by Joe Crobak <jo...@gmail.com>.
Hi Johannes,

HDFS block size is controlled by the property 'dfs.blocksize'. You should
be able to use `set` to control this within your pig script:
http://pig.apache.org/docs/r0.10.0/cmds.html#set I think that it should
also work to pass that in via PIG_OPTS, e.g.
PIG_OPTS='-Ddfs.blocksize=1048576'

HTH,
Joe
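
A minimal sketch of the PIG_OPTS route described above, under a few assumptions. The property name is passed in as a parameter because it differs between Hadoop versions: older releases use dfs.block.size, while newer ones document dfs.blocksize and, to the best of my knowledge, still accept the old name as a deprecated alias. The helper name block_size_opt is made up for this example. It also rejects sizes that are not a multiple of 512, since HDFS expects the block size to be a multiple of the checksum chunk size, which is 512 bytes by default.

```shell
#!/bin/sh
# Build the JVM option that asks HDFS for a given block size.
# block_size_opt is a hypothetical helper, not part of Pig or Hadoop.
block_size_opt() {
    prop=$1     # e.g. dfs.block.size or dfs.blocksize, depending on version
    bytes=$2    # requested block size in bytes
    # Reject sizes that are not a multiple of the default checksum
    # chunk size (512 bytes) before any job is submitted.
    if [ $((bytes % 512)) -ne 0 ]; then
        echo "block size $bytes is not a multiple of 512" >&2
        return 1
    fi
    printf '%s' "-D${prop}=${bytes}"
}

PIG_OPTS=$(block_size_opt dfs.block.size 1048576) || exit 1
export PIG_OPTS
echo "PIG_OPTS=$PIG_OPTS"

# Then run the script and inspect the output files, for example:
#   pig myscript.pig                          # hypothetical script name
#   hadoop fs -stat '%o %n' '/user/schwenk/out/a/part-*'
# In the -stat format string, %o is the block size and %n the file name.
```

Inside a Pig script, the equivalent would be a single `set` statement with the same property name and value, placed before the STORE.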

On Mon, Oct 15, 2012 at 6:04 AM, Johannes Schwenk <
johannes.schwenk@adition.com> wrote:

> Hi,
>
> I would like to set the HDFS block size of my Pig script's output files.
> How do I do that? I tried to use
>
> PIG_OPTS="-Dpig.path.block.size=1048576";
>
> which seemed to me the only appropriate option I could find.
>
> Thanks for any hints!
> Johannes Schwenk

Re: Set block size of output

Posted by Johannes Schwenk <jo...@adition.com>.
Sorry, I forgot to mention that I am using

Cloudera 2.0.0-cdh4.0.1 with Pig 0.10.0-SNAPSHOT


On 15.10.2012 12:04, Johannes Schwenk wrote:
> Hi,
> 
> I would like to set the HDFS block size of my Pig script's output files.
> How do I do that? I tried to use
> 
> PIG_OPTS="-Dpig.path.block.size=1048576";
> 
> which seemed to me the only appropriate option I could find.
> 
> Thanks for any hints!
> Johannes Schwenk
> 


