Posted to user@pig.apache.org by Johannes Schwenk <jo...@adition.com> on 2012/10/15 12:04:28 UTC
Set block size of output
Hi,
I would like to set the HDFS block size of my Pig script's output files.
How do I do that? I tried to use
PIG_OPTS="-Dpig.path.block.size=1048576";
which seemed to me the only appropriate option I could find.
Thanks for any hints!
Johannes Schwenk
--
Software Developer (Reporting)
________________________________________________________
ADITION technologies AG
Schwarzwaldstraße 78b
79117 Freiburg
http://www.adition.com
T +49 / (0)761 / 88147 - 30
F +49 / (0)761 / 88147 - 77
SUPPORT +49 / (0)1805 - ADITION
(Landline rate 14 ct/min; mobile rates at most 42 ct/min)
Registered at the Amtsgericht Düsseldorf under HRB 54076
Executive Board: Andreas Kleiser, Jörg Klekamp, Tihomir Perkovic, Marcus Schlüter
Chairman of the Supervisory Board: Daniel Raimer, attorney-at-law
VAT ID No.: DE 218 858 434
Re: Set block size of output
Posted by Johannes Schwenk <jo...@adition.com>.
On 22.10.2012 04:01, Joe Crobak wrote:
> Hi Johannes,
>
> HDFS block size is controlled by the property 'dfs.blocksize'. You should
> be able to use `set` to control this within your Pig script:
> http://pig.apache.org/docs/r0.10.0/cmds.html#set
> I think it should also work to pass it in via PIG_OPTS, e.g.
> PIG_OPTS='-Ddfs.blocksize=1048576'
Hi Joe,
Thanks, this works well. By the way, the property is called dfs.block.size.
Now, is it possible to set this on a per-STORE-statement basis? If I
have two STORE statements and want the first to use the default
block size and the second a very small one, I expected this to
work:
[...]
STORE a INTO '/user/schwenk/out/a';
SET dfs.block.size 2048;
STORE b INTO '/user/schwenk/out/b';
To my surprise, the files in out/a also had a block size of only 2 KB!
What can I do? Do I have to write my own storage function for this?
Thanks,
Johannes
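A note on the behaviour described above: a likely explanation is that Pig treats `set` as a property of the job configuration rather than of a single STORE statement, so when both STOREs are planned together the last value wins. One workaround that avoids a custom storage function is to put each STORE in its own script and run the scripts separately, each with its own PIG_OPTS. The following is only a sketch under assumptions: the script names store_a.pig and store_b.pig, the helper run_with_block_size, and the PIG override variable are hypothetical and not from the thread.

```shell
#!/bin/sh
# Sketch only: run each STORE in its own Pig invocation so that each job
# gets its own block size. store_a.pig and store_b.pig are hypothetical
# scripts, each containing one of the STORE statements.
PIG="${PIG:-pig}"   # hypothetical override, e.g. PIG=echo for a dry run

# run_with_block_size BYTES SCRIPT
# Passes -Ddfs.block.size=BYTES through PIG_OPTS for this invocation only.
# An empty BYTES leaves PIG_OPTS untouched, so the cluster default applies.
run_with_block_size() {
    bytes="$1"
    script="$2"
    if [ -n "$bytes" ]; then
        PIG_OPTS="-Ddfs.block.size=${bytes}" $PIG "$script"
    else
        $PIG "$script"
    fi
}

# Usage:
#   run_with_block_size ""   store_a.pig   # default block size
#   run_with_block_size 2048 store_b.pig   # 2 KB blocks
```

Running two invocations costs an extra Pig start-up, but it keeps the two job configurations fully independent.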
Re: Set block size of output
Posted by Joe Crobak <jo...@gmail.com>.
Hi Johannes,
HDFS block size is controlled by the property 'dfs.blocksize'. You should
be able to use `set` to control this within your Pig script:
http://pig.apache.org/docs/r0.10.0/cmds.html#set
I think it should also work to pass it in via PIG_OPTS, e.g.
PIG_OPTS='-Ddfs.blocksize=1048576'
HTH,
Joe
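The value 1048576 in the example above is 1 MiB expressed in bytes (1 * 1024 * 1024). As a small illustration, the helper below builds the PIG_OPTS string from a size in MiB. It is a sketch, not part of Pig: the function name block_size_opts, the BLOCK_SIZE_PROP variable, and the script name my_script.pig are hypothetical. The property name defaults to dfs.block.size, which is the name reported to work elsewhere in this thread. Newer Hadoop releases call it dfs.blocksize.

```shell
#!/bin/sh
# Sketch only: build a PIG_OPTS value for a block size given in MiB.
# BLOCK_SIZE_PROP is a hypothetical knob for the property name; set it to
# dfs.blocksize on Hadoop versions that use the newer name.
BLOCK_SIZE_PROP="${BLOCK_SIZE_PROP:-dfs.block.size}"

# block_size_opts MIB
# Prints "-D<property>=<bytes>", converting MiB to bytes.
block_size_opts() {
    mib="$1"
    bytes=$((mib * 1024 * 1024))
    printf '%s\n' "-D${BLOCK_SIZE_PROP}=${bytes}"
}

# Usage (my_script.pig is a hypothetical script name):
#   PIG_OPTS="$(block_size_opts 1)" pig my_script.pig
```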
Re: Set block size of output
Posted by Johannes Schwenk <jo...@adition.com>.
Sorry, I forgot to mention that I am using
Cloudera 2.0.0-cdh4.0.1 with Pig 0.10.0-SNAPSHOT.