Posted to user@hadoop.apache.org by Karim Awara <ka...@kaust.edu.sa> on 2014/05/11 15:41:51 UTC

partition file by content based through HDFS

Hi,

When a user uploads a file from the local disk to HDFS, can I make it
partition the file into blocks based on its content? Meaning, if I have
a file with one integer column, can I say that I want an HDFS block to
hold only even numbers?




--
Best Regards,
Karim Ahmed Awara

-- 

------------------------------
This message and its contents, including attachments are intended solely 
for the original recipient. If you are not the intended recipient or have 
received this message in error, please notify me immediately and delete 
this message from your computer system. Any unauthorized use or 
distribution is prohibited. Please consider the environment before printing 
this email.

Re: partition file by content based through HDFS

Posted by Mohammad Tariq <do...@gmail.com>.
Hi Karim,

In short, no. If you intend to have partitioned data, better store it in
different files based on your needs.
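To illustrate the suggestion above, one possible approach is to pre-split the data locally before uploading each part to HDFS (e.g. with `hdfs dfs -put`). This is only a sketch, not from the thread itself; the function name and sample data are made up for illustration:

```python
# Hypothetical pre-split step: divide a one-column integer file into
# even and odd subsets locally, then upload each subset to HDFS as a
# separate file.

def split_by_parity(lines):
    """Return (evens, odds) from an iterable of integer strings."""
    evens, odds = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        (evens if int(line) % 2 == 0 else odds).append(line)
    return evens, odds

if __name__ == "__main__":
    sample = ["1", "2", "3", "4", "5", "6"]
    evens, odds = split_by_parity(sample)
    print(evens)  # ['2', '4', '6']
    print(odds)   # ['1', '3', '5']
```

Each resulting file then lands in its own HDFS blocks, which is the closest you can get to "content-aware" block placement.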

What exactly is the use case?

Warm Regards,
Tariq
cloudfront.blogspot.com


On Sun, May 11, 2014 at 7:11 PM, Karim Awara <ka...@kaust.edu.sa> wrote:

> Hi,
>
> When a user uploads a file from the local disk to HDFS, can I make it
> partition the file into blocks based on its content? Meaning, if I have
> a file with one integer column, can I say that I want an HDFS block to
> hold only even numbers?
>
>
>
>
> --
> Best Regards,
> Karim Ahmed Awara
>

RE: partition file by content based through HDFS

Posted by John Lilley <jo...@redpoint.net>.
To second Mirko, HDFS isn't concerned with content or formats. That would be analogous to asking for specific content to end up on specific disk sectors in a normal file. If you want to partition data by content, use MapReduce, Pig, Hive, etc. to segregate the data into files, perhaps naming the files to indicate the key split.

But this rather begs the question: why? MapReduce has built-in support for partitioning data on the fly in the "mappers", so you don't really need to do anything. Is that too slow for your needs?
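The built-in partitioning mentioned above works by routing each mapper output key to a reducer via a partition function (in Hadoop itself this is the Java `Partitioner` class). A toy model of that routing, in Python purely for illustration, with made-up function names:

```python
# Toy model of the MapReduce shuffle: each key is assigned a reducer
# index by a partition function. With two reducers and a parity-based
# partitioner, even keys end up in one output file and odd keys in the
# other, without any special handling at upload time.

def parity_partition(key, num_reducers=2):
    """Send even keys to reducer 0 and odd keys to reducer 1 (toy sketch)."""
    return key % 2 if num_reducers == 2 else hash(key) % num_reducers

def shuffle(keys, num_reducers=2):
    """Group keys into per-reducer buckets, as the shuffle phase would."""
    buckets = {r: [] for r in range(num_reducers)}
    for k in keys:
        buckets[parity_partition(k, num_reducers)].append(k)
    return buckets

if __name__ == "__main__":
    print(shuffle([1, 2, 3, 4, 5, 6]))  # {0: [2, 4, 6], 1: [1, 3, 5]}
```

Each reducer then writes its own output file, so the even/odd split the question asks about falls out of the job for free.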

john

From: Mirko Kämpf [mailto:mirko.kaempf@gmail.com]
Sent: Sunday, May 11, 2014 2:54 PM
To: user@hadoop.apache.org
Subject: Re: partition file by content based through HDFS

Hi,

HDFS blocks are not "content aware". A separation like the one you requested could be done via Hive or Pig with a few lines of code; then you would have multiple files, which can be organized into partitions as well. But such partitions sit at a different abstraction level: not at the block level, but within Hive tables.

Best wishes,
Mirko

2014-05-11 14:41 GMT+01:00 Karim Awara <ka...@kaust.edu.sa>:
Hi,
When a user uploads a file from the local disk to HDFS, can I make it partition the file into blocks based on its content? Meaning, if I have a file with one integer column, can I say that I want an HDFS block to hold only even numbers?



--
Best Regards,
Karim Ahmed Awara



Re: partition file by content based through HDFS

Posted by Mirko Kämpf <mi...@gmail.com>.
Hi,

HDFS blocks are not "content aware". A separation like the one you
requested could be done via Hive or Pig with a few lines of code; then
you would have multiple files, which can be organized into partitions
as well. But such partitions sit at a different abstraction level: not
at the block level, but within Hive tables.
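For context on the Hive partitions mentioned above: a partitioned Hive table is stored on HDFS as one subdirectory per partition value (e.g. `parity=even/`, `parity=odd/`). A small Python sketch of that directory layout, with hypothetical local paths standing in for HDFS ones:

```python
# Sketch of the directory layout a partitioned Hive table produces:
# each partition value becomes a subdirectory such as parity=even/ or
# parity=odd/, each holding its own data files. Paths here are local
# and hypothetical, standing in for HDFS paths.

import os
import tempfile

def write_hive_style_partitions(values, base_dir):
    """Write integers into parity=even/part-00000 and parity=odd/part-00000."""
    for parity, remainder in (("even", 0), ("odd", 1)):
        part_dir = os.path.join(base_dir, f"parity={parity}")
        os.makedirs(part_dir, exist_ok=True)
        with open(os.path.join(part_dir, "part-00000"), "w") as f:
            for v in values:
                if v % 2 == remainder:
                    f.write(f"{v}\n")
    return sorted(os.listdir(base_dir))

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(write_hive_style_partitions([1, 2, 3, 4], d))
        # ['parity=even', 'parity=odd']
```

Hive can then prune partitions at query time (e.g. `WHERE parity = 'even'`), which gives the content-based selectivity the question is after, just at the file level rather than the block level.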

Best wishes,
Mirko


2014-05-11 14:41 GMT+01:00 Karim Awara <ka...@kaust.edu.sa>:

> Hi,
>
> When a user uploads a file from the local disk to HDFS, can I make it
> partition the file into blocks based on its content? Meaning, if I have
> a file with one integer column, can I say that I want an HDFS block to
> hold only even numbers?
>
>
>
>
> --
> Best Regards,
> Karim Ahmed Awara
>
