Posted to common-user@hadoop.apache.org by Karim Awara <ka...@kaust.edu.sa> on 2014/05/11 15:41:51 UTC
partition file by content based through HDFS
Hi,
When a user uploads a file from the local disk to HDFS, can I make it
partition the file into blocks based on its content? Meaning, if I have
a file with one integer column, can I say I want a given HDFS block to
hold only even numbers?
--
Best Regards,
Karim Ahmed Awara
--
------------------------------
This message and its contents, including attachments are intended solely
for the original recipient. If you are not the intended recipient or have
received this message in error, please notify me immediately and delete
this message from your computer system. Any unauthorized use or
distribution is prohibited. Please consider the environment before printing
this email.
Re: partition file by content based through HDFS
Posted by Mohammad Tariq <do...@gmail.com>.
Hi Karim,
In short, no. If you intend to have partitioned data, better store it in
different files based on your needs.
What exactly is the use case?
Warm Regards,
Tariq
cloudfront.blogspot.com
On Sun, May 11, 2014 at 7:11 PM, Karim Awara <ka...@kaust.edu.sa> wrote:
> Hi,
>
> When a user uploads a file from the local disk to HDFS, can I make it
> partition the file into blocks based on its content? Meaning, if I have a
> file with one integer column, can I say I want a given HDFS block to hold
> only even numbers?
>
>
>
>
> --
> Best Regards,
> Karim Ahmed Awara
>
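Tariq's suggestion, storing partitioned data in separate files based on your needs, can be done before the upload. A minimal sketch, assuming a single-column text file of integers (file names here are hypothetical examples); each output file could then be pushed to HDFS separately with `hdfs dfs -put`:

```python
# Split a one-column integer file into an "even" file and an "odd" file
# on the local disk, prior to uploading each part to HDFS.
# Paths are illustrative, not from the thread.
def split_by_parity(in_path, even_path, odd_path):
    with open(in_path) as src, \
         open(even_path, "w") as even, \
         open(odd_path, "w") as odd:
        for line in src:
            n = int(line.strip())
            # Route each record by content, which HDFS itself never does.
            (even if n % 2 == 0 else odd).write(f"{n}\n")
```

The key point of the thread still holds: the partitioning happens at the file level, while HDFS then splits each file into blocks purely by byte offset.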
RE: partition file by content based through HDFS
Posted by John Lilley <jo...@redpoint.net>.
To second Mirko, HDFS isn’t concerned with content or formats. That would be analogous to asking specific content to end up on specific disk sectors in a normal file. If you want to partition data by content, use MapReduce/Pig/Hive etc to segregate the data into files, perhaps naming the files to indicate the key split.
But this kind of begs the question “why”? MapReduce has built-in support for data partitioning on the fly in the “mappers” and you don’t really need to do anything. Is that too slow for your needs?
john
From: Mirko Kämpf [mailto:mirko.kaempf@gmail.com]
Sent: Sunday, May 11, 2014 2:54 PM
To: user@hadoop.apache.org
Subject: Re: partition file by content based through HDFS
Hi,
HDFS blocks are not "content aware". A separation like the one you requested could be done via Hive or Pig with a few lines of code; then you would have multiple files, which can be organized into partitions as well. But such partitions are at a different abstraction level: not HDFS blocks, but Hive table partitions.
Best wishes,
Mirko
2014-05-11 14:41 GMT+01:00 Karim Awara <ka...@kaust.edu.sa>:
Hi,
When a user uploads a file from the local disk to HDFS, can I make it partition the file into blocks based on its content? Meaning, if I have a file with one integer column, can I say I want a given HDFS block to hold only even numbers?
--
Best Regards,
Karim Ahmed Awara
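The built-in partitioning John refers to works roughly like Hadoop's default HashPartitioner: each map output key is hashed modulo the number of reducers, so all records with the same key land in the same partition, with no extra code needed. A minimal Python sketch of that idea (function names are illustrative, not Hadoop API):

```python
# Rough analogue of Hadoop's HashPartitioner: map each key to one of
# num_reducers partitions so that equal keys always reach the same reducer.
def get_partition(key, num_reducers):
    # Mask to a non-negative value before the modulo, as Hadoop does
    # with (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.
    return (hash(key) & 0x7FFFFFFF) % num_reducers

# The even/odd split from the original question would be a trivial
# custom partitioner: partition 0 gets even keys, partition 1 odd keys.
def parity_partition(key, num_reducers=2):
    return int(key) % 2
```

With two reducers and `parity_partition`, one reducer's output file would contain only even numbers and the other only odd numbers, which achieves the effect asked about, just at the file level rather than the block level.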
Re: partition file by content based through HDFS
Posted by Mirko Kämpf <mi...@gmail.com>.
Hi,
HDFS blocks are not "content aware". A separation like the one you
requested could be done via Hive or Pig with a few lines of code; then
you would have multiple files, which can be organized into partitions as
well. But such partitions are at a different abstraction level: not HDFS
blocks, but Hive table partitions.
Best wishes,
Mirko
2014-05-11 14:41 GMT+01:00 Karim Awara <ka...@kaust.edu.sa>:
> Hi,
>
> When a user uploads a file from the local disk to HDFS, can I make it
> partition the file into blocks based on its content? Meaning, if I have a
> file with one integer column, can I say I want a given HDFS block to hold
> only even numbers?
>
>
>
>
> --
> Best Regards,
> Karim Ahmed Awara
>
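The Hive table partitions Mirko mentions are, on disk, just a directory layout: one subdirectory per partition value (e.g. `mytable/parity=even/`), each holding ordinary files that HDFS then blocks up by byte offset. A small sketch of writing that layout, with illustrative names not taken from the thread:

```python
import os

# Write rows into a Hive-style partitioned directory layout:
# base_dir/parity=<value>/part-0. partition_fn maps a row to its
# partition value. Names ("parity", "part-0") are examples only.
def write_partitioned(rows, base_dir, partition_fn):
    handles = {}
    for n in rows:
        value = partition_fn(n)
        pdir = os.path.join(base_dir, f"parity={value}")
        os.makedirs(pdir, exist_ok=True)
        if value not in handles:
            handles[value] = open(os.path.join(pdir, "part-0"), "w")
        handles[value].write(f"{n}\n")
    for h in handles.values():
        h.close()
```

This is the abstraction level Mirko means: the partitioning lives in the table's directory structure, which Hive prunes at query time, not in how HDFS places blocks.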