Posted to common-user@hadoop.apache.org by Weishung Chung <we...@gmail.com> on 2011/03/19 17:01:45 UTC
File formats in Hadoop
I am browsing through the hadoop.io package and was wondering what other
file formats are available in Hadoop besides SequenceFile and TFile.
Is all data written through Hadoop, including data from HBase, saved in the
above formats? It seems like SequenceFile is in key-value pair format.
Thank you so much :)
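For readers new to the format: the SequenceFile javadoc describes an uncompressed record as a record length, a key length, the key bytes and then the value bytes. The Python sketch below mimics only that record layout to show what "key-value pair format" means on disk. It is an illustration under that assumption: there is no file header, no sync markers and no compression, the function names are hypothetical, and Hadoop's SequenceFile.Reader could not read its output.

```python
import io
import struct

def write_records(stream, pairs):
    """Write (key, value) byte pairs as: record length, key length, key, value.
    Both lengths are 4-byte big-endian ints; record length = len(key) + len(value)."""
    for key, value in pairs:
        stream.write(struct.pack(">ii", len(key) + len(value), len(key)))
        stream.write(key)
        stream.write(value)

def read_records(stream):
    """Yield the (key, value) pairs back until the stream runs out."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return
        record_len, key_len = struct.unpack(">ii", header)
        body = stream.read(record_len)
        yield body[:key_len], body[key_len:]

buf = io.BytesIO()
write_records(buf, [(b"row1", b"alpha"), (b"row2", b"beta")])
buf.seek(0)
print(list(read_records(buf)))  # [(b'row1', b'alpha'), (b'row2', b'beta')]
```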
Re: Is there "useradd" in Hadoop
Posted by Brian Bockelman <bb...@cse.unl.edu>.
In 0.20 and later, user and group information is taken from the NameNode's (NN's) OS.
There is no useradd or groupadd.
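To check which groups the local OS reports for a user (the information Brian says HDFS takes from the NN's OS), a rough Python equivalent of the shell "groups" command is sketched below. It reads the local user and group databases through the standard pwd and grp modules. That this matches exactly what Hadoop's group mapping sees on a given cluster is an assumption, and os_groups_for is a hypothetical helper, not part of Hadoop.

```python
import grp
import pwd

def os_groups_for(user):
    """Return group names for `user` as the local OS sees them:
    the primary group first, then supplementary groups that list the user as a member."""
    entry = pwd.getpwnam(user)  # raises KeyError if the OS does not know the user
    try:
        primary = grp.getgrgid(entry.pw_gid).gr_name
    except KeyError:
        primary = str(entry.pw_gid)  # gid with no name in the group database
    groups = [primary]
    # getgrall() enumerates the group database; with NIS/LDAP the result may be
    # partial if the directory does not allow enumeration.
    for g in grp.getgrall():
        if user in g.gr_mem and g.gr_name not in groups:
            groups.append(g.gr_name)
    return groups

print(os_groups_for("root"))
```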
Brian
On Mar 23, 2011, at 1:19 AM, springring wrote:
> Hi,
>
> There are "chmod", "chown", "chgrp" in HDFS,
> is there some command like "useradd -g" to add a
> user in a group,? Even more, is there "hadoop's
> group", not "linux's group"?
>
>
> Ring
Is there "useradd" in Hadoop
Posted by springring <sp...@126.com>.
Hi,
There are "chmod", "chown", and "chgrp" in HDFS.
Is there some command like "useradd -g" to add a
user to a group? Moreover, is there a "Hadoop
group", as opposed to a "Linux group"?
Ring
Re: how to create a group in hdfs
Posted by springring <sp...@126.com>.
Segel,
I got it. Sorry, I sent this mail by hitting "reply all" on another mail
and forgot that it included HBase.
Thanks.
Ring
----- Original Message -----
From: "Segel, Mike" <ms...@navteq.com>
To: <co...@hadoop.apache.org>; "Ryan Rawson" <ry...@gmail.com>
Cc: <us...@hbase.apache.org>; <co...@hadoop.apache.org>
Sent: Wednesday, March 23, 2011 10:32 PM
Subject: RE: how to create a group in hdfs
Not sure why this has anything to do with hbase...
The short answer...
Outside of the supergroup which is controlled by dfs.permissions.supergroup,
Hadoop apparently checks to see if the owner is a member of the group you want to use.
This could be controlled by the local machine's /etc/group file, or if you're using NIS or LDAP, its controlled there.
So you can run the unix shell command groups to find out which group(s) you belong to, and then switch to one of those.
HTH
-Mike
-----Original Message-----
From: springring [mailto:springring@126.com]
Sent: Tuesday, March 22, 2011 11:29 PM
To: common-dev@hadoop.apache.org; Ryan Rawson
Cc: user@hbase.apache.org; common-user@hadoop.apache.org; common-dev@hadoop.apache.org
Subject: how to create a group in hdfs
Hi,
how to create a user group in hdfs?
hadoop fs -?
Ring
The information contained in this communication may be CONFIDENTIAL and is intended only for the use of the recipient(s) named above. If you are not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited. If you have received this communication in error, please notify the sender and delete/destroy the original message and any copy of it from your computer or paper files.
RE: how to create a group in hdfs
Posted by "Segel, Mike" <ms...@navteq.com>.
Not sure why this has anything to do with HBase...
The short answer...
Outside of the supergroup, which is controlled by dfs.permissions.supergroup,
Hadoop apparently checks whether the owner is a member of the group you want to use.
This could be controlled by the local machine's /etc/group file, or, if you're using NIS or LDAP, it's controlled there.
So you can run the Unix shell command "groups" to find out which group(s) you belong to, and then switch to one of those.
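A minimal Python model of the rule Mike describes, plus the super-user exception, may make the check easier to reason about. The defaults "hdfs" and "supergroup" are placeholders (the HDFS super-user is whichever account started the NameNode, and the super-user group is whatever dfs.permissions.supergroup names), and may_chgrp is a hypothetical function, not Hadoop's actual code.

```python
def may_chgrp(user, user_groups, file_owner, target_group,
              superuser="hdfs", supergroup="supergroup"):
    """Hypothetical model: may `user` (a member of `user_groups`) set the group of a
    file owned by `file_owner` to `target_group`?"""
    # Super-user: the account running the NameNode (placeholder name here) or any
    # member of the group named by dfs.permissions.supergroup.
    if user == superuser or supergroup in user_groups:
        return True
    # Anyone else must own the file and already belong to the target group.
    return user == file_owner and target_group in user_groups

print(may_chgrp("ring", {"ring", "analysts"}, "ring", "analysts"))  # True
print(may_chgrp("ring", {"ring"}, "ring", "analysts"))              # False
```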
HTH
-Mike
-----Original Message-----
From: springring [mailto:springring@126.com]
Sent: Tuesday, March 22, 2011 11:29 PM
To: common-dev@hadoop.apache.org; Ryan Rawson
Cc: user@hbase.apache.org; common-user@hadoop.apache.org; common-dev@hadoop.apache.org
Subject: how to create a group in hdfs
Hi,
how to create a user group in hdfs?
hadoop fs -?
Ring
how to create a group in hdfs
Posted by springring <sp...@126.com>.
Hi,
How do I create a user group in HDFS?
hadoop fs -?
Ring
Re: File formats in Hadoop
Posted by Weishung Chung <we...@gmail.com>.
They are used in Hadoop:
org.apache.hadoop.io.SequenceFile
org.apache.hadoop.io.file.tfile.TFile
On Tue, Mar 22, 2011 at 10:06 PM, Ryan Rawson <ry...@gmail.com> wrote:
> Curious, why do you mention "SequenceFile" and "TFile". Neither of
> those are either in the hbase.io, and TFile is not used anywhere in
> HBase.
>
> -ryan
>
> On Sat, Mar 19, 2011 at 9:01 AM, Weishung Chung <we...@gmail.com> wrote:
> > I am browsing through the hadoop.io package and was wondering what other
> > file formats are available in hadoop other than SequenceFile and TFile?
> > Is all data written through hadoop including those from hbase saved in the
> > above formats? It seems like SequenceFile is in key value pair format.
> >
> > Thank you so much :)
> >
>
Re: File formats in Hadoop
Posted by Weishung Chung <we...@gmail.com>.
I was initially exploring different file formats in Hadoop, but somehow the
search spread into HBase, I guess.
Sorry for the confusion, Ryan :D
On Wed, Mar 23, 2011 at 7:18 AM, Harsh J <qw...@gmail.com> wrote:
> On Wed, Mar 23, 2011 at 8:36 AM, Ryan Rawson <ry...@gmail.com> wrote:
> > Curious, why do you mention "SequenceFile" and "TFile". Neither of
> > those are either in the hbase.io, and TFile is not used anywhere in
> > HBase.
>
> A cross-posting side-effect, I guess :-)
>
> --
> Harsh J
> http://harshj.com
>
Re: File formats in Hadoop
Posted by Harsh J <qw...@gmail.com>.
On Wed, Mar 23, 2011 at 8:36 AM, Ryan Rawson <ry...@gmail.com> wrote:
> Curious, why do you mention "SequenceFile" and "TFile". Neither of
> those are either in the hbase.io, and TFile is not used anywhere in
> HBase.
A cross-posting side-effect, I guess :-)
--
Harsh J
http://harshj.com
Re: File formats in Hadoop
Posted by Ryan Rawson <ry...@gmail.com>.
Curious, why do you mention "SequenceFile" and "TFile"? Neither of
those is in hbase.io, and TFile is not used anywhere in
HBase.
-ryan
On Sat, Mar 19, 2011 at 9:01 AM, Weishung Chung <we...@gmail.com> wrote:
> I am browsing through the hadoop.io package and was wondering what other
> file formats are available in hadoop other than SequenceFile and TFile?
> Is all data written through hadoop including those from hbase saved in the
> above formats? It seems like SequenceFile is in key value pair format.
>
> Thank you so much :)
>
Re: File formats in Hadoop
Posted by Weishung Chung <we...@gmail.com>.
I found this interesting article about SequenceFile and am sharing it here:
http://www.cloudera.com/blog/2011/01/hadoop-io-sequence-map-set-array-bloommap-files/
On Sun, Mar 20, 2011 at 6:04 AM, Niels Basjes <Ni...@basjes.nl> wrote:
> And then there is the matter of how you put the data in the file. I've
> heard that some people write the data as protocolbuffers into the
> sequence file.
>
> 2011/3/19 Harsh J <qw...@gmail.com>:
> > Hello,
> >
> > On Sat, Mar 19, 2011 at 9:31 PM, Weishung Chung <we...@gmail.com> wrote:
> >> I am browsing through the hadoop.io package and was wondering what other
> >> file formats are available in hadoop other than SequenceFile and TFile?
> >
> > Additionally, on Hadoop, there're MapFiles/SetFiles (Derivative of
> > SequenceFiles, if you need maps/sets), and IFiles (Used by the
> > map-output buffers to produce a key-value file for Reducers to use,
> > internal use only).
> >
> > Apache Hive use RCFiles, which is very interesting too. Apache Avro
> > provides Avro-Datafiles that are designed for use with Hadoop
> > Map/Reduce + Avro-serialized data.
> >
> > I'm not sure of this one, but Pig probably was implementing a
> > table-file-like solution of their own a while ago. Howl?
> >
> > --
> > Harsh J
> > http://harshj.com
> >
>
> --
> Kind regards,
>
> Niels Basjes
>
Re: File formats in Hadoop
Posted by Niels Basjes <Ni...@basjes.nl>.
And then there is the matter of how you put the data in the file. I've
heard that some people write the data as protocol buffers into the
SequenceFile.
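Niels's point is that the container and the serialization are separate concerns: the file stores opaque key and value bytes, and the application decides how to encode them. The sketch below uses Python's struct module as a stand-in for protocol buffers (this is not the protobuf wire format); the record layout and the encode_user/decode_user names are hypothetical.

```python
import struct

# Hypothetical fixed layout standing in for a protobuf message:
# user id (8-byte signed int), score (8-byte float), then the UTF-8 name.
def encode_user(user_id, score, name):
    return struct.pack(">qd", user_id, score) + name.encode("utf-8")

def decode_user(blob):
    user_id, score = struct.unpack_from(">qd", blob)
    return user_id, score, blob[16:].decode("utf-8")

# The key-value container only ever sees opaque byte strings.
records = [(b"user:42", encode_user(42, 0.5, "Ring"))]
key, value = records[0]
print(key, decode_user(value))  # b'user:42' (42, 0.5, 'Ring')
```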
2011/3/19 Harsh J <qw...@gmail.com>:
> Hello,
>
> On Sat, Mar 19, 2011 at 9:31 PM, Weishung Chung <we...@gmail.com> wrote:
>> I am browsing through the hadoop.io package and was wondering what other
>> file formats are available in hadoop other than SequenceFile and TFile?
>
> Additionally, on Hadoop, there're MapFiles/SetFiles (Derivative of
> SequenceFiles, if you need maps/sets), and IFiles (Used by the
> map-output buffers to produce a key-value file for Reducers to use,
> internal use only).
>
> Apache Hive use RCFiles, which is very interesting too. Apache Avro
> provides Avro-Datafiles that are designed for use with Hadoop
> Map/Reduce + Avro-serialized data.
>
> I'm not sure of this one, but Pig probably was implementing a
> table-file-like solution of their own a while ago. Howl?
>
> --
> Harsh J
> http://harshj.com
>
--
Kind regards,
Niels Basjes
Re: File formats in Hadoop
Posted by Harsh J <qw...@gmail.com>.
Hello,
On Sat, Mar 19, 2011 at 9:31 PM, Weishung Chung <we...@gmail.com> wrote:
> I am browsing through the hadoop.io package and was wondering what other
> file formats are available in hadoop other than SequenceFile and TFile?
Additionally, on Hadoop, there are MapFiles/SetFiles (derivatives of
SequenceFiles, if you need maps/sets), and IFiles (used by the
map-output buffers to produce a key-value file for Reducers to use;
internal use only).
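The MapFile idea can be made concrete. As I understand it, a MapFile is a sorted SequenceFile of data plus a smaller index holding every Nth key (the index interval, 128 by default as far as I know) together with that key's position. The Python sketch below models the lookup in memory only: binary-search the index, then scan a bounded run of data records. SparseIndexedFile is a hypothetical name, and nothing here reads real MapFiles.

```python
import bisect

class SparseIndexedFile:
    """In-memory model of a MapFile-style lookup: sorted (key, value) records plus an
    index that keeps every `interval`-th key and its record position."""

    def __init__(self, sorted_pairs, interval=128):
        keys = [k for k, _ in sorted_pairs]
        if keys != sorted(keys):
            raise ValueError("records must be appended in sorted key order")
        self.records = list(sorted_pairs)
        self.interval = interval
        self.index_keys = keys[::interval]                    # every Nth key
        self.index_pos = list(range(0, len(keys), interval))  # where each indexed key lives

    def get(self, key):
        # Binary-search the small index for the last indexed key <= key ...
        i = bisect.bisect_right(self.index_keys, key) - 1
        if i < 0:
            return None
        # ... then scan at most `interval` data records from that position.
        start = self.index_pos[i]
        for k, v in self.records[start:start + self.interval]:
            if k == key:
                return v
            if k > key:
                break
        return None

data = SparseIndexedFile([(f"key{n:04d}", n) for n in range(1000)])
print(len(data.index_keys), data.get("key0500"), data.get("missing"))  # 8 500 None
```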
Apache Hive uses RCFiles, which are very interesting too. Apache Avro
provides Avro data files that are designed for use with Hadoop
MapReduce + Avro-serialized data.
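The core RCFile idea, as I understand it, is to split a table into row groups and store each group column by column, so a query that touches one column can skip the others. The sketch below shows only that transposition in memory, with hypothetical function names; the real format adds compression, metadata and its own on-disk layout.

```python
def to_row_groups(rows, rows_per_group):
    """Split rows into row groups and lay out each group column by column."""
    groups = []
    for start in range(0, len(rows), rows_per_group):
        chunk = rows[start:start + rows_per_group]
        groups.append([list(col) for col in zip(*chunk)])  # transpose the chunk
    return groups

def read_column(groups, col):
    """Collect one column across all row groups without touching the other columns."""
    out = []
    for g in groups:
        out.extend(g[col])
    return out

rows = [(1, "a", 10.0), (2, "b", 20.0), (3, "c", 30.0)]
groups = to_row_groups(rows, 2)
print(groups)                   # [[[1, 2], ['a', 'b'], [10.0, 20.0]], [[3], ['c'], [30.0]]]
print(read_column(groups, 1))   # ['a', 'b', 'c']
```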
I'm not sure about this one, but Pig was probably implementing a
table-file-like solution of its own a while ago. Howl?
--
Harsh J
http://harshj.com
Re: File formats in Hadoop
Posted by Weishung Chung <we...@gmail.com>.
Thank you for the info. HFile looks interesting; I can't wait to dig into the
code and get a better understanding of it!
On Sat, Mar 19, 2011 at 11:28 AM, Harsh J <qw...@gmail.com> wrote:
> Hello,
>
> On Sat, Mar 19, 2011 at 9:31 PM, Weishung Chung <we...@gmail.com> wrote:
> > Is all data written through hadoop including those from hbase saved in the
> > above formats? It seems like SequenceFile is in key value pair format.
>
> HBase provides its own format called HFile. See
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/io/hfile/HFile.html
> for details!
>
> --
> Harsh J
> http://harshj.com
>
Re: File formats in Hadoop
Posted by Harsh J <qw...@gmail.com>.
Hello,
On Sat, Mar 19, 2011 at 9:31 PM, Weishung Chung <we...@gmail.com> wrote:
> Is all data written through hadoop including those from hbase saved in the
> above formats? It seems like SequenceFile is in key value pair format.
HBase provides its own format called HFile. See
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/io/hfile/HFile.html
for details!
--
Harsh J
http://harshj.com
Fwd: File formats in Hadoop
Posted by Weishung Chung <we...@gmail.com>.
---------- Forwarded message ----------
From: Weishung Chung <we...@gmail.com>
Date: Tue, Mar 22, 2011 at 11:31 AM
Subject: Re: File formats in Hadoop
To: Vivek Krishna <vi...@gmail.com>
Cc: user@hbase.apache.org, common-user@hadoop.apache.org,
qwertymaniac@gmail.com, Doug Cutting <cu...@apache.org>
I also found this informative article
http://cloudepr.blogspot.com/2009/09/hfile-block-indexed-file-format-to.html
Would the key-value pairs look like this? E.g., column family1 with one qualifier (qualifier1) that has 2 versions:
key1: rowkey1+column family1:qualifier1+timestamp1
value1: corresponding cell value 1
key2: rowkey1+column family1:qualifier1+timestamp2
value2: corresponding cell value 2
key3: rowkey2+column family1:qualifier1+timestamp1
value3: corresponding cell value 3
On Tue, Mar 22, 2011 at 10:58 AM, Vivek Krishna <vi...@gmail.com> wrote:
> http://nosql.mypopescu.com/post/3220921756/hbase-internals-hfile-explained
> might help.
>
> Viv
>
> On Tue, Mar 22, 2011 at 11:43 AM, Weishung Chung <we...@gmail.com> wrote:
>
>> My fellow superb hbase experts,
>>
>> Looking at the HFile specs and have some questions.
>> How is a particular table cell in a HBase table being represented in the
>> HFile? Does the key of the key value pair represent the rowkey+column
>> family:qualifier+timestamp and the value represent the corresponding cell
>> value? If so, to read a row, multiple key/value pair reads have to be done?
>>
>> Thank you :)
>>
>> On Tue, Mar 22, 2011 at 9:09 AM, Weishung Chung <we...@gmail.com> wrote:
>>
>> > Thank you, I will definitely take a look. Also, the TFile spec below helps
>> > me to understand more,
>> > what an exciting work !
>> >
>> > https://issues.apache.org/jira/secure/attachment/12396286/TFile+Specification+20081217.pdf
>> >
>> > On Mon, Mar 21, 2011 at 11:41 AM, Doug Cutting <cu...@apache.org> wrote:
>> >
>> >> On 03/19/2011 09:01 AM, Weishung Chung wrote:
>> >> > I am browsing through the hadoop.io package and was wondering what other
>> >> > file formats are available in hadoop other than SequenceFile and TFile?
>> >> > Is all data written through hadoop including those from hbase saved in the
>> >> > above formats? It seems like SequenceFile is in key value pair format.
>> >>
>> >> Avro includes a file format that works with Hadoop.
>> >>
>> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/package-summary.html
>> >>
>> >> Doug
Re: File formats in Hadoop
Posted by Weishung Chung <we...@gmail.com>.
I found this useful article that explains the internal storage of HFile
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
On Tue, Mar 22, 2011 at 11:31 AM, Weishung Chung <we...@gmail.com> wrote:
> I also found this informative article
>
> http://cloudepr.blogspot.com/2009/09/hfile-block-indexed-file-format-to.html
>
> Is the key value pair be
> eg column family1 with one qualifier 1 with 2 versions
>
> key1 : rowkey1+column family1:qualifier1+timestamp1
> value1: corresponding cell value1
> key2 : rowkey1+column family1:qualifier1+timestamp2
> value2: corresponding cell value 2
> key3: rowkey2+column family1:qualifier1+timestamp1
> value3: corresponding cell value 3
>
> On Tue, Mar 22, 2011 at 10:58 AM, Vivek Krishna <vi...@gmail.com> wrote:
>
>> http://nosql.mypopescu.com/post/3220921756/hbase-internals-hfile-explained
>> might help.
>>
>> Viv
>>
>> On Tue, Mar 22, 2011 at 11:43 AM, Weishung Chung <we...@gmail.com> wrote:
>>
>>> My fellow superb hbase experts,
>>>
>>> Looking at the HFile specs and have some questions.
>>> How is a particular table cell in a HBase table being represented in the
>>> HFile? Does the key of the key value pair represent the rowkey+column
>>> family:qualifier+timestamp and the value represent the corresponding cell
>>> value? If so, to read a row, multiple key/value pair reads have to be done?
>>>
>>> Thank you :)
>>>
>>> On Tue, Mar 22, 2011 at 9:09 AM, Weishung Chung <we...@gmail.com> wrote:
>>>
>>> > Thank you, I will definitely take a look. Also, the TFile spec below helps
>>> > me to understand more,
>>> > what an exciting work !
>>> >
>>> > https://issues.apache.org/jira/secure/attachment/12396286/TFile+Specification+20081217.pdf
>>> >
>>> > On Mon, Mar 21, 2011 at 11:41 AM, Doug Cutting <cu...@apache.org> wrote:
>>> >
>>> >> On 03/19/2011 09:01 AM, Weishung Chung wrote:
>>> >> > I am browsing through the hadoop.io package and was wondering what other
>>> >> > file formats are available in hadoop other than SequenceFile and TFile?
>>> >> > Is all data written through hadoop including those from hbase saved in the
>>> >> > above formats? It seems like SequenceFile is in key value pair format.
>>> >>
>>> >> Avro includes a file format that works with Hadoop.
>>> >>
>>> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/package-summary.html
>>> >>
>>> >> Doug
Re: File formats in Hadoop
Posted by Weishung Chung <we...@gmail.com>.
I also found this informative article
http://cloudepr.blogspot.com/2009/09/hfile-block-indexed-file-format-to.html
Would the key-value pairs look like this? E.g., column family1 with one qualifier (qualifier1) that has 2 versions:
key1: rowkey1+column family1:qualifier1+timestamp1
value1: corresponding cell value 1
key2: rowkey1+column family1:qualifier1+timestamp2
value2: corresponding cell value 2
key3: rowkey2+column family1:qualifier1+timestamp1
value3: corresponding cell value 3
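One way to reason about the question, stated as my understanding of HBase's KeyValue ordering rather than a definitive answer: keys sort by row, then family, then qualifier (byte order), then timestamp descending, so the newest version of a cell comes first. Real keys also carry a key type, which is omitted here. Under that assumption, reading a row means seeking to its first key and then reading consecutive key/value pairs until the row key changes, as the Python sketch below shows. The timestamps 1 and 2 stand in for timestamp1 and timestamp2 above, and sort_cells/read_row are hypothetical helpers.

```python
import bisect

def sort_cells(cells):
    """Sort (row, family, qualifier, timestamp, value) cells: row, family and
    qualifier ascending (byte order), then newest timestamp first."""
    return sorted(cells, key=lambda c: (c[0], c[1], c[2], -c[3]))

def read_row(sorted_cells, row):
    """Seek to the first cell of `row`, then read consecutive cells until the row changes."""
    rows = [c[0] for c in sorted_cells]  # a real store would seek via its block index instead
    start = bisect.bisect_left(rows, row)
    result = []
    for cell in sorted_cells[start:]:
        if cell[0] != row:
            break
        result.append(cell)
    return result

cells = [
    (b"rowkey1", b"family1", b"qualifier1", 1, b"cell value 1"),
    (b"rowkey1", b"family1", b"qualifier1", 2, b"cell value 2"),
    (b"rowkey2", b"family1", b"qualifier1", 1, b"cell value 3"),
]
for cell in read_row(sort_cells(cells), b"rowkey1"):
    print(cell)  # the timestamp-2 version prints before the timestamp-1 version
```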
On Tue, Mar 22, 2011 at 10:58 AM, Vivek Krishna <vi...@gmail.com> wrote:
> http://nosql.mypopescu.com/post/3220921756/hbase-internals-hfile-explained
> might help.
>
> Viv
>
> On Tue, Mar 22, 2011 at 11:43 AM, Weishung Chung <we...@gmail.com> wrote:
>
>> My fellow superb hbase experts,
>>
>> Looking at the HFile specs and have some questions.
>> How is a particular table cell in a HBase table being represented in the
>> HFile? Does the key of the key value pair represent the rowkey+column
>> family:qualifier+timestamp and the value represent the corresponding cell
>> value? If so, to read a row, multiple key/value pair reads have to be done?
>>
>> Thank you :)
>>
>> On Tue, Mar 22, 2011 at 9:09 AM, Weishung Chung <we...@gmail.com> wrote:
>>
>> > Thank you, I will definitely take a look. Also, the TFile spec below helps
>> > me to understand more,
>> > what an exciting work !
>> >
>> > https://issues.apache.org/jira/secure/attachment/12396286/TFile+Specification+20081217.pdf
>> >
>> > On Mon, Mar 21, 2011 at 11:41 AM, Doug Cutting <cu...@apache.org> wrote:
>> >
>> >> On 03/19/2011 09:01 AM, Weishung Chung wrote:
>> >> > I am browsing through the hadoop.io package and was wondering what other
>> >> > file formats are available in hadoop other than SequenceFile and TFile?
>> >> > Is all data written through hadoop including those from hbase saved in the
>> >> > above formats? It seems like SequenceFile is in key value pair format.
>> >>
>> >> Avro includes a file format that works with Hadoop.
>> >>
>> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/package-summary.html
>> >>
>> >> Doug
Re: File formats in Hadoop
Posted by Vivek Krishna <vi...@gmail.com>.
http://nosql.mypopescu.com/post/3220921756/hbase-internals-hfile-explained
might help.
Viv
On Tue, Mar 22, 2011 at 11:43 AM, Weishung Chung <we...@gmail.com> wrote:
> My fellow superb hbase experts,
>
> Looking at the HFile specs and have some questions.
> How is a particular table cell in a HBase table being represented in the
> HFile? Does the key of the key value pair represent the rowkey+column
> family:qualifier+timestamp and the value represent the corresponding cell
> value? If so, to read a row, multiple key/value pair reads have to be done?
>
> Thank you :)
>
> On Tue, Mar 22, 2011 at 9:09 AM, Weishung Chung <we...@gmail.com> wrote:
>
> > Thank you, I will definitely take a look. Also, the TFile spec below helps
> > me to understand more,
> > what an exciting work !
> >
> > https://issues.apache.org/jira/secure/attachment/12396286/TFile+Specification+20081217.pdf
> >
> > On Mon, Mar 21, 2011 at 11:41 AM, Doug Cutting <cu...@apache.org> wrote:
> >
> >> On 03/19/2011 09:01 AM, Weishung Chung wrote:
> >> > I am browsing through the hadoop.io package and was wondering what other
> >> > file formats are available in hadoop other than SequenceFile and TFile?
> >> > Is all data written through hadoop including those from hbase saved in the
> >> > above formats? It seems like SequenceFile is in key value pair format.
> >>
> >> Avro includes a file format that works with Hadoop.
> >>
> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/package-summary.html
> >>
> >> Doug
Re: File formats in Hadoop
Posted by Weishung Chung <we...@gmail.com>.
My fellow superb HBase experts,
I am looking at the HFile specs and have some questions.
How is a particular cell of an HBase table represented in the
HFile? Does the key of the key/value pair represent rowkey + column
family:qualifier + timestamp, and the value the corresponding cell
value? If so, does reading a row require multiple key/value pair reads?
Thank you :)
On Tue, Mar 22, 2011 at 9:09 AM, Weishung Chung <we...@gmail.com> wrote:
> Thank you, I will definitely take a look. Also, the TFile spec below helps
> me to understand more,
> what an exciting work !
>
> https://issues.apache.org/jira/secure/attachment/12396286/TFile+Specification+20081217.pdf
>
> On Mon, Mar 21, 2011 at 11:41 AM, Doug Cutting <cu...@apache.org> wrote:
>
>> On 03/19/2011 09:01 AM, Weishung Chung wrote:
>> > I am browsing through the hadoop.io package and was wondering what other
>> > file formats are available in hadoop other than SequenceFile and TFile?
>> > Is all data written through hadoop including those from hbase saved in the
>> > above formats? It seems like SequenceFile is in key value pair format.
>>
>> Avro includes a file format that works with Hadoop.
>>
>> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/package-summary.html
>>
>> Doug
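One way to picture the layout asked about in the question above is a simplified model, assuming each cell version is stored as one key/value pair whose key is (row, family, qualifier, timestamp), sorted by row, family and qualifier ascending and by timestamp descending so the newest version comes first. HBase's real KeyValue also carries length prefixes and a key-type byte; this sketch omits them and uses a plain `TreeMap` in place of an HFile. The class and method names (`CellKeySketch`, `readRow`) are hypothetical, for illustration only.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class CellKeySketch {
    // Hypothetical simplified cell key (no length prefixes, no key-type byte).
    record CellKey(String row, String family, String qualifier, long timestamp) {}

    // Row, family, qualifier ascending; timestamp descending (newest first).
    static final Comparator<CellKey> ORDER = Comparator
            .comparing(CellKey::row)
            .thenComparing(CellKey::family)
            .thenComparing(CellKey::qualifier)
            .thenComparing((a, b) -> Long.compare(b.timestamp(), a.timestamp()));

    // Reading a row = scanning the consecutive pairs that share its row key.
    static NavigableMap<CellKey, String> readRow(NavigableMap<CellKey, String> store, String row) {
        // Smallest possible key for this row under ORDER: empty family and
        // qualifier sort first, and the largest timestamp sorts first.
        CellKey start = new CellKey(row, "", "", Long.MAX_VALUE);
        NavigableMap<CellKey, String> result = new TreeMap<>(ORDER);
        for (Map.Entry<CellKey, String> e : store.tailMap(start, true).entrySet()) {
            if (!e.getKey().row().equals(row)) {
                break; // first key of the next row: the scan is done
            }
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<CellKey, String> store = new TreeMap<>(ORDER);
        store.put(new CellKey("row1", "cf", "a", 100L), "a-old");
        store.put(new CellKey("row1", "cf", "a", 200L), "a-new");
        store.put(new CellKey("row1", "cf", "b", 150L), "b");
        store.put(new CellKey("row2", "cf", "a", 300L), "other-row");

        // One key/value pair per cell version, newest version of "a" first.
        for (Map.Entry<CellKey, String> e : readRow(store, "row1").entrySet()) {
            System.out.println(e.getKey() + " => " + e.getValue());
        }
    }
}
```

Under this model the answer to the second question is yes: a row with three cell versions costs three key/value reads, but because they are adjacent in sort order they come from a single sequential scan rather than three random lookups.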
Re: File formats in Hadoop
Posted by Weishung Chung <we...@gmail.com>.
Thank you, I will definitely take a look. The TFile spec below also helps
me understand more.
What exciting work!
https://issues.apache.org/jira/secure/attachment/12396286/TFile+Specification+20081217.pdf
On Mon, Mar 21, 2011 at 11:41 AM, Doug Cutting <cu...@apache.org> wrote:
> On 03/19/2011 09:01 AM, Weishung Chung wrote:
> > I am browsing through the hadoop.io package and was wondering what other
> > file formats are available in hadoop other than SequenceFile and TFile?
> > Is all data written through hadoop including those from hbase saved in the
> > above formats? It seems like SequenceFile is in key value pair format.
>
> Avro includes a file format that works with Hadoop.
>
>
> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/package-summary.html
>
> Doug
>
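The TFile spec linked above describes a file built from data blocks of sorted key/value pairs plus a data index, and HFile follows a similar block-plus-index design. The sketch below is a simplified in-memory model of why such an index helps, assuming the index records the first key of each block: a lookup binary-searches the index, then scans only one block. It ignores compression, on-disk layout and meta blocks, and the class name `BlockIndexSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlockIndexSketch {
    // Hypothetical stand-in for a sorted file split into blocks of {key, value} pairs.
    private final List<String[][]> blocks = new ArrayList<>();
    // The "block index": first key of each block, in block order.
    private final List<String> firstKeys = new ArrayList<>();

    // Builds blocks from pairs already sorted by key; blockSize must be > 0.
    BlockIndexSketch(String[][] sortedPairs, int blockSize) {
        for (int i = 0; i < sortedPairs.length; i += blockSize) {
            String[][] block = Arrays.copyOfRange(
                    sortedPairs, i, Math.min(i + blockSize, sortedPairs.length));
            blocks.add(block);
            firstKeys.add(block[0][0]);
        }
    }

    // Returns the value stored under key, or null if the key is absent.
    String get(String key) {
        // Binary search for the last block whose first key is <= key.
        int lo = 0, hi = firstKeys.size() - 1, candidate = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (firstKeys.get(mid).compareTo(key) <= 0) {
                candidate = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (candidate < 0) {
            return null; // key sorts before the first block
        }
        // Scan only the candidate block (in a real file, the one block read from disk).
        for (String[] kv : blocks.get(candidate)) {
            int cmp = kv[0].compareTo(key);
            if (cmp == 0) {
                return kv[1];
            }
            if (cmp > 0) {
                break; // keys are sorted, so the key cannot appear later
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"apple", "1"}, {"banana", "2"}, {"cherry", "3"},
            {"date", "4"}, {"elder", "5"}, {"fig", "6"}, {"grape", "7"}
        };
        BlockIndexSketch file = new BlockIndexSketch(pairs, 3);
        System.out.println(file.get("elder"));   // prints 5
        System.out.println(file.get("coconut")); // prints null
    }
}
```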
Re: File formats in Hadoop
Posted by Doug Cutting <cu...@apache.org>.
On 03/19/2011 09:01 AM, Weishung Chung wrote:
> I am browsing through the hadoop.io package and was wondering what other
> file formats are available in hadoop other than SequenceFile and TFile?
> Is all data written through hadoop including those from hbase saved in the
> above formats? It seems like SequenceFile is in key value pair format.
Avro includes a file format that works with Hadoop.
http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/package-summary.html
Doug
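The original question notes that SequenceFile stores data as key/value pairs. As a rough illustration of that idea only, here is a toy length-prefixed key/value record format using plain `java.io`. This is not SequenceFile's actual on-disk layout (which adds, among other things, a header, periodic sync markers and optional compression) and not Avro's container format; `KeyValueRecordsSketch` and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class KeyValueRecordsSketch {
    // Each record: int keyLength, key bytes, int valueLength, value bytes (UTF-8).
    static byte[] write(List<Map.Entry<String, String>> records) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (Map.Entry<String, String> r : records) {
                writeField(out, r.getKey());
                writeField(out, r.getValue());
            }
        }
        return buffer.toByteArray();
    }

    private static void writeField(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Reads records back, in write order, until the input is exhausted.
    static List<Map.Entry<String, String>> read(byte[] data) throws IOException {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (true) {
                int keyLength;
                try {
                    keyLength = in.readInt();
                } catch (EOFException end) {
                    break; // clean end of input: no more records
                }
                String key = readField(in, keyLength);
                String value = readField(in, in.readInt());
                records.add(new AbstractMap.SimpleEntry<>(key, value));
            }
        }
        return records;
    }

    private static String readField(DataInputStream in, int length) throws IOException {
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        List<Map.Entry<String, String>> records = List.of(
                Map.entry("user:1", "alice"),
                Map.entry("user:2", "bob"));
        for (Map.Entry<String, String> r : read(write(records))) {
            System.out.println(r.getKey() + " -> " + r.getValue());
        }
    }
}
```

Length prefixes keep the format agnostic to what the bytes mean; a real container adds sync markers so a reader (for example, a MapReduce split) can start mid-file and find the next record boundary, which this toy cannot do.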
Re: File formats in Hadoop
Posted by Ryan Rawson <ry...@gmail.com>.
Curious: why do you mention "SequenceFile" and "TFile"? Neither of
those is in hbase.io, and TFile is not used anywhere in
HBase.
-ryan
On Sat, Mar 19, 2011 at 9:01 AM, Weishung Chung <we...@gmail.com> wrote:
> I am browsing through the hadoop.io package and was wondering what other
> file formats are available in hadoop other than SequenceFile and TFile?
> Is all data written through hadoop including those from hbase saved in the
> above formats? It seems like SequenceFile is in key value pair format.
>
> Thank you so much :)
>