Posted to common-user@hadoop.apache.org by Weishung Chung <we...@gmail.com> on 2011/03/19 17:01:45 UTC

File formats in Hadoop

I am browsing through the hadoop.io package and was wondering what file
formats are available in Hadoop other than SequenceFile and TFile.
Is all data written through Hadoop, including data from HBase, saved in
these formats? It seems that SequenceFile uses a key-value pair format.

Thank you so much :)

Re: Is there "useradd" in Hadoop

Posted by Brian Bockelman <bb...@cse.unl.edu>.
In 0.20 and later, user and group information is taken from the NameNode's (NN's) OS.

There is no useradd or groupadd.
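
As a rough illustration of what "taken from the OS" means, the sketch below lists the groups the operating system reports for the current process, similar to what the Unix "groups" command prints. It is plain Python using only the standard library, not Hadoop code; the function name local_group_names is made up for this example, and it assumes a Unix-like host.

```python
import grp
import os


def local_group_names():
    """Return the names of the OS groups the current process belongs to."""
    # Supplementary groups plus the effective group, as reported by the OS.
    gids = set(os.getgroups()) | {os.getegid()}
    names = []
    for gid in gids:
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # The gid has no entry in the group database; fall back to the number.
            names.append(str(gid))
    return sorted(names)


if __name__ == "__main__":
    print(local_group_names())
```

To put a user into a group, you would change the group database that the OS consults (for example with the usual OS tools); nothing in this sketch touches HDFS itself.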

Brian

On Mar 23, 2011, at 1:19 AM, springring wrote:

> Hi,
> 
>    There are "chmod", "chown", and "chgrp" in HDFS.
> Is there a command like "useradd -g" to add a
> user to a group? Furthermore, is there a "Hadoop
> group", as distinct from a "Linux group"?
> 
> 
> Ring


Is there "useradd" in Hadoop

Posted by springring <sp...@126.com>.
Hi,

    There are "chmod", "chown", and "chgrp" in HDFS.
Is there a command like "useradd -g" to add a
user to a group? Furthermore, is there a "Hadoop
group", as distinct from a "Linux group"?


Ring

Re: how to create a group in hdfs

Posted by springring <sp...@126.com>.
Segel,

    I got it. Sorry, I sent this mail by using "reply all" on another mail
and forgot that it included the hbase list.
    Thanks.

Ring 


----- Original Message ----- 
From: "Segel, Mike" <ms...@navteq.com>
To: <co...@hadoop.apache.org>; "Ryan Rawson" <ry...@gmail.com>
Cc: <us...@hbase.apache.org>; <co...@hadoop.apache.org>
Sent: Wednesday, March 23, 2011 10:32 PM
Subject: RE: how to create a group in hdfs


Not sure why this has anything to do with hbase...

The short answer... 
Outside of the supergroup which is controlled by dfs.permissions.supergroup, 
Hadoop apparently checks to see if the owner is a member of the group you want to use.
This could be controlled by the local machine's /etc/group file, or if you're using NIS or LDAP, it's controlled there.

So you can run the unix shell command groups to find out which group(s) you belong to, and then switch to one of those.

HTH

-Mike


-----Original Message-----
From: springring [mailto:springring@126.com] 
Sent: Tuesday, March 22, 2011 11:29 PM
To: common-dev@hadoop.apache.org; Ryan Rawson
Cc: user@hbase.apache.org; common-user@hadoop.apache.org; common-dev@hadoop.apache.org
Subject: how to create a group in hdfs

Hi,

how to create a user group in hdfs?

hadoop fs -?

Ring


The information contained in this communication may be CONFIDENTIAL and is intended only for the use of the recipient(s) named above.  If you are not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited.  If you have received this communication in error, please notify the sender and delete/destroy the original message and any copy of it from your computer or paper files.

RE: how to create a group in hdfs

Posted by "Segel, Mike" <ms...@navteq.com>.
Not sure why this has anything to do with hbase...

The short answer...
Outside of the supergroup, which is controlled by dfs.permissions.supergroup,
Hadoop apparently checks whether the owner is a member of the group you want to use.
Membership could be controlled by the local machine's /etc/group file or, if you're using NIS or LDAP, it's controlled there.

So you can run the Unix shell command "groups" to find out which group(s) you belong to, and then switch to one of those.
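
The check described above can be sketched roughly as follows. This is only an illustration of the rule as stated in this message, not Hadoop's actual implementation; the function name may_change_group and its parameters are hypothetical, and the default value "supergroup" is an assumption standing in for whatever dfs.permissions.supergroup is set to.

```python
def may_change_group(user, user_groups, file_owner, target_group,
                     supergroup="supergroup"):
    """Return True if `user` may set a file's group to `target_group`.

    `user_groups` is the set of group names the OS (or NIS/LDAP) reports
    for `user`; `supergroup` stands in for dfs.permissions.supergroup.
    """
    # Members of the supergroup bypass the membership check.
    if supergroup in user_groups:
        return True
    # Otherwise the caller must own the file and belong to the target group.
    return user == file_owner and target_group in user_groups


if __name__ == "__main__":
    print(may_change_group("ring", {"users", "hadoop"}, "ring", "hadoop"))
    print(may_change_group("ring", {"users"}, "ring", "hbase"))
```

The group names passed in would come from the output of "groups" (or the equivalent NIS/LDAP lookup), which is why there is no separate step for creating a group inside HDFS.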

HTH

-Mike


-----Original Message-----
From: springring [mailto:springring@126.com] 
Sent: Tuesday, March 22, 2011 11:29 PM
To: common-dev@hadoop.apache.org; Ryan Rawson
Cc: user@hbase.apache.org; common-user@hadoop.apache.org; common-dev@hadoop.apache.org
Subject: how to create a group in hdfs

Hi,

how to create a user group in hdfs?

hadoop fs -?

Ring



how to create a group in hdfs

Posted by springring <sp...@126.com>.
Hi,

how to create a user group in hdfs?

hadoop fs -?

Ring

Re: File formats in Hadoop

Posted by Weishung Chung <we...@gmail.com>.
They are used in hadoop

org.apache.hadoop.io.SequenceFile

org.apache.hadoop.io.file.tfile.TFile


On Tue, Mar 22, 2011 at 10:06 PM, Ryan Rawson <ry...@gmail.com> wrote:

> Curious, why do you mention "SequenceFile" and "TFile".  Neither of
> those are either in the hbase.io, and TFile is not used anywhere in
> HBase.
>
> -ryan
>
> On Sat, Mar 19, 2011 at 9:01 AM, Weishung Chung <we...@gmail.com>
> wrote:
> > I am browsing through the hadoop.io package and was wondering what other
> > file formats are available in hadoop other than SequenceFile and TFile?
> > Is all data written through hadoop including those from hbase saved in
> the
> > above formats? It seems like SequenceFile is in key value pair format.
> >
> > Thank you so much :)
> >
>

Re: File formats in Hadoop

Posted by Weishung Chung <we...@gmail.com>.
I was initially exploring different file formats in Hadoop, but somehow the
search spilled over into HBase, I guess.
Sorry for the confusion, Ryan :D

On Wed, Mar 23, 2011 at 7:18 AM, Harsh J <qw...@gmail.com> wrote:

> On Wed, Mar 23, 2011 at 8:36 AM, Ryan Rawson <ry...@gmail.com> wrote:
> > Curious, why do you mention "SequenceFile" and "TFile".  Neither of
> > those are either in the hbase.io, and TFile is not used anywhere in
> > HBase.
>
> A cross-posting side-effect, I guess :-)
>
> --
> Harsh J
> http://harshj.com
>

Re: File formats in Hadoop

Posted by Harsh J <qw...@gmail.com>.
On Wed, Mar 23, 2011 at 8:36 AM, Ryan Rawson <ry...@gmail.com> wrote:
> Curious, why do you mention "SequenceFile" and "TFile".  Neither of
> those are either in the hbase.io, and TFile is not used anywhere in
> HBase.

A cross-posting side-effect, I guess :-)

-- 
Harsh J
http://harshj.com

Re: File formats in Hadoop

Posted by Ryan Rawson <ry...@gmail.com>.
Curious, why do you mention "SequenceFile" and "TFile"? Neither of
those is in hbase.io, and TFile is not used anywhere in
HBase.

-ryan

On Sat, Mar 19, 2011 at 9:01 AM, Weishung Chung <we...@gmail.com> wrote:
> I am browsing through the hadoop.io package and was wondering what other
> file formats are available in hadoop other than SequenceFile and TFile?
> Is all data written through hadoop including those from hbase saved in the
> above formats? It seems like SequenceFile is in key value pair format.
>
> Thank you so much :)
>

Re: File formats in Hadoop

Posted by Weishung Chung <we...@gmail.com>.
I found this interesting article about sequence files; sharing it here:

http://www.cloudera.com/blog/2011/01/hadoop-io-sequence-map-set-array-bloommap-files/

On Sun, Mar 20, 2011 at 6:04 AM, Niels Basjes <Ni...@basjes.nl> wrote:

> And then there is the matter of how you put the data in the file. I've
> heard that some people write the data as protocolbuffers into the
> sequence file.
>
> 2011/3/19 Harsh J <qw...@gmail.com>:
> > Hello,
> >
> > On Sat, Mar 19, 2011 at 9:31 PM, Weishung Chung <we...@gmail.com>
> wrote:
> >> I am browsing through the hadoop.io package and was wondering what
> other
> >> file formats are available in hadoop other than SequenceFile and TFile?
> >
> > Additionally, on Hadoop, there're MapFiles/SetFiles (Derivative of
> > SequenceFiles, if you need maps/sets), and IFiles (Used by the
> > map-output buffers to produce a key-value file for Reducers to use,
> > internal use only).
> >
> > Apache Hive use RCFiles, which is very interesting too. Apache Avro
> > provides Avro-Datafiles that are designed for use with Hadoop
> > Map/Reduce + Avro-serialized data.
> >
> > I'm not sure of this one, but Pig probably was implementing a
> > table-file-like solution of their own a while ago. Howl?
> >
> > --
> > Harsh J
> > http://harshj.com
> >
>
>
>
> --
> Met vriendelijke groeten,
>
> Niels Basjes
>

Re: File formats in Hadoop

Posted by Niels Basjes <Ni...@basjes.nl>.
And then there is the matter of how you put the data in the file. I've
heard that some people write the data as Protocol Buffers messages into the
SequenceFile.
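
A minimal sketch of that idea, under loud assumptions: the container below is a simplified length-prefixed key/value layout, not the real SequenceFile format, and struct packing stands in for Protocol Buffers serialization (which is not in the Python standard library). All function names are made up for illustration.

```python
import io
import struct


def write_record(out, key: bytes, value: bytes) -> None:
    """Append one record: two 4-byte big-endian lengths, then key and value."""
    out.write(struct.pack(">II", len(key), len(value)))
    out.write(key)
    out.write(value)


def read_records(inp):
    """Yield (key, value) byte pairs until the stream is exhausted."""
    while True:
        header = inp.read(8)
        if len(header) < 8:
            return
        key_len, value_len = struct.unpack(">II", header)
        yield inp.read(key_len), inp.read(value_len)


def encode_point(x: int, y: int) -> bytes:
    """Stand-in for a serialized message: two signed 32-bit integers."""
    return struct.pack(">ii", x, y)


def decode_point(data: bytes):
    return struct.unpack(">ii", data)


buf = io.BytesIO()
write_record(buf, b"row1", encode_point(3, 4))
write_record(buf, b"row2", encode_point(-1, 7))
buf.seek(0)
decoded = [(key, decode_point(value)) for key, value in read_records(buf)]
```

The point is that the container only sees opaque bytes; how the value bytes are produced (Protocol Buffers, Avro, Writables, or anything else) is a separate choice.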

2011/3/19 Harsh J <qw...@gmail.com>:
> Hello,
>
> On Sat, Mar 19, 2011 at 9:31 PM, Weishung Chung <we...@gmail.com> wrote:
>> I am browsing through the hadoop.io package and was wondering what other
>> file formats are available in hadoop other than SequenceFile and TFile?
>
> Additionally, on Hadoop, there're MapFiles/SetFiles (Derivative of
> SequenceFiles, if you need maps/sets), and IFiles (Used by the
> map-output buffers to produce a key-value file for Reducers to use,
> internal use only).
>
> Apache Hive use RCFiles, which is very interesting too. Apache Avro
> provides Avro-Datafiles that are designed for use with Hadoop
> Map/Reduce + Avro-serialized data.
>
> I'm not sure of this one, but Pig probably was implementing a
> table-file-like solution of their own a while ago. Howl?
>
> --
> Harsh J
> http://harshj.com
>



-- 
Met vriendelijke groeten,

Niels Basjes

Re: File formats in Hadoop

Posted by Harsh J <qw...@gmail.com>.
Hello,

On Sat, Mar 19, 2011 at 9:31 PM, Weishung Chung <we...@gmail.com> wrote:
> I am browsing through the hadoop.io package and was wondering what other
> file formats are available in hadoop other than SequenceFile and TFile?

Additionally, in Hadoop there are MapFiles/SetFiles (derivatives of
SequenceFiles, for when you need maps/sets) and IFiles (used by the
map-output buffers to produce a key-value file for Reducers to use;
internal use only).
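
To give a feel for the MapFile idea, here is a rough in-memory sketch: records kept sorted by key, plus a sparse index of every Nth key so that a lookup only scans a short run of records. It is not the Hadoop API or on-disk format; the class name TinyMapFile and the index_interval parameter are invented for this example.

```python
import bisect


class TinyMapFile:
    """Sorted key/value records with a sparse index over every Nth key."""

    def __init__(self, items, index_interval=2):
        # "Data file": all records, sorted by key.
        self.records = sorted(items)
        # "Index file": every Nth key and its position in the data.
        self.index_pos = list(range(0, len(self.records), index_interval))
        self.index_keys = [self.records[pos][0] for pos in self.index_pos]

    def get(self, key):
        # Find the last indexed key that is <= the requested key.
        slot = bisect.bisect_right(self.index_keys, key) - 1
        if slot < 0:
            return None
        # Scan forward from that position until the key is found or passed.
        for k, v in self.records[self.index_pos[slot]:]:
            if k == key:
                return v
            if k > key:
                break
        return None


m = TinyMapFile([("b", 2), ("a", 1), ("d", 4), ("c", 3), ("e", 5)])
```

Here m.get("d") consults the index first and then scans at most index_interval records, which is the trade-off a sparse index makes between index size and lookup cost.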

Apache Hive uses RCFiles, which are very interesting too. Apache Avro
provides Avro data files that are designed for use with Hadoop
Map/Reduce + Avro-serialized data.

I'm not sure of this one, but Pig was probably implementing a
table-file-like solution of its own a while ago. Howl?

-- 
Harsh J
http://harshj.com

Re: File formats in Hadoop

Posted by Weishung Chung <we...@gmail.com>.
Thank you for the info. HFile looks interesting; I can't wait to dig into the
code and get a better understanding of it!

On Sat, Mar 19, 2011 at 11:28 AM, Harsh J <qw...@gmail.com> wrote:

> Hello,
>
> On Sat, Mar 19, 2011 at 9:31 PM, Weishung Chung <we...@gmail.com>
> wrote:
> > Is all data written through hadoop including those from hbase saved in
> the
> > above formats? It seems like SequenceFile is in key value pair format.
>
> HBase provides its own format called HFile. See
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/io/hfile/HFile.html
> for details!
>
> --
> Harsh J
> http://harshj.com
>

Re: File formats in Hadoop

Posted by Harsh J <qw...@gmail.com>.
Hello,

On Sat, Mar 19, 2011 at 9:31 PM, Weishung Chung <we...@gmail.com> wrote:
> Is all data written through hadoop including those from hbase saved in the
> above formats? It seems like SequenceFile is in key value pair format.

HBase provides its own format called HFile. See
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/io/hfile/HFile.html
for details!

-- 
Harsh J
http://harshj.com

Fwd: File formats in Hadoop

Posted by Weishung Chung <we...@gmail.com>.
---------- Forwarded message ----------
From: Weishung Chung <we...@gmail.com>
Date: Tue, Mar 22, 2011 at 11:31 AM
Subject: Re: File formats in Hadoop
To: Vivek Krishna <vi...@gmail.com>
Cc: user@hbase.apache.org, common-user@hadoop.apache.org,
qwertymaniac@gmail.com, Doug Cutting <cu...@apache.org>


I also found this informative article:
http://cloudepr.blogspot.com/2009/09/hfile-block-indexed-file-format-to.html

Would the key-value pairs look like this? E.g., for column family1 with one
qualifier (qualifier1) and 2 versions:

key1: rowkey1+column family1:qualifier1+timestamp1
value1: corresponding cell value 1
key2: rowkey1+column family1:qualifier1+timestamp2
value2: corresponding cell value 2
key3: rowkey2+column family1:qualifier1+timestamp1
value3: corresponding cell value 3

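The layout asked about here can be sketched as below. This is only a model of the composite-key idea, not HBase code: the Cell tuple and helper names are hypothetical, timestamps are plain integers, and the newest-first ordering within a column is my understanding of how HBase sorts versions rather than something stated in this thread.

```python
from collections import namedtuple

# One stored entry: the composite key fields plus the cell value.
Cell = namedtuple("Cell", "row family qualifier timestamp value")


def sort_key(cell):
    # Order by row, then column, then timestamp descending (newest first).
    return (cell.row, cell.family, cell.qualifier, -cell.timestamp)


cells = [
    Cell("rowkey2", "family1", "qualifier1", 1, "value3"),
    Cell("rowkey1", "family1", "qualifier1", 1, "value1"),
    Cell("rowkey1", "family1", "qualifier1", 2, "value2"),
]
stored = sorted(cells, key=sort_key)


def read_row(stored_cells, row):
    """Reading a row means collecting every key/value entry for that row."""
    return [cell for cell in stored_cells if cell.row == row]


row1_values = [cell.value for cell in read_row(stored, "rowkey1")]
```

Under these assumptions, reading rowkey1 touches two key/value entries (one per version), which matches the "multiple key/value pair reads" in the question.
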
On Tue, Mar 22, 2011 at 10:58 AM, Vivek Krishna <vi...@gmail.com> wrote:

> http://nosql.mypopescu.com/post/3220921756/hbase-internals-hfile-explained
> might help.
>
> Viv
>
>
>
>
> On Tue, Mar 22, 2011 at 11:43 AM, Weishung Chung <we...@gmail.com>wrote:
>
>> My fellow superb hbase experts,
>>
>> Looking at the HFile specs and have some questions.
>> How is a particular table cell in a HBase table being represented in the
>> HFile? Does the key of the key value pair represent the rowkey+column
>> family:qualifier+timestamp and the value represent the corresponding cell
>> value? If so, to read a row, multiple key/value pair reads have to be
>> done?
>>
>> Thank you :)
>>
>>
>> On Tue, Mar 22, 2011 at 9:09 AM, Weishung Chung <we...@gmail.com>
>> wrote:
>>
>> > Thank you, I will definitely take a look. Also, the TFile spec below
>> helps
>> > me to understand more,
>> > what an exciting work !
>> >
>> >
>> >
>> https://issues.apache.org/jira/secure/attachment/12396286/TFile+Specification+20081217.pdf
>> >
>> > <
>> https://issues.apache.org/jira/secure/attachment/12396286/TFile+Specification+20081217.pdf
>> >
>> > On Mon, Mar 21, 2011 at 11:41 AM, Doug Cutting <cu...@apache.org>
>> wrote:
>> >
>> >> On 03/19/2011 09:01 AM, Weishung Chung wrote:
>> >> > I am browsing through the hadoop.io package and was wondering what
>> >> other
>> >> > file formats are available in hadoop other than SequenceFile and
>> TFile?
>> >> > Is all data written through hadoop including those from hbase saved
>> in
>> >> the
>> >> > above formats? It seems like SequenceFile is in key value pair
>> format.
>> >>
>> >> Avro includes a file format that works with Hadoop.
>> >>
>> >>
>> >>
>> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/package-summary.html
>> >>
>> >> Doug
>> >>
>> >
>> >
>>
>
>

Re: File formats in Hadoop

Posted by Weishung Chung <we...@gmail.com>.
I found this useful article that explains the internal storage of HFile

http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html

On Tue, Mar 22, 2011 at 11:31 AM, Weishung Chung <we...@gmail.com> wrote:

> I also found this informative article:
>
> http://cloudepr.blogspot.com/2009/09/hfile-block-indexed-file-format-to.html
>
> Would the key-value pairs look like this? E.g., for column family1 with one
> qualifier (qualifier1) and 2 versions:
>
> key1: rowkey1+column family1:qualifier1+timestamp1
> value1: corresponding cell value 1
> key2: rowkey1+column family1:qualifier1+timestamp2
> value2: corresponding cell value 2
> key3: rowkey2+column family1:qualifier1+timestamp1
> value3: corresponding cell value 3
>
> On Tue, Mar 22, 2011 at 10:58 AM, Vivek Krishna <vi...@gmail.com>wrote:
>
>> http://nosql.mypopescu.com/post/3220921756/hbase-internals-hfile-explained
>> might help.
>>
>> Viv
>>
>>
>>
>>
>> On Tue, Mar 22, 2011 at 11:43 AM, Weishung Chung <we...@gmail.com>wrote:
>>
>>> My fellow superb hbase experts,
>>>
>>> Looking at the HFile specs and have some questions.
>>> How is a particular table cell in a HBase table being represented in the
>>> HFile? Does the key of the key value pair represent the rowkey+column
>>> family:qualifier+timestamp and the value represent the corresponding cell
>>> value? If so, to read a row, multiple key/value pair reads have to be
>>> done?
>>>
>>> Thank you :)
>>>
>>>
>>> On Tue, Mar 22, 2011 at 9:09 AM, Weishung Chung <we...@gmail.com>
>>> wrote:
>>>
>>> > Thank you, I will definitely take a look. Also, the TFile spec below
>>> helps
>>> > me to understand more,
>>> > what an exciting work !
>>> >
>>> >
>>> >
>>> https://issues.apache.org/jira/secure/attachment/12396286/TFile+Specification+20081217.pdf
>>> >
>>> > <
>>> https://issues.apache.org/jira/secure/attachment/12396286/TFile+Specification+20081217.pdf
>>> >
>>> > On Mon, Mar 21, 2011 at 11:41 AM, Doug Cutting <cu...@apache.org>
>>> wrote:
>>> >
>>> >> On 03/19/2011 09:01 AM, Weishung Chung wrote:
>>> >> > I am browsing through the hadoop.io package and was wondering what
>>> >> other
>>> >> > file formats are available in hadoop other than SequenceFile and
>>> TFile?
>>> >> > Is all data written through hadoop including those from hbase saved
>>> in
>>> >> the
>>> >> > above formats? It seems like SequenceFile is in key value pair
>>> format.
>>> >>
>>> >> Avro includes a file format that works with Hadoop.
>>> >>
>>> >>
>>> >>
>>> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/package-summary.html
>>> >>
>>> >> Doug
>>> >>
>>> >
>>> >
>>>
>>
>>
>

Re: File formats in Hadoop

Posted by Weishung Chung <we...@gmail.com>.
I also found this informative article:
http://cloudepr.blogspot.com/2009/09/hfile-block-indexed-file-format-to.html

Would the key-value pairs look like this? E.g., for column family1 with one
qualifier (qualifier1) and 2 versions:

key1: rowkey1+column family1:qualifier1+timestamp1
value1: corresponding cell value 1
key2: rowkey1+column family1:qualifier1+timestamp2
value2: corresponding cell value 2
key3: rowkey2+column family1:qualifier1+timestamp1
value3: corresponding cell value 3

On Tue, Mar 22, 2011 at 10:58 AM, Vivek Krishna <vi...@gmail.com> wrote:

> http://nosql.mypopescu.com/post/3220921756/hbase-internals-hfile-explained
> might help.
>
> Viv
>
>
>
>
> On Tue, Mar 22, 2011 at 11:43 AM, Weishung Chung <we...@gmail.com>wrote:
>
>> My fellow superb hbase experts,
>>
>> Looking at the HFile specs and have some questions.
>> How is a particular table cell in a HBase table being represented in the
>> HFile? Does the key of the key value pair represent the rowkey+column
>> family:qualifier+timestamp and the value represent the corresponding cell
>> value? If so, to read a row, multiple key/value pair reads have to be
>> done?
>>
>> Thank you :)
>>
>>
>> On Tue, Mar 22, 2011 at 9:09 AM, Weishung Chung <we...@gmail.com>
>> wrote:
>>
>> > Thank you, I will definitely take a look. Also, the TFile spec below
>> helps
>> > me to understand more,
>> > what an exciting work !
>> >
>> >
>> >
>> https://issues.apache.org/jira/secure/attachment/12396286/TFile+Specification+20081217.pdf
>> >
>> > <
>> https://issues.apache.org/jira/secure/attachment/12396286/TFile+Specification+20081217.pdf
>> >
>> > On Mon, Mar 21, 2011 at 11:41 AM, Doug Cutting <cu...@apache.org>
>> wrote:
>> >
>> >> On 03/19/2011 09:01 AM, Weishung Chung wrote:
>> >> > I am browsing through the hadoop.io package and was wondering what
>> >> other
>> >> > file formats are available in hadoop other than SequenceFile and
>> TFile?
>> >> > Is all data written through hadoop including those from hbase saved
>> in
>> >> the
>> >> > above formats? It seems like SequenceFile is in key value pair
>> format.
>> >>
>> >> Avro includes a file format that works with Hadoop.
>> >>
>> >>
>> >>
>> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/package-summary.html
>> >>
>> >> Doug
>> >>
>> >
>> >
>>
>
>

Re: File formats in Hadoop

Posted by Vivek Krishna <vi...@gmail.com>.
http://nosql.mypopescu.com/post/3220921756/hbase-internals-hfile-explained
might help.

Viv



On Tue, Mar 22, 2011 at 11:43 AM, Weishung Chung <we...@gmail.com> wrote:

> My fellow superb hbase experts,
>
> Looking at the HFile specs and have some questions.
> How is a particular table cell in a HBase table being represented in the
> HFile? Does the key of the key value pair represent the rowkey+column
> family:qualifier+timestamp and the value represent the corresponding cell
> value? If so, to read a row, multiple key/value pair reads have to be done?
>
> Thank you :)
>
>
> On Tue, Mar 22, 2011 at 9:09 AM, Weishung Chung <we...@gmail.com>
> wrote:
>
> > Thank you, I will definitely take a look. Also, the TFile spec below
> helps
> > me to understand more,
> > what an exciting work !
> >
> >
> >
> https://issues.apache.org/jira/secure/attachment/12396286/TFile+Specification+20081217.pdf
> >
> > <
> https://issues.apache.org/jira/secure/attachment/12396286/TFile+Specification+20081217.pdf
> >
> > On Mon, Mar 21, 2011 at 11:41 AM, Doug Cutting <cu...@apache.org>
> wrote:
> >
> >> On 03/19/2011 09:01 AM, Weishung Chung wrote:
> >> > I am browsing through the hadoop.io package and was wondering what
> >> other
> >> > file formats are available in hadoop other than SequenceFile and
> TFile?
> >> > Is all data written through hadoop including those from hbase saved in
> >> the
> >> > above formats? It seems like SequenceFile is in key value pair format.
> >>
> >> Avro includes a file format that works with Hadoop.
> >>
> >>
> >>
> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/package-summary.html
> >>
> >> Doug
> >>
> >
> >
>

Re: File formats in Hadoop

Posted by Weishung Chung <we...@gmail.com>.
My fellow superb HBase experts,

I am looking at the HFile specs and have some questions.
How is a particular table cell in an HBase table represented in the
HFile? Does the key of the key/value pair represent the rowkey + column
family:qualifier + timestamp, and the value the corresponding cell
value? If so, does reading a row require multiple key/value pair reads?

Thank you :)


On Tue, Mar 22, 2011 at 9:09 AM, Weishung Chung <we...@gmail.com> wrote:

> Thank you, I will definitely take a look. Also, the TFile spec below helps
> me to understand more,
> what an exciting work !
>
>
> https://issues.apache.org/jira/secure/attachment/12396286/TFile+Specification+20081217.pdf
>
> <https://issues.apache.org/jira/secure/attachment/12396286/TFile+Specification+20081217.pdf>
> On Mon, Mar 21, 2011 at 11:41 AM, Doug Cutting <cu...@apache.org> wrote:
>
>> On 03/19/2011 09:01 AM, Weishung Chung wrote:
>> > I am browsing through the hadoop.io package and was wondering what
>> other
>> > file formats are available in hadoop other than SequenceFile and TFile?
>> > Is all data written through hadoop including those from hbase saved in
>> the
>> > above formats? It seems like SequenceFile is in key value pair format.
>>
>> Avro includes a file format that works with Hadoop.
>>
>>
>> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/package-summary.html
>>
>> Doug
>>
>
>
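
On the HFile question above: HBase stores each cell as its own key/value
entry, with the key built from the row key, column family, qualifier, and
timestamp, so a row read does touch several entries. Below is a minimal
sketch of that idea in plain Python. The Cell tuple, sort_key, and read_row
are hypothetical names made up for illustration; this is not the HBase API
and it ignores the real on-disk encoding.

```python
from collections import namedtuple

# Hypothetical stand-in for one stored cell; the field names are illustrative only.
Cell = namedtuple("Cell", ["row", "family", "qualifier", "timestamp", "value"])


def sort_key(cell):
    # Row, family, and qualifier sort ascending; newer timestamps sort first.
    return (cell.row, cell.family, cell.qualifier, -cell.timestamp)


def read_row(sorted_cells, row):
    # A row is not one record: collect every entry whose key carries this row.
    return [c for c in sorted_cells if c.row == row]


cells = sorted(
    [
        Cell(b"row2", b"cf", b"name", 5, b"bob"),
        Cell(b"row1", b"cf", b"name", 7, b"alice"),
        Cell(b"row1", b"cf", b"age", 7, b"30"),
        Cell(b"row1", b"cf", b"name", 9, b"alicia"),
    ],
    key=sort_key,
)

row1 = read_row(cells, b"row1")
for cell in row1:
    print(cell.row, cell.family, cell.qualifier, cell.timestamp, cell.value)
```

Because entries for one row sort next to each other, a real reader can seek
to the first entry of the row and scan forward rather than filter the whole
file as this sketch does.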

Re: File formats in Hadoop

Posted by Weishung Chung <we...@gmail.com>.
Thank you, I will definitely take a look. Also, the TFile spec below helps
me understand more.
What exciting work!

https://issues.apache.org/jira/secure/attachment/12396286/TFile+Specification+20081217.pdf

On Mon, Mar 21, 2011 at 11:41 AM, Doug Cutting <cu...@apache.org> wrote:

> On 03/19/2011 09:01 AM, Weishung Chung wrote:
> > I am browsing through the hadoop.io package and was wondering what other
> > file formats are available in hadoop other than SequenceFile and TFile?
> > Is all data written through hadoop including those from hbase saved in
> the
> > above formats? It seems like SequenceFile is in key value pair format.
>
> Avro includes a file format that works with Hadoop.
>
>
> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/package-summary.html
>
> Doug
>

Re: File formats in Hadoop

Posted by Doug Cutting <cu...@apache.org>.
On 03/19/2011 09:01 AM, Weishung Chung wrote:
> I am browsing through the hadoop.io package and was wondering what other
> file formats are available in hadoop other than SequenceFile and TFile?
> Is all data written through hadoop including those from hbase saved in the
> above formats? It seems like SequenceFile is in key value pair format.

Avro includes a file format that works with Hadoop.

http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/package-summary.html

Doug

Re: File formats in Hadoop

Posted by Ryan Rawson <ry...@gmail.com>.
Curious, why do you mention "SequenceFile" and "TFile"? Neither of
those is in hbase.io, and TFile is not used anywhere in
HBase.

-ryan

On Sat, Mar 19, 2011 at 9:01 AM, Weishung Chung <we...@gmail.com> wrote:
> I am browsing through the hadoop.io package and was wondering what other
> file formats are available in hadoop other than SequenceFile and TFile?
> Is all data written through hadoop including those from hbase saved in the
> above formats? It seems like SequenceFile is in key value pair format.
>
> Thank you so much :)
>
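
On the remark quoted above that SequenceFile is in key/value pair format:
the sketch below shows, in plain Python, what a flat file of
length-prefixed key/value records can look like. It is a toy layout assumed
for illustration only, not the actual SequenceFile on-disk format, which
also has a header, sync markers, and optional compression. The names
write_record and read_records are hypothetical.

```python
import io
import struct


def write_record(out, key, value):
    # Assumed toy layout: record length, key length, key bytes, value bytes.
    out.write(struct.pack(">ii", len(key) + len(value), len(key)))
    out.write(key)
    out.write(value)


def read_records(data):
    # Walk the buffer record by record until no full 8-byte prefix remains.
    stream = io.BytesIO(data)
    while True:
        prefix = stream.read(8)
        if len(prefix) < 8:
            break
        record_len, key_len = struct.unpack(">ii", prefix)
        key = stream.read(key_len)
        value = stream.read(record_len - key_len)
        yield key, value


buf = io.BytesIO()
write_record(buf, b"k1", b"hello")
write_record(buf, b"k2", b"")
records = list(read_records(buf.getvalue()))
print(records)
```

Keys and values here are opaque byte strings; turning them into typed
objects is left to whatever serialization sits on top.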