Posted to mapreduce-user@hadoop.apache.org by Mapred Learn <ma...@gmail.com> on 2011/02/17 22:16:23 UTC

Sequence File usage queries

Hi,
I need to upload several terabytes of text files to HDFS as sequence files.

These text files come in several layouts, ranging from 32 to 62 columns
(the metadata).

What would be a good way to upload these files along with their metadata?

i) Create a key/value class per text file layout and use it to write the
sequence files?

ii) Set a SequenceFile.Metadata header in each sequence file as it is
uploaded?

Any input is appreciated!

Thanks
-JJ

Re: Sequence File usage queries

Posted by Ted Yu <yu...@gmail.com>.
I didn't find a SequenceFile metadata viewer.
You would need to write some code for ii) below.

On Wed, Feb 23, 2011 at 4:24 PM, Mapred Learn <ma...@gmail.com> wrote:

> Thanks!
>
> In that case, how can we print the metadata stored in the sequence files
> if a user accessing the data wants to see it?
> i) Is there a Hadoop command that can do it?
> ii) Or will we have to provide some interface for the user to view the
> metadata?
>
> -JJ

Re: Sequence File usage queries

Posted by David Rosenstrauch <da...@darose.net>.
On 02/23/2011 07:24 PM, Mapred Learn wrote:
> Thanks!
>
> In that case, how can we print the metadata stored in the sequence files
> if a user accessing the data wants to see it?
> i) Is there a Hadoop command that can do it?
> ii) Or will we have to provide some interface for the user to view the
> metadata?
>
> -JJ

We wrote our own sequence file dumper app.

DR
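
A dumper of this kind can be quite small. The following is only a sketch, not
the app mentioned above: it assumes the Hadoop 0.20-era SequenceFile.Reader
constructor, and the class name SequenceFileMetadataDump and the method
readMetadata are made up for illustration. It prints the key and value class
names and every entry stored in the file's Metadata header.

```java
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Hypothetical helper class; not part of Hadoop.
public class SequenceFileMetadataDump {

  // Opens the sequence file and returns a copy of its Metadata header entries.
  public static TreeMap<Text, Text> readMetadata(Configuration conf, Path file)
      throws IOException {
    FileSystem fs = file.getFileSystem(conf);
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, file, conf);
    try {
      return reader.getMetadata().getMetadata();
    } finally {
      reader.close();
    }
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    Path file = new Path(args[0]);

    // Print the key/value class names recorded in the file header.
    FileSystem fs = file.getFileSystem(conf);
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, file, conf);
    try {
      System.out.println("key class:   " + reader.getKeyClassName());
      System.out.println("value class: " + reader.getValueClassName());
    } finally {
      reader.close();
    }

    // Print each metadata entry as "name = value".
    for (Map.Entry<Text, Text> entry : readMetadata(conf, file).entrySet()) {
      System.out.println(entry.getKey() + " = " + entry.getValue());
    }
  }
}
```

It would be run with the Hadoop jars on the classpath and the path of one
sequence file as its only argument.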

Re: Sequence File usage queries

Posted by Mapred Learn <ma...@gmail.com>.
Thanks!

In that case, how can we print the metadata stored in the sequence files
if a user accessing the data wants to see it?
i) Is there a Hadoop command that can do it?
ii) Or will we have to provide some interface for the user to view the
metadata?

-JJ

On Sat, Feb 19, 2011 at 9:17 AM, Ted Yu <yu...@gmail.com> wrote:

> Option 2 is better.
> Please see this in SequenceFile:
>   public static Writer
>     createWriter(FileSystem fs, Configuration conf, Path name,
>                  Class keyClass, Class valClass, int bufferSize,
>                  short replication, long blockSize,
>                  CompressionType compressionType, CompressionCodec codec,
>                  Progressable progress, Metadata metadata) throws IOException {

Re: Sequence File usage queries

Posted by Ted Yu <yu...@gmail.com>.
Option ii) is better. SequenceFile has a createWriter overload that takes a
Metadata argument:
  public static Writer
    createWriter(FileSystem fs, Configuration conf, Path name,
                 Class keyClass, Class valClass, int bufferSize,
                 short replication, long blockSize,
                 CompressionType compressionType, CompressionCodec codec,
                 Progressable progress, Metadata metadata) throws IOException {
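
For illustration, a minimal sketch of calling that overload. The class name
LayoutMetadataWriter, the metadata names "layout.name" and "layout.columns",
and the sample record are all made up; the key/value types (LongWritable line
number, Text line) are just one possible choice. It assumes the 0.20-era API
shown in the signature; later Hadoop releases may mark this overload as
deprecated.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.DefaultCodec;

// Hypothetical example class; not part of Hadoop.
public class LayoutMetadataWriter {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    Path out = new Path(args[0]);
    FileSystem fs = out.getFileSystem(conf);

    // Describe the text layout in the file header (names are assumptions).
    SequenceFile.Metadata metadata = new SequenceFile.Metadata();
    metadata.set(new Text("layout.name"), new Text("layout_32"));
    metadata.set(new Text("layout.columns"), new Text("id,name,amount"));

    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, out,
        LongWritable.class, Text.class,
        conf.getInt("io.file.buffer.size", 4096),   // bufferSize
        fs.getDefaultReplication(),                 // replication
        fs.getDefaultBlockSize(),                   // blockSize
        CompressionType.NONE, new DefaultCodec(),
        null,                                       // no progress reporting
        metadata);
    try {
      // One record per text line: key = line number, value = the raw line.
      writer.append(new LongWritable(0), new Text("1,alice,10.5"));
    } finally {
      writer.close();
    }
  }
}
```

With this approach the same key and value classes can serve every layout,
since the layout description travels in each file's header.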

