Posted to user@hbase.apache.org by Igor Gatis <ig...@gmail.com> on 2013/12/06 05:35:27 UTC

How to convert SequenceFile into HFile?

I have SequenceFiles I'd like to convert to HFile. How do I do that?

Re: How to convert SequenceFile into HFile?

Posted by Igor Gatis <ig...@gmail.com>.
Hi JM,

My usage is the following: I want to write a C++ program which will answer
RPC requests. Each request has a list of keys and responses will contain
values. I want to use HFile because it has an efficient key-based index and
because there is a whole set of tools in hadoop to produce this kind of
file.

So, my usage is totally unrelated to HBase. I only have keys and values.
Family and qualifier make no sense in my design -- specifying empty values
for those is a waste of space in my case.
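
The lookup pattern described above (a batch of keys in, their values out) depends on the file keeping its keys sorted so that an index can be binary-searched. As a rough illustration only, here is a minimal in-memory sketch in plain Java. It is not the HFile or TFile on-disk format, the names SortedKeyIndex, ORDER and get are made up for this example, and it assumes keys compare as unsigned bytes in lexicographic order.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

final class SortedKeyIndex {
    // Unsigned, lexicographic byte order (an assumption for this sketch).
    static final Comparator<byte[]> ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    };

    private final byte[][] keys;    // sorted by ORDER
    private final byte[][] values;  // values[i] belongs to keys[i]

    SortedKeyIndex(byte[][] unsortedKeys, byte[][] unsortedValues) {
        // Sort a permutation of indices so keys and values stay paired.
        Integer[] order = new Integer[unsortedKeys.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (x, y) -> ORDER.compare(unsortedKeys[x], unsortedKeys[y]));
        keys = new byte[order.length][];
        values = new byte[order.length][];
        for (int i = 0; i < order.length; i++) {
            keys[i] = unsortedKeys[order[i]];
            values[i] = unsortedValues[order[i]];
        }
    }

    /** Returns the value stored for the key, or null if the key is absent. */
    byte[] get(byte[] key) {
        int pos = Arrays.binarySearch(keys, key, ORDER);
        return pos >= 0 ? values[pos] : null;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        SortedKeyIndex index = new SortedKeyIndex(
            new byte[][] { bytes("b"), bytes("a") },
            new byte[][] { bytes("2"), bytes("1") });
        System.out.println(new String(index.get(bytes("a")), StandardCharsets.UTF_8)); // prints 1
        System.out.println(index.get(bytes("zz")) == null);                            // prints true
    }
}
```

A real indexed file would keep the index over blocks on disk rather than over every key in memory, but the per-request work (one binary search per requested key) has the same shape.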

TFile is a replacement for Hadoop's
MapFile<https://issues.apache.org/jira/browse/HADOOP-3315>.
HFile was designed after TFile.

Sounds like TFile better fits my use case then.

-Gatis



On Fri, Dec 6, 2013 at 7:54 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:

> Hi Igor,
>
>
> Have you looked at this constructor?
>
>   /**
>    * Constructs KeyValue structure filled with null value.
>    * @param row - row key (arbitrary byte array)
>    * @param family family name
>    * @param qualifier column qualifier
>    */
>   public KeyValue(final byte [] row, final byte [] family,
>       final byte [] qualifier, final byte [] value)
>
> You need to specify the column family (declared in your table definition)
> and the column qualifier, and then you give your value.
>
> Is that not what you are looking for? Also, what is a TFile?
>
> JM

Re: How to convert SequenceFile into HFile?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Igor,


Have you looked at this constructor?

  /**
   * Constructs KeyValue structure filled with null value.
   * @param row - row key (arbitrary byte array)
   * @param family family name
   * @param qualifier column qualifier
   */
  public KeyValue(final byte [] row, final byte [] family,
      final byte [] qualifier, final byte [] value)

You need to specify the column family (declared in your table definition)
and the column qualifier, and then you give your value.
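
For the mapper in question, a sketch along the following lines might work. One detail worth noting: BytesWritable.getBytes() returns the backing buffer, which can be longer than getLength(), so only the valid prefix should be copied. The block below is plain Java so that it runs without Hadoop or HBase on the classpath. The helper name validBytes and the placeholder family "f" and qualifier "q" are assumptions for illustration, and the Hadoop/HBase calls appear only as comments.

```java
import java.util.Arrays;

final class Seq2HFileMapperSketch {
    // Placeholder family and qualifier (hypothetical values); the KeyValue
    // constructor quoted above requires both.
    static final byte[] FAMILY = {'f'};
    static final byte[] QUALIFIER = {'q'};

    /** Copies only the first `length` bytes of a reusable backing buffer. */
    static byte[] validBytes(byte[] backing, int length) {
        return Arrays.copyOf(backing, length);
    }

    // With Hadoop and HBase available, the body of map() could then be:
    //
    //   byte[] row = validBytes(key.getBytes(), key.getLength());
    //   byte[] val = validBytes(value.getBytes(), value.getLength());
    //   context.write(new ImmutableBytesWritable(row),
    //                 new KeyValue(row, FAMILY, QUALIFIER, val));

    public static void main(String[] args) {
        byte[] buffer = {1, 2, 3, 0, 0};  // 3 valid bytes followed by padding
        System.out.println(Arrays.toString(validBytes(buffer, 3)));  // prints [1, 2, 3]
    }
}
```

The same row bytes are used for both the output key and the KeyValue row, since the reducer groups and sorts cells by row.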

Is that not what you are looking for? Also, what is a TFile?

JM


2013/12/6 Igor Gatis <ig...@gmail.com>

> Sounds like hbase's HFileOutputFormat depends on KeyValue's "family" field.
> I don't want that.
>
> All I want is to keep keys and values in an indexed file. TFile would work
> as well. But it seems there is no TFileOutputFormat available.

Re: How to convert SequenceFile into HFile?

Posted by Igor Gatis <ig...@gmail.com>.
Sounds like hbase's HFileOutputFormat depends on KeyValue's "family" field.
I don't want that.

All I want is to keep keys and values in an indexed file. TFile would work
as well. But it seems there is no TFileOutputFormat available.


On Fri, Dec 6, 2013 at 4:47 PM, Igor Gatis <ig...@gmail.com> wrote:

> That's the kind of solution I'm looking for.
>
> Here is what I have:
>
>     String jobName = "Seq2HFile";
>     Job job = new Job(getConf(), jobName);
>     job.setJarByClass(Seq2HFile.class);
>
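>     job.setMapperClass(MyIdentityMapper.class);
>     // Map output types must match what the mapper emits and what
>     // KeyValueSortReducer consumes.
>     job.setMapOutputKeyClass(ImmutableBytesWritable.class);
>     job.setMapOutputValueClass(KeyValue.class);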
>
>     job.setPartitionerClass(TotalOrderPartitioner.class);
>
>     job.setReducerClass(KeyValueSortReducer.class);
>     job.setOutputKeyClass(ImmutableBytesWritable.class);
>     job.setOutputValueClass(KeyValue.class);
>     job.setNumReduceTasks(1);
>
>     job.setInputFormatClass(SequenceFileInputFormat.class);
>     SequenceFileInputFormat.addInputPaths(job, inputPath);
>
>     job.setOutputFormatClass(HFileOutputFormat.class);
>     HFileOutputFormat.setOutputPath(job, new Path(outputPath));
>
>     job.submit();
>     job.waitForCompletion(true);
>
> The bit I'm stuck on is MyIdentityMapper. My input is a
> SequenceFile<BytesWritable, BytesWritable>. According to HFileOutputFormat's
> signature, the output key is ImmutableBytesWritable and the value is KeyValue.
>
> I guess BytesWritable -> ImmutableBytesWritable is straightforward. But
> I've got no clue how to fill KeyValue.
>
>   public static class MyIdentityMapper
>       extends Mapper<BytesWritable, BytesWritable, ImmutableBytesWritable, KeyValue> {
>     public void map(BytesWritable key, BytesWritable value, Context context)
>         throws IOException, InterruptedException {
>       // What do I write here?
>     }
>   }

Re: How to convert SequenceFile into HFile?

Posted by Igor Gatis <ig...@gmail.com>.
That's the kind of solution I'm looking for.

Here is what I have:

    String jobName = "Seq2HFile";
    Job job = new Job(getConf(), jobName);
    job.setJarByClass(Seq2HFile.class);

    job.setMapperClass(MyIdentityMapper.class);
    // Map output types must match what the mapper emits and what
    // KeyValueSortReducer consumes.
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(KeyValue.class);

    job.setPartitionerClass(TotalOrderPartitioner.class);

    job.setReducerClass(KeyValueSortReducer.class);
    job.setOutputKeyClass(ImmutableBytesWritable.class);
    job.setOutputValueClass(KeyValue.class);
    job.setNumReduceTasks(1);

    job.setInputFormatClass(SequenceFileInputFormat.class);
    SequenceFileInputFormat.addInputPaths(job, inputPath);

    job.setOutputFormatClass(HFileOutputFormat.class);
    HFileOutputFormat.setOutputPath(job, new Path(outputPath));

    job.submit();
    job.waitForCompletion(true);

The bit I'm stuck on is MyIdentityMapper. My input is a
SequenceFile<BytesWritable, BytesWritable>. According to HFileOutputFormat's
signature, the output key is ImmutableBytesWritable and the value is KeyValue.

I guess BytesWritable -> ImmutableBytesWritable is straightforward. But
I've got no clue how to fill KeyValue.

  public static class MyIdentityMapper
      extends Mapper<BytesWritable, BytesWritable, ImmutableBytesWritable, KeyValue> {
    public void map(BytesWritable key, BytesWritable value, Context context)
        throws IOException, InterruptedException {
      // What do I write here?
    }
  }



On Fri, Dec 6, 2013 at 12:31 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:

> Hi Igor,
>
> I would say MapReduce.
>
> SequenceFileInputFormat
> HFileOutputFormat
>
> JM

Re: How to convert SequenceFile into HFile?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Igor,

I would say MapReduce.

SequenceFileInputFormat
HFileOutputFormat

JM


2013/12/5 Igor Gatis <ig...@gmail.com>

> I have SequenceFiles I'd like to convert to HFile. How do I do that?
>