Posted to user@hbase.apache.org by Rita <rm...@gmail.com> on 2011/09/15 13:56:54 UTC

schema doubt

I have many small files (close to 1 million) and I was thinking of creating
a key-value pair for each of them: the file name would be the key and the
content would be the value.

Would it be better to base64-encode the content and load it into HBase, or
to echo the content through the HBase shell?

Has anyone done something similar to this?



-- 
--- Get your facts first, then you can distort them as you please.--
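One consideration with the base64 route: base64 inflates the stored size by
roughly a third, which adds up across a million files. A quick illustration
in Python (the file content here is made up for the example):

```python
import base64

# Simulate a small binary file (~8 KB), in the size range discussed in the thread.
content = bytes(range(256)) * 32  # 8192 bytes of arbitrary binary data

encoded = base64.b64encode(content)

# Base64 expands every 3 input bytes into 4 output characters (~33% overhead).
print(len(content))   # 8192
print(len(encoded))   # 10924
print(round(len(encoded) / len(content), 2))  # 1.33
```

Storing raw bytes avoids both the size overhead and the encode/decode step on
every read.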

Re: schema doubt

Posted by Rita <rm...@gmail.com>.
Each file is about 6k to 12k.

Inserting won't be an issue, just the access. I would like to access them
quickly.

Not sure what the proper key should be. The file name is OK, but I'm
wondering if there is anything more I can do to leverage HBase.
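On the key question: with around a million rows keyed by file name, one thing
worth knowing about (not raised in the thread itself) is that sequentially
named files sort next to each other and can concentrate writes on a single
region. A common HBase pattern is to prefix the key with a small hash bucket.
A minimal sketch, with a hypothetical `salted_key` helper:

```python
import hashlib

def salted_key(filename, buckets=16):
    """Prefix the row key with a short hash bucket so lexically adjacent
    file names spread across regions instead of piling onto one."""
    bucket = int(hashlib.md5(filename.encode()).hexdigest(), 16) % buckets
    return f"{bucket:02d}-{filename}"

# Sequentially named files usually land in different buckets:
print(salted_key("log-000001.txt"))
print(salted_key("log-000002.txt"))
```

The trade-off: salting breaks range scans over file names, so it only helps
if access is by exact-name point gets, which sounds like the case here.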


On Thu, Sep 15, 2011 at 9:24 AM, Akash Ashok <th...@gmail.com> wrote:

> Also, could you tell us how small these files are? If they are far smaller
> than the default 64 MB HDFS block size, you'd want to combine them before
> running a MapReduce job.
>
> Cheers,
> Akash A
>
> On Thu, Sep 15, 2011 at 6:02 PM, Joey Echeverria <jo...@cloudera.com>
> wrote:
>
> > It sounds like you're planning to use the HBase shell to insert all of
> > this data. If that's correct, I'd recommend against it. I would write
> > a simple MapReduce program to insert the data instead. You could run a
> > map-only job that reads in the files and writes each one as a row in
> > HBase. With the Java APIs you can write the raw bytes pretty easily.
> >
> > -Joey
> >
> > On Thu, Sep 15, 2011 at 7:56 AM, Rita <rm...@gmail.com> wrote:
> > > I have many small files (close to 1 million) and I was thinking of
> > > creating a key-value pair for each of them: the file name would be the
> > > key and the content would be the value.
> > >
> > > Would it be better to base64-encode the content and load it into HBase,
> > > or to echo the content through the HBase shell?
> > >
> > > Has anyone done something similar to this?
> > >
> > >
> > >
> > > --
> > > --- Get your facts first, then you can distort them as you please.--
> > >
> >
> >
> >
> > --
> > Joseph Echeverria
> > Cloudera, Inc.
> > 443.305.9434
> >
>



-- 
--- Get your facts first, then you can distort them as you please.--

Re: schema doubt

Posted by Akash Ashok <th...@gmail.com>.
Also, could you tell us how small these files are? If they are far smaller
than the default 64 MB HDFS block size, you'd want to combine them before
running a MapReduce job.

Cheers,
Akash A
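The combining step above is what Hadoop's CombineFileInputFormat does for a
MapReduce job; the core idea can be sketched independently of Hadoop as
grouping file paths into batches by cumulative size, so each map task handles
roughly a block's worth of data instead of one tiny file. A sketch with a
hypothetical `batch_by_size` helper:

```python
def batch_by_size(files, target=64 * 1024 * 1024):
    """Group (path, size) pairs into batches whose cumulative size stays
    near the target (e.g. the 64 MB HDFS block size), so each map task
    processes one batch instead of one tiny file."""
    batches, current, current_size = [], [], 0
    for path, size in files:
        if current and current_size + size > target:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches

# 1,000 files of ~10 KB each fit comfortably in a single 64 MB batch:
files = [(f"file-{i}", 10 * 1024) for i in range(1000)]
print(len(batch_by_size(files)))  # 1
```

With 6-12 KB files, a million of them would reduce to only a few hundred
block-sized batches, which is far friendlier to MapReduce task scheduling.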

On Thu, Sep 15, 2011 at 6:02 PM, Joey Echeverria <jo...@cloudera.com> wrote:

> It sounds like you're planning to use the HBase shell to insert all of
> this data. If that's correct, I'd recommend against it. I would write
> a simple MapReduce program to insert the data instead. You could run a
> map-only job that reads in the files and writes each one as a row in
> HBase. With the Java APIs you can write the raw bytes pretty easily.
>
> -Joey
>
> On Thu, Sep 15, 2011 at 7:56 AM, Rita <rm...@gmail.com> wrote:
> > I have many small files (close to 1 million) and I was thinking of
> > creating a key-value pair for each of them: the file name would be the
> > key and the content would be the value.
> >
> > Would it be better to base64-encode the content and load it into HBase,
> > or to echo the content through the HBase shell?
> >
> > Has anyone done something similar to this?
> >
> >
> >
> > --
> > --- Get your facts first, then you can distort them as you please.--
> >
>
>
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>

Re: schema doubt

Posted by Joey Echeverria <jo...@cloudera.com>.
It sounds like you're planning to use the HBase shell to insert all of
this data. If that's correct, I'd recommend against it. I would write
a simple MapReduce program to insert the data instead. You could run a
map-only job that reads in the files and writes each one as a row in
HBase. With the Java APIs you can write the raw bytes pretty easily.

-Joey

On Thu, Sep 15, 2011 at 7:56 AM, Rita <rm...@gmail.com> wrote:
> I have many small files (close to 1 million) and I was thinking of
> creating a key-value pair for each of them: the file name would be the
> key and the content would be the value.
>
> Would it be better to base64-encode the content and load it into HBase,
> or to echo the content through the HBase shell?
>
> Has anyone done something similar to this?
>
>
>
> --
> --- Get your facts first, then you can distort them as you please.--
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434