Posted to user@hbase.apache.org by Ashish Shinde <as...@strandls.com> on 2011/01/24 06:38:24 UTC

Bulk upload and LZO compression

Hi,

I have been importing data into hbase 0.90.0 using the code from the bulk
uploader (ImportTsv.java). The table has LZO compression set; however,
unless major compaction is run, the table does not get compressed.

Is there a way to compress the table as the bulk uploader creates the
HFiles? This is important for us because we don't want a burst
increase in our disk usage.
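
Right now the only workaround I see is forcing a rewrite of the store
files after the load, e.g. something like the following sketch (where
"mytable" is a placeholder for our table):

    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.majorCompact("mytable");  // rewrites store files, applying LZO

But the uncompressed files still hit the disk first, which is exactly the
burst we want to avoid.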

Thanks and regards,
 - Ashish

Re: Bulk upload and LZO compression

Posted by Ashish Shinde <as...@strandls.com>.
Added the patch to https://issues.apache.org/jira/browse/HBASE-3474

Not sure if this is the best implementation though. The
packing / unpacking of compression algorithm names into a single
configuration item looks hacky.

Should the Configuration class support querying by key patterns? In that
case one could potentially have one config item per column family:

f1.hfile.compression, f2.hfile.compression, etc.

And retrieve all of them with *.hfile.compression.
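
Until something like that exists, here is a minimal sketch of doing the
lookup by hand (Hadoop's Configuration is iterable over its key/value
pairs; the per-family key names are the hypothetical ones from above):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.conf.Configuration;

    public class FamilyCompression {
      // Scan the whole Configuration and collect every
      // "<family>.hfile.compression" entry, keyed by family name.
      static Map<String, String> familyCompression(Configuration conf) {
        Map<String, String> result = new HashMap<String, String>();
        for (Map.Entry<String, String> e : conf) {
          String key = e.getKey();
          if (key.endsWith(".hfile.compression")) {
            result.put(key.substring(0,
                key.length() - ".hfile.compression".length()),
                e.getValue());
          }
        }
        return result;
      }
    }

A regex-based accessor on Configuration would reduce this to a one-liner.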

Thanks and regards,
 - Ashish

On Wed, 26 Jan 2011 22:39:01 -0800
Stack <st...@duboce.net> wrote:

> On Wed, Jan 26, 2011 at 9:07 PM, Ashish Shinde <as...@strandls.com>
> wrote:
> > Any chance the multi column family patch will make it to the trunk
> > soon?
> >
> 
> Multifamily is already committed to TRUNK.  It will appear in HBase 0.92.
> St.Ack


Re: Bulk upload and LZO compression

Posted by Stack <st...@duboce.net>.
On Wed, Jan 26, 2011 at 9:07 PM, Ashish Shinde <as...@strandls.com> wrote:
> Any chance the multi column family patch will make it to the trunk
> soon?
>

Multifamily is already committed to TRUNK.  It will appear in HBase 0.92.
St.Ack

Re: Bulk upload and LZO compression

Posted by Ashish Shinde <as...@strandls.com>.
Hi Todd,

Thanks. The problem was that I applied the patch for multi column family bulk
upload first and then added the LZO modifications on top. The
code in the trunk for creating writers is different, so my code
changes will not be equivalent between the trunk and the multi column
family patch.

Any chance the multi column family patch will make it to the trunk
soon?

Anyway, I will generate a patch against the trunk as well and attach it
to the ticket.

Thanks and regards,
 - Ashish

On Wed, 26 Jan 2011 19:50:06 -0800
Todd Lipcon <to...@cloudera.com> wrote:

> On Wed, Jan 26, 2011 at 2:43 AM, Ashish Shinde <as...@strandls.com>
> wrote:
> 
> > Hi,
> >
> > I am using 0.90.0 candidate_3 from
> > http://people.apache.org/~stack/hbase-0.90.0-candidate-3/
> >
> > and have patched it to run multi column family bulk upload from
> > ticket
> >
> > https://issues.apache.org/jira/browse/HBASE-1861
> >
> > Also created the ticket
> > https://issues.apache.org/jira/browse/HBASE-3474
> > to handle this.
> >
> > I modified the code but am not sure how to generate a patch. The multi
> > column family code looks to be on the 0.92.0 branch, which I
> > can't find in the hbase svn repo. How do I create the patch?
> >
> 
> 0.92 is just "trunk" in the svn repo - so you can generate your patch
> against that.
> 
> Or, if we want, we can consider this a bug fix / compatible
> improvement, and we can apply it to both 0.90 (for 0.90.1) and trunk.
> But, we'll need a patch for trunk as well.
> 
> -Todd
> 
> On Tue, 25 Jan 2011 10:00:00 +0530 Ashish Shinde <as...@strandls.com> wrote:
> >
> > > Hi,
> > >
> > > Yup after some digging I got to HFileOutputFormat and was
> > > relieved to know that it does support compression. Was able to
> > > add code to set compression based on the column family's
> > > compression setting.
> > >
> > > Will create a ticket and submit the patch after some more testing
> > > and going over the coding guidelines. My code looks a little
> > > hacky because I am passing the family-specific compression
> > > algorithm names as a single ","-delimited configuration item. I
> > > figure that Configuration should have a method to return all key
> > > values where keys match a pattern. Maybe there are better ways
> > > to do this. Will get this into the ticket.
> > >
> > > Thanks and regards,
> > >  - Ashish
> > >
> > >  On Mon, 24 Jan 2011 11:12:06 -0800
> > > Todd Lipcon <to...@cloudera.com> wrote:
> > >
> > > > On Mon, Jan 24, 2011 at 9:50 AM, Stack <st...@duboce.net> wrote:
> > > >
> > > > > In HFileOutputFormat it says this near the top:
> > > > >
> > > > >    // Invented config.  Add to hbase-*.xml if other than
> > > > > default compression.
> > > > >    final String compression = conf.get("hfile.compression",
> > > > >      Compression.Algorithm.NONE.getName());
> > > > >
> > > > > You might try messing with this config?
> > > > >
> > > >
> > > > And it would be great to file (and provide a patch for) a JIRA that
> > > > automatically sets this based on the HTableDescriptor when
> > > > you're loading into an existing table!
> > > >
> > > > -Todd
> > > >
> > > >
> > > > > On Sun, Jan 23, 2011 at 9:38 PM, Ashish Shinde
> > > > > <as...@strandls.com> wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I have been importing data into hbase 0.90.0 using the code
> > > > > > from the bulk uploader (ImportTsv.java). The table has LZO
> > > > > > compression set; however, unless major compaction is run, the
> > > > > > table does not get compressed.
> > > > > >
> > > > > > Is there a way to compress the table as the bulk uploader
> > > > > > creates the HFiles? This is important for us because we don't
> > > > > > want a burst increase in our disk usage.
> > > > > >
> > > > > > Thanks and regards,
> > > > > >  - Ashish
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> >
> >
> 
> 


Re: Bulk upload and LZO compression

Posted by Todd Lipcon <to...@cloudera.com>.
On Wed, Jan 26, 2011 at 2:43 AM, Ashish Shinde <as...@strandls.com> wrote:

> Hi,
>
> I am using 0.90.0 candidate_3 from
> http://people.apache.org/~stack/hbase-0.90.0-candidate-3/
>
> and have patched it to run multi column family bulk upload from ticket
>
> https://issues.apache.org/jira/browse/HBASE-1861
>
> Also created the ticket
> https://issues.apache.org/jira/browse/HBASE-3474
> to handle this.
>
> I modified the code but am not sure how to generate a patch. The multi column
> family code looks to be on the 0.92.0 branch, which I
> can't find in the hbase svn repo. How do I create the patch?
>

0.92 is just "trunk" in the svn repo - so you can generate your patch
against that.
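
Roughly, assuming the standard ASF svn layout (the patch file name below
is just an example):

    svn checkout http://svn.apache.org/repos/asf/hbase/trunk hbase-trunk
    cd hbase-trunk
    # re-apply your changes here, then:
    svn diff > HBASE-3474-trunk.patch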

Or, if we want, we can consider this a bug fix / compatible improvement, and
we can apply it to both 0.90 (for 0.90.1) and trunk. But, we'll need a patch
for trunk as well.

-Todd

On Tue, 25 Jan 2011 10:00:00 +0530 Ashish Shinde <as...@strandls.com> wrote:
>
> > Hi,
> >
> > Yup after some digging I got to HFileOutputFormat and was relieved to
> > know that it does support compression. Was able to add code to set
> > compression based on the column family's compression setting.
> >
> > Will create a ticket and submit the patch after some more testing and
> > going over the coding guidelines. My code looks a little hacky because
> > I am passing the family-specific compression algorithm names as a
> > single ","-delimited configuration item. I figure that Configuration
> > should have a method to return all key values where keys match a
> > pattern. Maybe there are better ways to do this. Will get this into
> > the ticket.
> >
> > Thanks and regards,
> >  - Ashish
> >
> >  On Mon, 24 Jan 2011 11:12:06 -0800
> > Todd Lipcon <to...@cloudera.com> wrote:
> >
> > > On Mon, Jan 24, 2011 at 9:50 AM, Stack <st...@duboce.net> wrote:
> > >
> > > > In HFileOutputFormat it says this near the top:
> > > >
> > > >    // Invented config.  Add to hbase-*.xml if other than default
> > > > compression.
> > > >    final String compression = conf.get("hfile.compression",
> > > >      Compression.Algorithm.NONE.getName());
> > > >
> > > > You might try messing with this config?
> > > >
> > >
> > > And it would be great to file (and provide a patch for) a JIRA that
> > > automatically sets this based on the HTableDescriptor when you're
> > > loading into an existing table!
> > >
> > > -Todd
> > >
> > >
> > > > On Sun, Jan 23, 2011 at 9:38 PM, Ashish Shinde
> > > > <as...@strandls.com> wrote:
> > > > > Hi,
> > > > >
> > > > > I have been importing data into hbase 0.90.0 using the code from
> > > > > the bulk uploader (ImportTsv.java). The table has LZO
> > > > > compression set; however, unless major compaction is run, the
> > > > > table does not get compressed.
> > > > >
> > > > > Is there a way to compress the table as the bulk uploader
> > > > > creates the HFiles? This is important for us because we don't
> > > > > want a burst increase in our disk usage.
> > > > >
> > > > > Thanks and regards,
> > > > >  - Ashish
> > > > >
> > > >
> > >
> > >
> > >
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Bulk upload and LZO compression

Posted by Ashish Shinde <as...@strandls.com>.
Hi,

I am using 0.90.0 candidate_3 from
http://people.apache.org/~stack/hbase-0.90.0-candidate-3/

and have patched it to run multi column family bulk upload from ticket

https://issues.apache.org/jira/browse/HBASE-1861

Also created the ticket 
https://issues.apache.org/jira/browse/HBASE-3474
to handle this. 

I modified the code but am not sure how to generate a patch. The multi column
family code looks to be on the 0.92.0 branch, which I
can't find in the hbase svn repo. How do I create the patch?

Thanks and regards,
 - Ashish


On Tue, 25 Jan 2011 10:00:00 +0530 Ashish Shinde <as...@strandls.com> wrote:

> Hi,
> 
> Yup after some digging I got to HFileOutputFormat and was relieved to
> know that it does support compression. Was able to add code to set
> compression based on the column family's compression setting. 
> 
> Will create a ticket and submit the patch after some more testing and
> going over the coding guidelines. My code looks a little hacky because
> I am passing the family-specific compression algorithm names as a
> single ","-delimited configuration item. I figure that Configuration
> should have a method to return all key values where keys match a
> pattern. Maybe there are better ways to do this. Will get this into
> the ticket.
> 
> Thanks and regards,
>  - Ashish
> 
>  On Mon, 24 Jan 2011 11:12:06 -0800
> Todd Lipcon <to...@cloudera.com> wrote:
> 
> > On Mon, Jan 24, 2011 at 9:50 AM, Stack <st...@duboce.net> wrote:
> > 
> > > In HFileOutputFormat it says this near the top:
> > >
> > >    // Invented config.  Add to hbase-*.xml if other than default
> > > compression.
> > >    final String compression = conf.get("hfile.compression",
> > >      Compression.Algorithm.NONE.getName());
> > >
> > > You might try messing with this config?
> > >
> > 
> > And it would be great to file (and provide a patch for) a JIRA that
> > automatically sets this based on the HTableDescriptor when you're
> > loading into an existing table!
> > 
> > -Todd
> > 
> > 
> > > On Sun, Jan 23, 2011 at 9:38 PM, Ashish Shinde
> > > <as...@strandls.com> wrote:
> > > > Hi,
> > > >
> > > > I have been importing data into hbase 0.90.0 using the code from
> > > > the bulk uploader (ImportTsv.java). The table has LZO
> > > > compression set; however, unless major compaction is run, the
> > > > table does not get compressed.
> > > >
> > > > Is there a way to compress the table as the bulk uploader
> > > > creates the HFiles? This is important for us because we don't
> > > > want a burst increase in our disk usage.
> > > >
> > > > Thanks and regards,
> > > >  - Ashish
> > > >
> > >
> > 
> > 
> > 


Re: Bulk upload and LZO compression

Posted by Ashish Shinde <as...@strandls.com>.
Hi,

Yup after some digging I got to HFileOutputFormat and was relieved to
know that it does support compression. Was able to add code to set
compression based on the column family's compression setting. 

Will create a ticket and submit the patch after some more testing and
going over the coding guidelines. My code looks a little hacky because
I am passing the family-specific compression algorithm names as a
single ","-delimited configuration item. I figure that Configuration should
have a method to return all key values where keys match a pattern.
Maybe there are better ways to do this. Will get this into the ticket.
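
For reference, a minimal sketch of the pack/unpack scheme I mean. The key
name "hfile.compression.families" is made up for illustration, and I am
assuming HColumnDescriptor.getCompression() as the per-family accessor:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;

    public class CompressionPacking {
      // Pack: join "family=algorithm" pairs into one configuration item.
      static void pack(Configuration conf, HTableDescriptor desc) {
        StringBuilder sb = new StringBuilder();
        for (HColumnDescriptor family : desc.getFamilies()) {
          if (sb.length() > 0) sb.append(",");
          sb.append(family.getNameAsString()).append("=")
            .append(family.getCompression().getName());
        }
        conf.set("hfile.compression.families", sb.toString());
      }

      // Unpack on the task side: family name -> algorithm name.
      static Map<String, String> unpack(Configuration conf) {
        Map<String, String> map = new HashMap<String, String>();
        String packed = conf.get("hfile.compression.families", "");
        for (String pair : packed.split(",")) {
          if (pair.length() == 0) continue;
          String[] kv = pair.split("=", 2);
          map.put(kv[0], kv[1]);
        }
        return map;
      }
    }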

Thanks and regards,
 - Ashish

 On Mon, 24 Jan 2011 11:12:06 -0800
Todd Lipcon <to...@cloudera.com> wrote:

> On Mon, Jan 24, 2011 at 9:50 AM, Stack <st...@duboce.net> wrote:
> 
> > In HFileOutputFormat it says this near the top:
> >
> >    // Invented config.  Add to hbase-*.xml if other than default
> > compression.
> >    final String compression = conf.get("hfile.compression",
> >      Compression.Algorithm.NONE.getName());
> >
> > You might try messing with this config?
> >
> 
> And it would be great to file (and provide a patch for) a JIRA that
> automatically sets this based on the HTableDescriptor when you're
> loading into an existing table!
> 
> -Todd
> 
> 
> > On Sun, Jan 23, 2011 at 9:38 PM, Ashish Shinde <as...@strandls.com>
> > wrote:
> > > Hi,
> > >
> > > I have been importing data into hbase 0.90.0 using the code from
> > > the bulk uploader (ImportTsv.java). The table has LZO compression
> > > set; however, unless major compaction is run, the table does not
> > > get compressed.
> > >
> > > Is there a way to compress the table as the bulk uploader creates
> > > the HFiles? This is important for us because we don't want
> > > a burst increase in our disk usage.
> > >
> > > Thanks and regards,
> > >  - Ashish
> > >
> >
> 
> 
> 


Re: Bulk upload and LZO compression

Posted by Todd Lipcon <to...@cloudera.com>.
On Mon, Jan 24, 2011 at 9:50 AM, Stack <st...@duboce.net> wrote:

> In HFileOutputFormat it says this near the top:
>
>    // Invented config.  Add to hbase-*.xml if other than default
> compression.
>    final String compression = conf.get("hfile.compression",
>      Compression.Algorithm.NONE.getName());
>
> You might try messing with this config?
>

And it would be great to file (and provide a patch for) a JIRA that
automatically sets this based on the HTableDescriptor when you're loading
into an existing table!
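
Rough sketch of the idea for the single-family case (assumes the target
table already exists and that HColumnDescriptor.getCompression() is the
right accessor; "tableName" is a placeholder):

    // Read the live table's schema and propagate its compression
    // setting into the job configuration before the bulk load runs.
    HTable table = new HTable(conf, tableName);
    HColumnDescriptor family =
        table.getTableDescriptor().getFamilies().iterator().next();
    conf.set("hfile.compression", family.getCompression().getName());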

-Todd


> On Sun, Jan 23, 2011 at 9:38 PM, Ashish Shinde <as...@strandls.com>
> wrote:
> > Hi,
> >
> > I have been importing data into hbase 0.90.0 using the code from the bulk
> > uploader (ImportTsv.java). The table has LZO compression set; however,
> > unless major compaction is run, the table does not get compressed.
> >
> > Is there a way to compress the table as the bulk uploader creates the
> > HFiles? This is important for us because we don't want a burst
> > increase in our disk usage.
> >
> > Thanks and regards,
> >  - Ashish
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Bulk upload and LZO compression

Posted by Stack <st...@duboce.net>.
In HFileOutputFormat it says this near the top:

    // Invented config.  Add to hbase-*.xml if other than default compression.
    final String compression = conf.get("hfile.compression",
      Compression.Algorithm.NONE.getName());

You might try messing with this config?
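
For example, something like this before submitting the job (a sketch;
assumes the LZO codec is actually installed on the cluster):

    // Force every HFile the bulk loader writes to be LZO-compressed.
    Configuration conf = HBaseConfiguration.create();
    conf.set("hfile.compression", Compression.Algorithm.LZO.getName());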

St.Ack

On Sun, Jan 23, 2011 at 9:38 PM, Ashish Shinde <as...@strandls.com> wrote:
> Hi,
>
> I have been importing data into hbase 0.90.0 using the code from the bulk
> uploader (ImportTsv.java). The table has LZO compression set; however,
> unless major compaction is run, the table does not get compressed.
>
> Is there a way to compress the table as the bulk uploader creates the
> HFiles? This is important for us because we don't want a burst
> increase in our disk usage.
>
> Thanks and regards,
>  - Ashish
>