Posted to user@hbase.apache.org by Ravindranath Akila <ra...@gmail.com> on 2014/10/18 12:35:37 UTC

Archive Files

Is there any way for HBase to store archive-like, rarely used files on
cheap storage?

That's a vague question. If I may elaborate...

Our office currently stores terabytes of well-structured log data on S3 to
save cost. The other day I was asked to process all these files. These
files are still used for analytics and other decision making. The logs come
from an RTB (Real Time Bidding) system.

Ideally these files would live on HDFS, but that would incur large
storage costs over time: the files are only occasionally used, yet the
servers need to be up and running just to store them.

In the context of Big Data, aren't these files big data files? If so, is
there a cheap way of storing them in HBase? For example, by writing a
storage adapter of sorts.

I'm really sorry if this isn't the right place to ask this. Thanks in
advance :)


-- 
R. A.
BTW, there is a website called* Thank God it's Friday!*
It tells you fun things to do in your area over the weekend.
*See here: http://www.ThankGodItIsFriday.com
<http://www.ThankGodItIsFriday.com>*

Re: Archive Files

Posted by Wilm Schumacher <wi...@cawoom.com>.

On 18.10.2014 at 12:35, Ravindranath Akila wrote:
> Is there any approach HBASE can store archive like rarely used files on
> cheap storage?
Hadoop itself is equipped for that. There are HAR files, map files, and
sequence files.

If I understand correctly, sequence files are what you are looking for.
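
As a rough illustration of why a sequence-file-style container suits this
case: it packs many small files into one large file as key/value records
(for example file name as key, file contents as value). The sketch below is
plain Python and is not Hadoop's actual SequenceFile format or API; the
function names (write_records, read_records, pack_logs) and the
length-prefixed layout are assumptions made up purely for illustration.

```python
import io
import struct

# Hypothetical record layout (NOT Hadoop's on-disk format):
# 4-byte big-endian key length, 4-byte big-endian value length, key, value.
HEADER = ">II"
HEADER_SIZE = struct.calcsize(HEADER)


def write_records(stream, records):
    """Append (key, value) byte pairs to the stream as length-prefixed records."""
    for key, value in records:
        stream.write(struct.pack(HEADER, len(key), len(value)))
        stream.write(key)
        stream.write(value)


def read_records(stream):
    """Yield (key, value) byte pairs until the stream is exhausted."""
    while True:
        header = stream.read(HEADER_SIZE)
        if not header:
            return  # clean end of container
        if len(header) != HEADER_SIZE:
            raise ValueError("truncated record header")
        key_len, value_len = struct.unpack(HEADER, header)
        key = stream.read(key_len)
        value = stream.read(value_len)
        if len(key) != key_len or len(value) != value_len:
            raise ValueError("truncated record body")
        yield key, value


def pack_logs(logs):
    """Pack an iterable of (file name, contents) byte pairs into one blob."""
    buffer = io.BytesIO()
    write_records(buffer, logs)
    return buffer.getvalue()
```

With real Hadoop, the equivalent would be a SequenceFile keyed by log file
name, so that the cluster deals with a few large files instead of many
small ones.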

On the other hand, you could load the data into HBase and do your
processing there, if you write your log data directly into the HBase
cluster by default.

Best wishes,

Wilm

Re: Archive Files

Posted by Ravindranath Akila <ra...@gmail.com>.
Thanks so much guys! This is exactly what I was looking for! :-)

On Saturday, October 18, 2014, Ted Yu <yu...@gmail.com> wrote:

> Take a look at https://issues.apache.org/jira/browse/HDFS-6584
>
> It is in the upcoming hadoop 2.6 release.
>
> Cheers


-- 
R. A.

Re: Archive Files

Posted by Ted Yu <yu...@gmail.com>.
Take a look at https://issues.apache.org/jira/browse/HDFS-6584
(archival storage support for HDFS).

It is in the upcoming Hadoop 2.6 release.

Cheers
