Posted to user@hbase.apache.org by Anty <an...@gmail.com> on 2013/02/20 04:10:35 UTC

Problem In Understanding Compaction Process

Hi guys,

      I have a problem understanding the compaction process. Can someone
shed some light on it? Much appreciated. Here is the problem:

      After the region server successfully generates the final compacted
file, it goes through two steps:
       1. move the compacted file into the region's directory
       2. delete the replaced files.

       These two steps are not atomic. If the region server crashes after
step 1 and before step 2, then there are duplicate records! Is this problem
handled in the read path, or is there another mechanism to fix it?

-- 
Best Regards
Anty Rao

Re: Problem In Understanding Compaction Process

Posted by Sergey Shelukhin <se...@hortonworks.com>.
As for making the compaction file-set update atomic, I don't think it is
currently possible.
It would require adding a separate feature; the first thing that comes to
mind is storing the file set in some sort of meta-file and updating it
atomically (to the extent that HDFS file replacement is atomic), then using
that meta-file to decide which files to load.

More importantly, the files as they are can contain multiple versions of a
record, and can also contain delete records that invalidate previous
updates.
What is your scenario for analyzing them directly?
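One possible shape of that meta-file idea, purely as an illustration (the
MANIFEST file name, format, and helper class below are made up, not an
existing HBase feature): write the new file list to a temp file and rename
it over the old manifest, so a reader always sees either the old file set
or the new one, never a mix.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Options;
import org.apache.hadoop.fs.Path;

public class StoreFileManifestSketch {

  // Publish the new file set: write a temp manifest, then rename it over
  // the current one (rename with OVERWRITE is the atomic replacement step).
  public static void publish(FileSystem fs, Path storeDir, List<String> liveFiles)
      throws IOException {
    Path tmp = new Path(storeDir, ".manifest.tmp");
    Path manifest = new Path(storeDir, "MANIFEST");
    try (Writer w = new OutputStreamWriter(fs.create(tmp, true), StandardCharsets.UTF_8)) {
      for (String f : liveFiles) {
        w.write(f + "\n");
      }
    }
    FileContext.getFileContext(fs.getConf())
        .rename(tmp, manifest, Options.Rename.OVERWRITE);
  }

  // On open (or for an external reader), load only the files named in the
  // manifest and ignore any leftover store files.
  public static List<String> load(FileSystem fs, Path storeDir) throws IOException {
    List<String> files = new ArrayList<>();
    Path manifest = new Path(storeDir, "MANIFEST");
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(fs.open(manifest), StandardCharsets.UTF_8))) {
      String line;
      while ((line = r.readLine()) != null) {
        if (!line.isEmpty()) {
          files.add(line);
        }
      }
    }
    return files;
  }
}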

Re: Problem In Understanding Compaction Process

Posted by Anty <an...@gmail.com>.
Thanks Sergey.
In my use case, I want to analyze the underlying HFiles directly, so I
can't tolerate duplicate data.

Can you give me some pointers on how to make this procedure atomic?

-- 
Best Regards
Anty Rao

Re: Problem In Understanding Compaction Process

Posted by Sergey Shelukhin <se...@hortonworks.com>.
There should be no duplicate records even though the old files are not
deleted: among records with the exact same key/version/etc., the record
from the newer file is chosen, based on the files' logical sequence. If
that happens to be the same, some tie-break (by time or name) still means
only one file's record is chosen.
Eventually, the leftover file will be compacted again and disappear.
Granted, by making the move atomic (via some meta/manifest file) we could
avoid some overhead in this case at the cost of some added complexity, but
this situation should be rather rare.
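
For someone scanning the HFiles directly, that tie-break could be applied
roughly as sketched below. This is a simplified illustration with made-up
class and field names, standing in for what the HBase read path does with
file sequence ids; it is not HBase API:

import java.util.HashMap;
import java.util.Map;

public class DedupByFileSequenceSketch {

  // Minimal stand-in for one cell read out of an HFile.
  static class CellRecord {
    final String rowColumnTs;   // row + column + timestamp, the logical version key
    final long fileSequenceId;  // sequence id of the HFile the cell came from
    final byte[] value;

    CellRecord(String rowColumnTs, long fileSequenceId, byte[] value) {
      this.rowColumnTs = rowColumnTs;
      this.fileSequenceId = fileSequenceId;
      this.value = value;
    }
  }

  // Among cells that share the same row/column/timestamp, keep only the one
  // from the file with the highest sequence id (i.e. the newest file).
  public static Map<String, CellRecord> dedupe(Iterable<CellRecord> cells) {
    Map<String, CellRecord> newest = new HashMap<>();
    for (CellRecord c : cells) {
      CellRecord seen = newest.get(c.rowColumnTs);
      if (seen == null || c.fileSequenceId > seen.fileSequenceId) {
        newest.put(c.rowColumnTs, c);
      }
    }
    return newest;
  }
}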
