Posted to dev@hbase.apache.org by Gagandeep Singh <ga...@paxcel.net> on 2010/09/02 12:34:18 UTC

Re: Data loss due to region server failure

Hi Daniel

I have downloaded hadoop-0.20.2+320.tar.gz from this location
http://archive.cloudera.com/cdh/3/
I have also changed the *dfs.support.append* flag to *true* in my
*hdfs-site.xml*, as mentioned here:
http://wiki.apache.org/hadoop/Hbase/HdfsSyncSupport
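
For reference, the property block I added looks like this (the standard
hdfs-site.xml form; nothing else in the file was changed):

  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>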

But data loss is still happening. Am I using the right version?
Are there any other settings I need to make so that data gets flushed to
HDFS?

Thanks,
Gagan



On Thu, Aug 26, 2010 at 11:57 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> That, or use CDH3b2.
>
> J-D
>
> On Thu, Aug 26, 2010 at 11:22 AM, Gagandeep Singh
> <ga...@paxcel.net> wrote:
> > Thanks Daniel
> >
> > It means I have to checkout the code from branch and build it on my local
> > machine.
> >
> > Gagan
> >
> >
> > On Thu, Aug 26, 2010 at 9:51 PM, Jean-Daniel Cryans <jdcryans@apache.org
> >wrote:
> >
> >> Then I would expect some form of dataloss yes, because stock hadoop
> >> 0.20 doesn't have any form of fsync so HBase doesn't know whether the
> >> data made it to the datanodes when appending to the WAL. Please use
> >> the 0.20-append hadoop branch with HBase 0.89 or cloudera's CDH3b2.
> >>
> >> J-D
> >>
> >> On Thu, Aug 26, 2010 at 7:22 AM, Gagandeep Singh
> >> <ga...@paxcel.net> wrote:
> >> > HBase - 0.20.5
> >> > Hadoop - 0.20.2
> >> >
> >> > Thanks,
> >> > Gagan
> >> >
> >> >
> >> >
> >> > On Thu, Aug 26, 2010 at 7:11 PM, Jean-Daniel Cryans <
> jdcryans@apache.org
> >> >wrote:
> >> >
> >> >> Hadoop and HBase version?
> >> >>
> >> >> J-D
> >> >>
> >> >> On Aug 26, 2010 5:36 AM, "Gagandeep Singh" <
> gagandeep.singh@paxcel.net>
> >> >> wrote:
> >> >>
> >> >> Hi Group,
> >> >>
> >> >> I am checking HBase/HDFS fail over. I am inserting 1M records from my
> >> HBase
> >> >> client application. I am clubbing my Put operation such that 10
> records
> >> get
> >> >> added into the List<Put> and then I call the table.put(). I have not
> >> >> modified the default setting of Put operation which means all data is
> >> >> written in WAL and in case of server failure my data should not be
> lost.
> >> >>
> >> >> But I noticed somewhat strange behavior, while adding records if I
> kill
> >> my
> >> >> Region Server then my application waits till the time region data is
> >> moved
> >> >> to another region. But I noticed while doing so all my data is lost
> and
> >> my
> >> >> table is emptied.
> >> >>
> >> >> Could you help me understand the behavior. Is there some kind of
> Cache
> >> also
> >> >> involved while writing because of which my data is lost.
> >> >>
> >> >>
> >> >> Thanks,
> >> >> Gagan
> >> >>
> >> >
> >>
> >
>

Re: Data loss due to region server failure

Posted by Stack <st...@duboce.net>.
You should be able to kill -9 the regionserver and only lose data that
follows the last time we sync'd (default is to sync each write, IIRC).  If
this is not the case for you, then something is broken.  Let's figure
it out.
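
In case it helps to compare notes, here is roughly what I understand your
test client to be doing, written against the 0.20-era client API (just a
sketch -- the table and column names are made up, and the WAL is simply
left at its default of enabled):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WalLoadTest {
  public static void main(String[] args) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration(); // reads hbase-site.xml from the classpath
    HTable table = new HTable(conf, "testtable");       // hypothetical table
    byte[] family = Bytes.toBytes("f");                 // hypothetical column family

    List<Put> batch = new ArrayList<Put>(10);
    for (int i = 0; i < 1000000; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(family, Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
      batch.add(put);          // WAL is on by default for every Put
      if (batch.size() == 10) {
        table.put(batch);      // autoflush is on by default, so this goes out to the regionserver now
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      table.put(batch);
    }
    table.close();
  }
}

If a client like that loses everything after a kill -9 of the regionserver,
then either the edits never made it into the WAL or the WAL is not being
replayed on the new server, and the master log should tell us why.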

St.Ack

On Fri, Sep 3, 2010 at 1:55 AM, Gagandeep Singh
<ga...@paxcel.net> wrote:
> I think I have figured out the problem or may be not.
>
> In order to simulate RegionServer failure my fellow programmer was killing
> the Regionserver by *kill -9 pid . *But when I used *kill pid* everything
> seems to be working fine. Obviously now region server is going down
> gracefully so there is no data loss.
>
> I also checked it on hadoop 0.20.2(without append), HBase 0.20.5 version and
> found no data-loss in case of simple kill command.
> Now my next question is should it also work with *kill -9* command?
>
> FYI - I am using VMs. In my current setup I am using 3 VMs, 1 for Namenode
> and HBase Master and both 2 and 3 have Data node and region servers running
> on them.
>
> Thanks,
> Gagan
>
>
>
> On Thu, Sep 2, 2010 at 8:18 PM, Stack <st...@duboce.net> wrote:
>
>> On Thu, Sep 2, 2010 at 3:34 AM, Gagandeep Singh
>> <ga...@paxcel.net> wrote:
>> > Hi Daniel
>> >
>> > I have downloaded hadoop-0.20.2+320.tar.gz from this location
>> > http://archive.cloudera.com/cdh/3/
>>
>>
>> That looks right, yes.
>>
>> > And also changed the *dfs.support.append* flag to *true* in your *
>> > hdfs-site.xml* as mentioned here
>> > http://wiki.apache.org/hadoop/Hbase/HdfsSyncSupport.
>> >
>>
>> That sounds right too.  As Ted suggests, you put it in to all configs
>> (though I believe it enabled by default on that branch -- in the UI
>> you'd see a warning if it was NOT enabled).
>>
>> > But data loss is still happening. Am I using the right version?
>> > Is there any other settings that I need to make so that data gets flushed
>> to
>> > HDFS.
>> >
>>
>> It looks like you are doing the right thing.  Can we see master log please?
>>
>> Thanks,
>> St.Ack
>>
>>
>> > Thanks,
>> > Gagan
>> >
>> >
>> >
>> > On Thu, Aug 26, 2010 at 11:57 PM, Jean-Daniel Cryans <
>> jdcryans@apache.org>wrote:
>> >
>> >> That, or use CDH3b2.
>> >>
>> >> J-D
>> >>
>> >> On Thu, Aug 26, 2010 at 11:22 AM, Gagandeep Singh
>> >> <ga...@paxcel.net> wrote:
>> >> > Thanks Daniel
>> >> >
>> >> > It means I have to checkout the code from branch and build it on my
>> local
>> >> > machine.
>> >> >
>> >> > Gagan
>> >> >
>> >> >
>> >> > On Thu, Aug 26, 2010 at 9:51 PM, Jean-Daniel Cryans <
>> jdcryans@apache.org
>> >> >wrote:
>> >> >
>> >> >> Then I would expect some form of dataloss yes, because stock hadoop
>> >> >> 0.20 doesn't have any form of fsync so HBase doesn't know whether the
>> >> >> data made it to the datanodes when appending to the WAL. Please use
>> >> >> the 0.20-append hadoop branch with HBase 0.89 or cloudera's CDH3b2.
>> >> >>
>> >> >> J-D
>> >> >>
>> >> >> On Thu, Aug 26, 2010 at 7:22 AM, Gagandeep Singh
>> >> >> <ga...@paxcel.net> wrote:
>> >> >> > HBase - 0.20.5
>> >> >> > Hadoop - 0.20.2
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Gagan
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Thu, Aug 26, 2010 at 7:11 PM, Jean-Daniel Cryans <
>> >> jdcryans@apache.org
>> >> >> >wrote:
>> >> >> >
>> >> >> >> Hadoop and HBase version?
>> >> >> >>
>> >> >> >> J-D
>> >> >> >>
>> >> >> >> On Aug 26, 2010 5:36 AM, "Gagandeep Singh" <
>> >> gagandeep.singh@paxcel.net>
>> >> >> >> wrote:
>> >> >> >>
>> >> >> >> Hi Group,
>> >> >> >>
>> >> >> >> I am checking HBase/HDFS fail over. I am inserting 1M records from
>> my
>> >> >> HBase
>> >> >> >> client application. I am clubbing my Put operation such that 10
>> >> records
>> >> >> get
>> >> >> >> added into the List<Put> and then I call the table.put(). I have
>> not
>> >> >> >> modified the default setting of Put operation which means all data
>> is
>> >> >> >> written in WAL and in case of server failure my data should not be
>> >> lost.
>> >> >> >>
>> >> >> >> But I noticed somewhat strange behavior, while adding records if I
>> >> kill
>> >> >> my
>> >> >> >> Region Server then my application waits till the time region data
>> is
>> >> >> moved
>> >> >> >> to another region. But I noticed while doing so all my data is
>> lost
>> >> and
>> >> >> my
>> >> >> >> table is emptied.
>> >> >> >>
>> >> >> >> Could you help me understand the behavior. Is there some kind of
>> >> Cache
>> >> >> also
>> >> >> >> involved while writing because of which my data is lost.
>> >> >> >>
>> >> >> >>
>> >> >> >> Thanks,
>> >> >> >> Gagan
>> >> >> >>
>> >> >> >
>> >> >>
>> >> >
>> >>
>> >
>>
>

Re: Data loss due to region server failure

Posted by Gagandeep Singh <ga...@paxcel.net>.
I think I have figured out the problem, or maybe not.

In order to simulate RegionServer failure, my fellow programmer was killing
the RegionServer with *kill -9 pid*. But when I used *kill pid*, everything
seems to work fine. Obviously the region server now goes down gracefully,
so there is no data loss.

I also checked this on Hadoop 0.20.2 (without append) with HBase 0.20.5 and
found no data loss with a simple kill.
Now my next question is: should it also work with *kill -9*?

FYI, I am using VMs. In my current setup I have 3 VMs: VM 1 runs the
NameNode and the HBase Master, and VMs 2 and 3 each run a DataNode and a
RegionServer.

Thanks,
Gagan



On Thu, Sep 2, 2010 at 8:18 PM, Stack <st...@duboce.net> wrote:

> On Thu, Sep 2, 2010 at 3:34 AM, Gagandeep Singh
> <ga...@paxcel.net> wrote:
> > Hi Daniel
> >
> > I have downloaded hadoop-0.20.2+320.tar.gz from this location
> > http://archive.cloudera.com/cdh/3/
>
>
> That looks right, yes.
>
> > And also changed the *dfs.support.append* flag to *true* in your *
> > hdfs-site.xml* as mentioned here
> > http://wiki.apache.org/hadoop/Hbase/HdfsSyncSupport.
> >
>
> That sounds right too.  As Ted suggests, you put it in to all configs
> (though I believe it enabled by default on that branch -- in the UI
> you'd see a warning if it was NOT enabled).
>
> > But data loss is still happening. Am I using the right version?
> > Is there any other settings that I need to make so that data gets flushed
> to
> > HDFS.
> >
>
> It looks like you are doing the right thing.  Can we see master log please?
>
> Thanks,
> St.Ack
>
>
> > Thanks,
> > Gagan
> >
> >
> >
> > On Thu, Aug 26, 2010 at 11:57 PM, Jean-Daniel Cryans <
> jdcryans@apache.org>wrote:
> >
> >> That, or use CDH3b2.
> >>
> >> J-D
> >>
> >> On Thu, Aug 26, 2010 at 11:22 AM, Gagandeep Singh
> >> <ga...@paxcel.net> wrote:
> >> > Thanks Daniel
> >> >
> >> > It means I have to checkout the code from branch and build it on my
> local
> >> > machine.
> >> >
> >> > Gagan
> >> >
> >> >
> >> > On Thu, Aug 26, 2010 at 9:51 PM, Jean-Daniel Cryans <
> jdcryans@apache.org
> >> >wrote:
> >> >
> >> >> Then I would expect some form of dataloss yes, because stock hadoop
> >> >> 0.20 doesn't have any form of fsync so HBase doesn't know whether the
> >> >> data made it to the datanodes when appending to the WAL. Please use
> >> >> the 0.20-append hadoop branch with HBase 0.89 or cloudera's CDH3b2.
> >> >>
> >> >> J-D
> >> >>
> >> >> On Thu, Aug 26, 2010 at 7:22 AM, Gagandeep Singh
> >> >> <ga...@paxcel.net> wrote:
> >> >> > HBase - 0.20.5
> >> >> > Hadoop - 0.20.2
> >> >> >
> >> >> > Thanks,
> >> >> > Gagan
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Thu, Aug 26, 2010 at 7:11 PM, Jean-Daniel Cryans <
> >> jdcryans@apache.org
> >> >> >wrote:
> >> >> >
> >> >> >> Hadoop and HBase version?
> >> >> >>
> >> >> >> J-D
> >> >> >>
> >> >> >> On Aug 26, 2010 5:36 AM, "Gagandeep Singh" <
> >> gagandeep.singh@paxcel.net>
> >> >> >> wrote:
> >> >> >>
> >> >> >> Hi Group,
> >> >> >>
> >> >> >> I am checking HBase/HDFS fail over. I am inserting 1M records from
> my
> >> >> HBase
> >> >> >> client application. I am clubbing my Put operation such that 10
> >> records
> >> >> get
> >> >> >> added into the List<Put> and then I call the table.put(). I have
> not
> >> >> >> modified the default setting of Put operation which means all data
> is
> >> >> >> written in WAL and in case of server failure my data should not be
> >> lost.
> >> >> >>
> >> >> >> But I noticed somewhat strange behavior, while adding records if I
> >> kill
> >> >> my
> >> >> >> Region Server then my application waits till the time region data
> is
> >> >> moved
> >> >> >> to another region. But I noticed while doing so all my data is
> lost
> >> and
> >> >> my
> >> >> >> table is emptied.
> >> >> >>
> >> >> >> Could you help me understand the behavior. Is there some kind of
> >> Cache
> >> >> also
> >> >> >> involved while writing because of which my data is lost.
> >> >> >>
> >> >> >>
> >> >> >> Thanks,
> >> >> >> Gagan
> >> >> >>
> >> >> >
> >> >>
> >> >
> >>
> >
>

Re: Data loss due to region server failure

Posted by Todd Lipcon <to...@cloudera.com>.
On Thu, Sep 2, 2010 at 7:52 AM, Ted Yu <yu...@gmail.com> wrote:

> By default, this config doesn't appear in hdfs-site.xml or hbase-site.xml
> (cdh3b2)
>
> I searched for this config in 0.20.6 source code and didn't find reference:
>
> tyumac:hbase-0.20.6 tyu$ find . -name *.java -exec grep
> 'dfs.support.append'
> {} \; -print
>
> tyumac:hbase-0.20.6 tyu$ cd ~/hadoop-0.20.2+320/
> tyumac:hadoop-0.20.2+320 tyu$ find . -name *.java -exec grep
> 'dfs.support.append' {} \; -print
>   * configured with the parameter dfs.support.append set to true, otherwise
> ./src/hdfs/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
>      boolean supportAppends = conf.getBoolean("dfs.support.append", false);
> ./src/hdfs/org/apache/hadoop/hdfs/server/datanode/FSDataset.java
>    this.supportAppends = conf.getBoolean("dfs.support.append", false);
>                            " Please refer to dfs.support.append
> configuration parameter.");
> ./src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
>
> I think Cloudera people can provide some hint here.
>

Yes, we changed the default inside hdfs-default.xml (inside the jar), so it
is on by default there.
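
If you want to double-check what a given classpath actually resolves, a
quick snippet along these lines (class name made up; it just reads the
bundled defaults plus any site override) will print the effective value:

import org.apache.hadoop.conf.Configuration;

public class CheckAppendFlag {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // hdfs-default.xml ships inside the hadoop core jar; hdfs-site.xml holds site overrides
    conf.addResource("hdfs-default.xml");
    conf.addResource("hdfs-site.xml");
    System.out.println("dfs.support.append = "
        + conf.getBoolean("dfs.support.append", false));
  }
}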

-Todd


>
> On Thu, Sep 2, 2010 at 7:48 AM, Stack <st...@duboce.net> wrote:
>
> > On Thu, Sep 2, 2010 at 3:34 AM, Gagandeep Singh
> > <ga...@paxcel.net> wrote:
> > > Hi Daniel
> > >
> > > I have downloaded hadoop-0.20.2+320.tar.gz from this location
> > > http://archive.cloudera.com/cdh/3/
> >
> >
> > That looks right, yes.
> >
> > > And also changed the *dfs.support.append* flag to *true* in your *
> > > hdfs-site.xml* as mentioned here
> > > http://wiki.apache.org/hadoop/Hbase/HdfsSyncSupport.
> > >
> >
> > That sounds right too.  As Ted suggests, you put it in to all configs
> > (though I believe it enabled by default on that branch -- in the UI
> > you'd see a warning if it was NOT enabled).
> >
> > > But data loss is still happening. Am I using the right version?
> > > Is there any other settings that I need to make so that data gets
> flushed
> > to
> > > HDFS.
> > >
> >
> > It looks like you are doing the right thing.  Can we see master log
> please?
> >
> > Thanks,
> > St.Ack
> >
> >
> > > Thanks,
> > > Gagan
> > >
> > >
> > >
> > > On Thu, Aug 26, 2010 at 11:57 PM, Jean-Daniel Cryans <
> > jdcryans@apache.org>wrote:
> > >
> > >> That, or use CDH3b2.
> > >>
> > >> J-D
> > >>
> > >> On Thu, Aug 26, 2010 at 11:22 AM, Gagandeep Singh
> > >> <ga...@paxcel.net> wrote:
> > >> > Thanks Daniel
> > >> >
> > >> > It means I have to checkout the code from branch and build it on my
> > local
> > >> > machine.
> > >> >
> > >> > Gagan
> > >> >
> > >> >
> > >> > On Thu, Aug 26, 2010 at 9:51 PM, Jean-Daniel Cryans <
> > jdcryans@apache.org
> > >> >wrote:
> > >> >
> > >> >> Then I would expect some form of dataloss yes, because stock hadoop
> > >> >> 0.20 doesn't have any form of fsync so HBase doesn't know whether
> the
> > >> >> data made it to the datanodes when appending to the WAL. Please use
> > >> >> the 0.20-append hadoop branch with HBase 0.89 or cloudera's CDH3b2.
> > >> >>
> > >> >> J-D
> > >> >>
> > >> >> On Thu, Aug 26, 2010 at 7:22 AM, Gagandeep Singh
> > >> >> <ga...@paxcel.net> wrote:
> > >> >> > HBase - 0.20.5
> > >> >> > Hadoop - 0.20.2
> > >> >> >
> > >> >> > Thanks,
> > >> >> > Gagan
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> > On Thu, Aug 26, 2010 at 7:11 PM, Jean-Daniel Cryans <
> > >> jdcryans@apache.org
> > >> >> >wrote:
> > >> >> >
> > >> >> >> Hadoop and HBase version?
> > >> >> >>
> > >> >> >> J-D
> > >> >> >>
> > >> >> >> On Aug 26, 2010 5:36 AM, "Gagandeep Singh" <
> > >> gagandeep.singh@paxcel.net>
> > >> >> >> wrote:
> > >> >> >>
> > >> >> >> Hi Group,
> > >> >> >>
> > >> >> >> I am checking HBase/HDFS fail over. I am inserting 1M records
> from
> > my
> > >> >> HBase
> > >> >> >> client application. I am clubbing my Put operation such that 10
> > >> records
> > >> >> get
> > >> >> >> added into the List<Put> and then I call the table.put(). I have
> > not
> > >> >> >> modified the default setting of Put operation which means all
> data
> > is
> > >> >> >> written in WAL and in case of server failure my data should not
> be
> > >> lost.
> > >> >> >>
> > >> >> >> But I noticed somewhat strange behavior, while adding records if
> I
> > >> kill
> > >> >> my
> > >> >> >> Region Server then my application waits till the time region
> data
> > is
> > >> >> moved
> > >> >> >> to another region. But I noticed while doing so all my data is
> > lost
> > >> and
> > >> >> my
> > >> >> >> table is emptied.
> > >> >> >>
> > >> >> >> Could you help me understand the behavior. Is there some kind of
> > >> Cache
> > >> >> also
> > >> >> >> involved while writing because of which my data is lost.
> > >> >> >>
> > >> >> >>
> > >> >> >> Thanks,
> > >> >> >> Gagan
> > >> >> >>
> > >> >> >
> > >> >>
> > >> >
> > >>
> > >
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Data loss due to region server failure

Posted by Ted Yu <yu...@gmail.com>.
By default, this config doesn't appear in hdfs-site.xml or hbase-site.xml
(CDH3b2).

I searched for this config in the 0.20.6 source code and didn't find a
reference:

tyumac:hbase-0.20.6 tyu$ find . -name '*.java' -exec grep 'dfs.support.append' {} \; -print

tyumac:hbase-0.20.6 tyu$ cd ~/hadoop-0.20.2+320/
tyumac:hadoop-0.20.2+320 tyu$ find . -name '*.java' -exec grep 'dfs.support.append' {} \; -print
   * configured with the parameter dfs.support.append set to true, otherwise
./src/hdfs/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
      boolean supportAppends = conf.getBoolean("dfs.support.append", false);
./src/hdfs/org/apache/hadoop/hdfs/server/datanode/FSDataset.java
    this.supportAppends = conf.getBoolean("dfs.support.append", false);
                            " Please refer to dfs.support.append configuration parameter.");
./src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java

I think Cloudera people can provide some hint here.

On Thu, Sep 2, 2010 at 7:48 AM, Stack <st...@duboce.net> wrote:

> On Thu, Sep 2, 2010 at 3:34 AM, Gagandeep Singh
> <ga...@paxcel.net> wrote:
> > Hi Daniel
> >
> > I have downloaded hadoop-0.20.2+320.tar.gz from this location
> > http://archive.cloudera.com/cdh/3/
>
>
> That looks right, yes.
>
> > And also changed the *dfs.support.append* flag to *true* in your *
> > hdfs-site.xml* as mentioned here
> > http://wiki.apache.org/hadoop/Hbase/HdfsSyncSupport.
> >
>
> That sounds right too.  As Ted suggests, you put it in to all configs
> (though I believe it enabled by default on that branch -- in the UI
> you'd see a warning if it was NOT enabled).
>
> > But data loss is still happening. Am I using the right version?
> > Is there any other settings that I need to make so that data gets flushed
> to
> > HDFS.
> >
>
> It looks like you are doing the right thing.  Can we see master log please?
>
> Thanks,
> St.Ack
>
>
> > Thanks,
> > Gagan
> >
> >
> >
> > On Thu, Aug 26, 2010 at 11:57 PM, Jean-Daniel Cryans <
> jdcryans@apache.org>wrote:
> >
> >> That, or use CDH3b2.
> >>
> >> J-D
> >>
> >> On Thu, Aug 26, 2010 at 11:22 AM, Gagandeep Singh
> >> <ga...@paxcel.net> wrote:
> >> > Thanks Daniel
> >> >
> >> > It means I have to checkout the code from branch and build it on my
> local
> >> > machine.
> >> >
> >> > Gagan
> >> >
> >> >
> >> > On Thu, Aug 26, 2010 at 9:51 PM, Jean-Daniel Cryans <
> jdcryans@apache.org
> >> >wrote:
> >> >
> >> >> Then I would expect some form of dataloss yes, because stock hadoop
> >> >> 0.20 doesn't have any form of fsync so HBase doesn't know whether the
> >> >> data made it to the datanodes when appending to the WAL. Please use
> >> >> the 0.20-append hadoop branch with HBase 0.89 or cloudera's CDH3b2.
> >> >>
> >> >> J-D
> >> >>
> >> >> On Thu, Aug 26, 2010 at 7:22 AM, Gagandeep Singh
> >> >> <ga...@paxcel.net> wrote:
> >> >> > HBase - 0.20.5
> >> >> > Hadoop - 0.20.2
> >> >> >
> >> >> > Thanks,
> >> >> > Gagan
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Thu, Aug 26, 2010 at 7:11 PM, Jean-Daniel Cryans <
> >> jdcryans@apache.org
> >> >> >wrote:
> >> >> >
> >> >> >> Hadoop and HBase version?
> >> >> >>
> >> >> >> J-D
> >> >> >>
> >> >> >> On Aug 26, 2010 5:36 AM, "Gagandeep Singh" <
> >> gagandeep.singh@paxcel.net>
> >> >> >> wrote:
> >> >> >>
> >> >> >> Hi Group,
> >> >> >>
> >> >> >> I am checking HBase/HDFS fail over. I am inserting 1M records from
> my
> >> >> HBase
> >> >> >> client application. I am clubbing my Put operation such that 10
> >> records
> >> >> get
> >> >> >> added into the List<Put> and then I call the table.put(). I have
> not
> >> >> >> modified the default setting of Put operation which means all data
> is
> >> >> >> written in WAL and in case of server failure my data should not be
> >> lost.
> >> >> >>
> >> >> >> But I noticed somewhat strange behavior, while adding records if I
> >> kill
> >> >> my
> >> >> >> Region Server then my application waits till the time region data
> is
> >> >> moved
> >> >> >> to another region. But I noticed while doing so all my data is
> lost
> >> and
> >> >> my
> >> >> >> table is emptied.
> >> >> >>
> >> >> >> Could you help me understand the behavior. Is there some kind of
> >> Cache
> >> >> also
> >> >> >> involved while writing because of which my data is lost.
> >> >> >>
> >> >> >>
> >> >> >> Thanks,
> >> >> >> Gagan
> >> >> >>
> >> >> >
> >> >>
> >> >
> >>
> >
>

Re: Data loss due to region server failure

Posted by Stack <st...@duboce.net>.
On Thu, Sep 2, 2010 at 3:34 AM, Gagandeep Singh
<ga...@paxcel.net> wrote:
> Hi Daniel
>
> I have downloaded hadoop-0.20.2+320.tar.gz from this location
> http://archive.cloudera.com/cdh/3/


That looks right, yes.

> And also changed the *dfs.support.append* flag to *true* in your *
> hdfs-site.xml* as mentioned here
> http://wiki.apache.org/hadoop/Hbase/HdfsSyncSupport.
>

That sounds right too.  As Ted suggests, put it into all the configs
(though I believe it is enabled by default on that branch -- in the UI
you'd see a warning if it was NOT enabled).

> But data loss is still happening. Am I using the right version?
> Is there any other settings that I need to make so that data gets flushed to
> HDFS.
>

It looks like you are doing the right thing.  Can we see the master log, please?

Thanks,
St.Ack


> Thanks,
> Gagan
>
>
>
> On Thu, Aug 26, 2010 at 11:57 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>> That, or use CDH3b2.
>>
>> J-D
>>
>> On Thu, Aug 26, 2010 at 11:22 AM, Gagandeep Singh
>> <ga...@paxcel.net> wrote:
>> > Thanks Daniel
>> >
>> > It means I have to checkout the code from branch and build it on my local
>> > machine.
>> >
>> > Gagan
>> >
>> >
>> > On Thu, Aug 26, 2010 at 9:51 PM, Jean-Daniel Cryans <jdcryans@apache.org
>> >wrote:
>> >
>> >> Then I would expect some form of dataloss yes, because stock hadoop
>> >> 0.20 doesn't have any form of fsync so HBase doesn't know whether the
>> >> data made it to the datanodes when appending to the WAL. Please use
>> >> the 0.20-append hadoop branch with HBase 0.89 or cloudera's CDH3b2.
>> >>
>> >> J-D
>> >>
>> >> On Thu, Aug 26, 2010 at 7:22 AM, Gagandeep Singh
>> >> <ga...@paxcel.net> wrote:
>> >> > HBase - 0.20.5
>> >> > Hadoop - 0.20.2
>> >> >
>> >> > Thanks,
>> >> > Gagan
>> >> >
>> >> >
>> >> >
>> >> > On Thu, Aug 26, 2010 at 7:11 PM, Jean-Daniel Cryans <
>> jdcryans@apache.org
>> >> >wrote:
>> >> >
>> >> >> Hadoop and HBase version?
>> >> >>
>> >> >> J-D
>> >> >>
>> >> >> On Aug 26, 2010 5:36 AM, "Gagandeep Singh" <
>> gagandeep.singh@paxcel.net>
>> >> >> wrote:
>> >> >>
>> >> >> Hi Group,
>> >> >>
>> >> >> I am checking HBase/HDFS fail over. I am inserting 1M records from my
>> >> HBase
>> >> >> client application. I am clubbing my Put operation such that 10
>> records
>> >> get
>> >> >> added into the List<Put> and then I call the table.put(). I have not
>> >> >> modified the default setting of Put operation which means all data is
>> >> >> written in WAL and in case of server failure my data should not be
>> lost.
>> >> >>
>> >> >> But I noticed somewhat strange behavior, while adding records if I
>> kill
>> >> my
>> >> >> Region Server then my application waits till the time region data is
>> >> moved
>> >> >> to another region. But I noticed while doing so all my data is lost
>> and
>> >> my
>> >> >> table is emptied.
>> >> >>
>> >> >> Could you help me understand the behavior. Is there some kind of
>> Cache
>> >> also
>> >> >> involved while writing because of which my data is lost.
>> >> >>
>> >> >>
>> >> >> Thanks,
>> >> >> Gagan
>> >> >>
>> >> >
>> >>
>> >
>>
>

Re: Data loss due to region server failure

Posted by Ted Yu <yu...@gmail.com>.
Gagandeep:
Take a look at https://issues.apache.org/jira/browse/HBASE-2789

I think you need to propagate hbase-site.xml to the region servers.

On Thu, Sep 2, 2010 at 3:34 AM, Gagandeep Singh
<ga...@paxcel.net>wrote:

> Hi Daniel
>
> I have downloaded hadoop-0.20.2+320.tar.gz from this location
> http://archive.cloudera.com/cdh/3/
> And also changed the *dfs.support.append* flag to *true* in your *
> hdfs-site.xml* as mentioned here
> http://wiki.apache.org/hadoop/Hbase/HdfsSyncSupport.
>
> But data loss is still happening. Am I using the right version?
> Is there any other settings that I need to make so that data gets flushed
> to
> HDFS.
>
> Thanks,
> Gagan
>
>
>
> On Thu, Aug 26, 2010 at 11:57 PM, Jean-Daniel Cryans <jdcryans@apache.org
> >wrote:
>
> > That, or use CDH3b2.
> >
> > J-D
> >
> > On Thu, Aug 26, 2010 at 11:22 AM, Gagandeep Singh
> > <ga...@paxcel.net> wrote:
> > > Thanks Daniel
> > >
> > > It means I have to checkout the code from branch and build it on my
> local
> > > machine.
> > >
> > > Gagan
> > >
> > >
> > > On Thu, Aug 26, 2010 at 9:51 PM, Jean-Daniel Cryans <
> jdcryans@apache.org
> > >wrote:
> > >
> > >> Then I would expect some form of dataloss yes, because stock hadoop
> > >> 0.20 doesn't have any form of fsync so HBase doesn't know whether the
> > >> data made it to the datanodes when appending to the WAL. Please use
> > >> the 0.20-append hadoop branch with HBase 0.89 or cloudera's CDH3b2.
> > >>
> > >> J-D
> > >>
> > >> On Thu, Aug 26, 2010 at 7:22 AM, Gagandeep Singh
> > >> <ga...@paxcel.net> wrote:
> > >> > HBase - 0.20.5
> > >> > Hadoop - 0.20.2
> > >> >
> > >> > Thanks,
> > >> > Gagan
> > >> >
> > >> >
> > >> >
> > >> > On Thu, Aug 26, 2010 at 7:11 PM, Jean-Daniel Cryans <
> > jdcryans@apache.org
> > >> >wrote:
> > >> >
> > >> >> Hadoop and HBase version?
> > >> >>
> > >> >> J-D
> > >> >>
> > >> >> On Aug 26, 2010 5:36 AM, "Gagandeep Singh" <
> > gagandeep.singh@paxcel.net>
> > >> >> wrote:
> > >> >>
> > >> >> Hi Group,
> > >> >>
> > >> >> I am checking HBase/HDFS fail over. I am inserting 1M records from
> my
> > >> HBase
> > >> >> client application. I am clubbing my Put operation such that 10
> > records
> > >> get
> > >> >> added into the List<Put> and then I call the table.put(). I have
> not
> > >> >> modified the default setting of Put operation which means all data
> is
> > >> >> written in WAL and in case of server failure my data should not be
> > lost.
> > >> >>
> > >> >> But I noticed somewhat strange behavior, while adding records if I
> > kill
> > >> my
> > >> >> Region Server then my application waits till the time region data
> is
> > >> moved
> > >> >> to another region. But I noticed while doing so all my data is lost
> > and
> > >> my
> > >> >> table is emptied.
> > >> >>
> > >> >> Could you help me understand the behavior. Is there some kind of
> > Cache
> > >> also
> > >> >> involved while writing because of which my data is lost.
> > >> >>
> > >> >>
> > >> >> Thanks,
> > >> >> Gagan
> > >> >>
> > >> >
> > >>
> > >
> >
>