Posted to users@kafka.apache.org by Helin Xiang <xk...@gmail.com> on 2014/12/10 17:27:39 UTC

how to achieve availability with no data loss when replica = 1?

Hi,

For some topics in our system, the data volume is so large that we think
keeping extra replicas is a waste of disk and network resources (and the
data is not that important).

First we used 1 replica + ack=0 and found that when 1 broker is down, 1/n of
the data is lost.
Then we tried 1 replica + ack=1 and found that after 3 tries the data is
still lost. And when we set the number of retries high enough, no more data
can be produced to Kafka.

If I understand correctly, in 0.7, when 1 broker is down, both producing
and consuming remain available with no data loss. I can see why Kafka 0.8
is designed differently from 0.7, but is there a way to make the 0.8
producer behave like the 0.7 one? We don't care which partition a given
piece of data goes to, as long as the data gets into Kafka with no loss.
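
A rough way to see the 1/n loss and why retries do not help is the toy model below. It is only a sketch and not Kafka code: it assumes n partitions with a single replica each, that one of them is unreachable, and that every message is tied to a fixed partition (here simply key modulo n). The function names are made up for illustration.

```python
def route_by_key(key: int, num_partitions: int) -> int:
    """Fixed routing: the same key always maps to the same partition."""
    return key % num_partitions


def send_with_retries(key, num_partitions, down, max_retries=3):
    """Return True if the message is delivered.

    Every attempt targets the same partition, so retrying cannot help
    while that partition's only replica is offline.
    """
    partition = route_by_key(key, num_partitions)
    for _ in range(1 + max_retries):
        if partition not in down:
            return True
    return False


def lost_fraction(num_messages, num_partitions, down):
    """Fraction of messages that never reach a live partition."""
    lost = sum(
        not send_with_retries(key, num_partitions, down)
        for key in range(num_messages)
    )
    return lost / num_messages


if __name__ == "__main__":
    # 4 partitions, partition 2 offline: a quarter of the messages are lost.
    print(lost_fraction(1000, 4, {2}))  # 0.25
```

Raising max_retries in this model changes nothing, which matches the observation that a very large retry count just stalls the producer instead of saving the data.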


Thanks




-- 


*Best Regards, Xiang Helin*

Re: how to achieve availability with no data loss when replica = 1?

Posted by Helin Xiang <xk...@gmail.com>.
Thanks, Jun.

In the end we found that we just need to remove our partition key; the
producer then stops sending data to the broker that is down. The drawback
is that the data is no longer spread evenly across the partitions.


On Sat, Dec 13, 2014 at 1:50 AM, Jun Rao <ju...@confluent.io> wrote:
>
> To get the same behavior of 0.7, you just need to create multiple
> partitions for a topic in 0.8 with a replication factor of 1. When one of
> the partitions is not available, the producer will route the data to other
> partitions.
>
> Thanks,
>
> Jun


-- 


*Best Regards, 向河林*

Re: how to achieve availability with no data loss when replica = 1?

Posted by Jun Rao <ju...@confluent.io>.
To get the same behavior of 0.7, you just need to create multiple
partitions for a topic in 0.8 with a replication factor of 1. When one of
the partitions is not available, the producer will route the data to other
partitions.

Thanks,

Jun
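
The routing described in this reply can be pictured with the small simulation below. It is a sketch under stated assumptions, not the producer's actual code: messages carry no key, each partition has a single replica, and the producer picks only among partitions that currently have a live leader. The function names and the seeded random choice are illustrative.

```python
import random
from collections import Counter


def pick_available(num_partitions, down, rng):
    """Keyless routing: choose only among partitions that are reachable."""
    available = [p for p in range(num_partitions) if p not in down]
    if not available:
        raise RuntimeError("no partition is available")
    return rng.choice(available)


def produce_keyless(num_messages, num_partitions, down, seed=0):
    """Send keyless messages and count how many land on each partition."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_messages):
        counts[pick_available(num_partitions, down, rng)] += 1
    return counts


if __name__ == "__main__":
    counts = produce_keyless(1000, 4, down={2})
    # Every message is delivered and none is sent to the offline partition.
    print(sum(counts.values()), 2 in counts)  # 1000 False
```

As far as I understand, the 0.8 producer does not pick a new partition for every keyless message but keeps the chosen one until the next metadata refresh, which would be consistent with the uneven distribution reported elsewhere in this thread; treat that as an assumption to check against your version.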

On Wed, Dec 10, 2014 at 5:58 PM, Helin Xiang <xk...@gmail.com> wrote:

> Yes, I mean replication-factor == 1, because the data volume is huge
> for some special topics and we care more about disk and network
> resources.
>
> So in your opinion, we can't have both durability and availability as long
> as replication-factor == 1, unless we implement another type of producer
> ourselves that behaves like the 0.7 producer.
>
> Thanks

Re: how to achieve availability with no data loss when replica = 1?

Posted by Helin Xiang <xk...@gmail.com>.
Yes, I mean replication-factor == 1, because the data volume is huge
for some special topics and we care more about disk and network
resources.

So in your opinion, we can't have both durability and availability as long
as replication-factor == 1, unless we implement another type of producer
ourselves that behaves like the 0.7 producer.

Thanks


On Thu, Dec 11, 2014 at 9:47 AM, Joe Stein <jo...@stealth.ly> wrote:

> By replica == 1 do you mean replication-factor == 1 or something different?
>
> You should have replication-factor == 3 if you are trying to have durable
> writes survive failure. On the producer side set ack = -1 with that for it
> to work as expected.

Re: how to achieve availability with no data loss when replica = 1?

Posted by Joe Stein <jo...@stealth.ly>.
By replica == 1 do you mean replication-factor == 1 or something different?

You should have replication-factor == 3 if you are trying to have durable
writes survive failure. On the producer side set ack = -1 with that for it
to work as expected.
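
For reference, the producer side of this advice might look like the sketch below, which renders a properties file for the 0.8 producer. The property names are the ones I recall from the 0.8 producer configs page linked in this thread (metadata.broker.list, request.required.acks, message.send.max.retries, retry.backoff.ms), so please verify them against your version; the broker host names and the helper function are hypothetical. The replication factor itself is a topic setting and is not part of these producer properties.

```python
def producer_properties(brokers, acks=-1, max_retries=3, backoff_ms=100):
    """Render producer settings as Java-style .properties text.

    acks=-1 asks the leader to wait for all in-sync replicas before
    acknowledging a write.
    """
    props = {
        "metadata.broker.list": ",".join(brokers),
        "request.required.acks": str(acks),
        "message.send.max.retries": str(max_retries),
        "retry.backoff.ms": str(backoff_ms),
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # Hypothetical broker addresses, for illustration only.
    print(producer_properties(["broker1:9092", "broker2:9092"]))
```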

On Wed, Dec 10, 2014 at 7:14 PM, Helin Xiang <xk...@gmail.com> wrote:

> Thanks for the reply, Joe.
>
> As I understand it, when replica == 1, ack == -1 would cause the producer
> to stop sending any data to the Kafka cluster if 1 broker is down. That
> means we could not tolerate the failure of a single broker. Am I right?
>
> What we want is that when 1 broker is down and the topic replica is set to
> 1, the whole system is still available and the data goes to the other
> partitions without loss.
>
>
> THANKS again.

Re: how to achieve availability with no data loss when replica = 1?

Posted by Helin Xiang <xk...@gmail.com>.
Thanks for the reply, Joe.

As I understand it, when replica == 1, ack == -1 would cause the producer
to stop sending any data to the Kafka cluster if 1 broker is down. That
means we could not tolerate the failure of a single broker. Am I right?

What we want is that when 1 broker is down and the topic replica is set to
1, the whole system is still available and the data goes to the other
partitions without loss.


THANKS again.

On Thu, Dec 11, 2014 at 12:37 AM, Joe Stein <jo...@stealth.ly> wrote:

> If you want no data loss then you need to set ack = -1
> Copied from https://kafka.apache.org/documentation.html#producerconfigs ==
> -1, which means that the producer gets an acknowledgement after all in-sync
> replicas have received the data. This option provides the best durability,
> we guarantee that no messages will be lost as long as at least one in sync
> replica remains.
>
> /*******************************************
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> ********************************************/



-- 


*Best Regards, 向河林*

Re: how to achieve availability with no data loss when replica = 1?

Posted by Joe Stein <jo...@stealth.ly>.
If you want no data loss then you need to set ack = -1.
Copied from https://kafka.apache.org/documentation.html#producerconfigs:
-1, which means that the producer gets an acknowledgement after all in-sync
replicas have received the data. This option provides the best durability,
we guarantee that no messages will be lost as long as at least one in sync
replica remains.

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/
