Posted to user@kudu.apache.org by Todd Lipcon <to...@cloudera.com> on 2018/01/04 23:35:32 UTC

Re: Data inconsistency after restart

Hey Petter,

Did you ever get to the bottom of this? We definitely don't expect Kudu to
lose data on a restart (and we have hundreds of tests running continuously
that try to ensure this).

-Todd

On Fri, Dec 8, 2017 at 10:13 PM, David Alves <da...@gmail.com> wrote:

> Hi Petter
>
>  I don't have answers yet, but I do have some more questions (inline).
>
> Petter von Dolwitz (Hem) writes:
>
> Hi David,
>>
>> To summarize:
>>
>> 1. I ingest data. Kudu's maintenance threads stop working (soft memory
>> limit) and incoming data is throttled. There are no errors reported on the
>> client side.
>>
>  What is the "client side"? Impala? Spark? The Java/C++ client?
>
> 2. I stop ingestion and wait until I *think* Kudu is finished.
>>
>  The question above is pertinent. Impala will not return until a query
>  is fully successful, though it may return an error and leave a query
>  only half-way executed. If you're using the client APIs directly,
>  though, are you checking for errors when inserting?
>
> 3. I restart Kudu.
>> 4. I validate the inserted data by doing count(*) on groups of data in
>> Kudu. For several groups, Kudu reports a lot of rows missing.
>>
>  Kudu's default scan mode is READ_LATEST. While this is the most
>  performance-oriented mode, it's also the one with the fewest guarantees,
>  so on startup it's possible that it reads from a stale replica, giving
>  the _appearance_ that rows went missing. Things to try here:
>  - Try the same query a few minutes later. Is the answer different?
>  - If it is, consider changing your scan mode to READ_AT_SNAPSHOT.
>    In this mode data is guaranteed not to be stale, though you might
>    have to wait for all replicas to be ready.
>
> 5. I ingest the same data again. Client reports that all rows are already
>> present.
>>
>  This isn't surprising _if_ the problem is indeed from stale replicas.
>
>> 6. Doing the count(*) exercise again now gives me the correct number of
>> rows.
>>
>> This tells me that the data was ingested into Kudu on the first attempt
>> but
>> a scan did not find the data. Inserting the data again made it
>> visible.
>>
>  Can it be that after the scan it's just that enough time has elapsed
>  so that all replicas are caught up? I'd say this is likely the case.
>
>>
>> Br,
>> Petter
>>
>> 2017-12-07 21:39 GMT+01:00 David Alves <da...@gmail.com>:
>>
>> Hi Petter
>>>
>>>    I'd like to clarify exactly what happened and what exactly you are
>>> referring to as "inconsistency".
>>>    From what I understand of the first error you observed, Kudu was
>>> underprovisioned memory-wise, and the ingest jobs/queries failed. Is
>>> that right? Since Kudu doesn't have atomic multi-row writes, it's
>>> currently expected in this case that you'll end up with partially
>>> written data.
>>>    If you tried the same job again and it succeeded, then for certain
>>> types of operation (UPSERT, INSERT IGNORE) the remaining rows would be
>>> written and all the data would be there as expected.
>>>    I'd like to distinguish this lack of atomicity on multi-row
>>> transactions from "inconsistency", which is what you might observe if an
>>> operation didn't fail, but you couldn't see all the data. For this latter
>>> case there are options you can choose to avoid any inconsistency.
>>>
>>> Best
>>> David
>>>
>>>
>>>
>>> On Wed, Dec 6, 2017 at 4:26 AM, Petter von Dolwitz (Hem) <
>>> petter.von.dolwitz@gmail.com> wrote:
>>>
>>> Thanks for your reply Andrew!
>>>>
>>>> >How did you verify that all the data was inserted and how did you find
>>>> some data missing?
>>>> This was done using Impala. We counted the rows for groups representing
>>>> the chunks we inserted.
>>>>
>>>> >Following up on what I posted, take a look at
>>>> https://kudu.apache.org/docs/transaction_semantics.html#_read_operations_scans.
>>>> It seems definitely possible that not all of the rows had finished
>>>> inserting when counting, or that the scans were sent to a stale replica.
>>>> Before we shut down we could only see the following in the logs. I.e.,
>>>> no
>>>> sign that ingestion was still ongoing.
>>>>
>>>> kudu-tserver.ip-xx-yyy-z-nnn.root.log.INFO.20171201-065232.90314:I1201
>>>> 07:27:35.010694 90793 maintenance_manager.cc:383] P
>>>> a38902afefca4a85a5469d149df9b4cb: we have exceeded our soft memory
>>>> limit
>>>> (current capacity is 67.52%).  However, there are no ops currently
>>>> runnable
>>>> which would free memory.
>>>>
>>>> Also the (Cloudera) metric
>>>> total_kudu_rows_inserted_rate_across_kudu_replicas showed zero.
>>>>
>>>> Still, it seems like some data became inconsistent after restart. If
>>>> the maintenance_manager performs important jobs that are required to
>>>> ensure that all data is inserted, then I can understand why we ended up
>>>> with inconsistent data. But, if I understand you correctly, you are
>>>> saying that these jobs are not critical for ingestion. In the link you
>>>> provided I read "Impala scans are currently performed as READ_LATEST and
>>>> have no consistency guarantees.". I would assume this means that it does
>>>> not guarantee consistency if new data is inserted but should give valid
>>>> (and the same) results if no new data is inserted?
>>>>
>>>> I have not tried the ksck tool yet. Thank you for the reminder; I will
>>>> have a look.
>>>>
>>>> Br,
>>>> Petter
>>>>
>>>>
>>>> 2017-12-06 1:31 GMT+01:00 Andrew Wong <aw...@cloudera.com>:
>>>>
>>>> How did you verify that all the data was inserted and how did you find
>>>>>
>>>>>> some data missing? I'm wondering if it's possible that the initial
>>>>>> "missing" data was data that Kudu was still in the process of
>>>>>> inserting
>>>>>> (albeit slowly, due to memory backpressure or somesuch).
>>>>>>
>>>>>>
>>>>> Following up on what I posted, take a look at
>>>>> https://kudu.apache.org/docs/transaction_semantics.html#_read_operations_scans.
>>>>> It seems definitely possible that not all of the rows had finished
>>>>> inserting when counting, or that the scans were sent to a stale replica.
>>>>>
>>>>> On Tue, Dec 5, 2017 at 4:18 PM, Andrew Wong <aw...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>> Hi Petter,
>>>>>>
>>>>>> When we verified that all data was inserted we found that some data
>>>>>> was
>>>>>>
>>>>>>> missing. We added this missing data and on some chunks we got the
>>>>>>> information that all rows were already present, i.e. Impala says
>>>>>>> something
>>>>>>> like Modified: 0 rows, nnnnnnn errors. Doing the verification again
>>>>>>> now
>>>>>>> shows that the Kudu table is complete. So, even though we did not
>>>>>>> insert
>>>>>>> any data on some chunks, a count(*) operation over these chunks now
>>>>>>> returns
>>>>>>> a different value.
>>>>>>>
>>>>>>
>>>>>>
>>>>>> How did you verify that all the data was inserted and how did you find
>>>>>> some data missing? I'm wondering if it's possible that the initial
>>>>>> "missing" data was data that Kudu was still in the process of
>>>>>> inserting
>>>>>> (albeit slowly, due to memory backpressure or somesuch).
>>>>>>
>>>>>> Now to my question. Will data be inconsistent if we recycle Kudu after
>>>>>>
>>>>>>> seeing soft memory limit warnings?
>>>>>>>
>>>>>>
>>>>>>
>>>>>> Your data should be consistently written, even with those warnings.
>>>>>> AFAIK they would cause a bit of slowness, not incorrect results.
>>>>>>
>>>>>> Is there a way to tell when it is safe to restart Kudu to avoid these
>>>>>>
>>>>>>> issues? Should we use any special procedure when restarting (e.g.
>>>>>>> only
>>>>>>> restart the tablet servers, only restart one tablet server at a time
>>>>>>> or
>>>>>>> something like that)?
>>>>>>>
>>>>>>
>>>>>>
>>>>>> In general, you can use the `ksck` tool to check the health of your
>>>>>> cluster. See
>>>>>> https://kudu.apache.org/docs/command_line_tools_reference.html#cluster-ksck
>>>>>> for more details. For restarting a cluster, I would recommend taking
>>>>>> down all tablet servers at once; otherwise tablet replicas may try to
>>>>>> replicate data from the server that was taken down.
>>>>>>
>>>>>> Hope this helped,
>>>>>> Andrew
>>>>>>
>>>>>> On Tue, Dec 5, 2017 at 10:42 AM, Petter von Dolwitz (Hem) <
>>>>>> petter.von.dolwitz@gmail.com> wrote:
>>>>>>
>>>>>> Hi Kudu users,
>>>>>>>
>>>>>>> We just started to use Kudu (1.4.0+cdh5.12.1). To make a baseline for
>>>>>>> evaluation we ingested 3 months' worth of data. During ingestion we
>>>>>>> saw messages from the maintenance threads that a soft memory limit
>>>>>>> was reached. It seems like the background maintenance threads stopped
>>>>>>> performing their tasks at this point in time. It also seems like the
>>>>>>> memory was never recovered even after stopping ingestion, so I guess
>>>>>>> a large backlog was being built up. I guess the root cause here is
>>>>>>> that we were a bit too conservative when giving Kudu memory. After a
>>>>>>> restart a lot of maintenance tasks were started (i.e. compaction).
>>>>>>>
>>>>>>> When we verified that all data was inserted we found that some data
>>>>>>> was missing. We added this missing data and on some chunks we got the
>>>>>>> information that all rows were already present, i.e. Impala says
>>>>>>> something
>>>>>>> like Modified: 0 rows, nnnnnnn errors. Doing the verification again
>>>>>>> now
>>>>>>> shows that the Kudu table is complete. So, even though we did not
>>>>>>> insert
>>>>>>> any data on some chunks, a count(*) operation over these chunks now
>>>>>>> returns
>>>>>>> a different value.
>>>>>>>
>>>>>>> Now to my question. Will data be inconsistent if we recycle Kudu
>>>>>>> after
>>>>>>> seeing soft memory limit warnings?
>>>>>>>
>>>>>>> Is there a way to tell when it is safe to restart Kudu to avoid these
>>>>>>> issues? Should we use any special procedure when restarting (e.g.
>>>>>>> only
>>>>>>> restart the tablet servers, only restart one tablet server at a time
>>>>>>> or
>>>>>>> something like that)?
>>>>>>>
>>>>>>> The table design uses 50 tablets per day (times 90 days). It is 8 TB
>>>>>>> of data after 3x replication over 5 tablet servers.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Petter
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Andrew Wong
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Andrew Wong
>>>>>
>>>>>
>>>>
>>>>
>>>
>
> --
> David Alves
>



-- 
Todd Lipcon
Software Engineer, Cloudera
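
The stale-replica explanation discussed in the thread above can be
illustrated with a toy model. This is only a sketch: it does not use the
Kudu client API, and every name in it (Replica, Tablet, count_latest,
count_at_snapshot) is hypothetical. It assumes a single leader that applies
writes immediately and followers that lag until they replay the log. It
shows how a READ_LATEST-style count on a lagging replica can look like
missing rows, while a re-insert is rejected as "already present" and a
snapshot-style read waits for the replica to catch up.

```python
class Replica:
    """One copy of a tablet's rows plus how much of the log it has applied."""

    def __init__(self):
        self.rows = {}      # primary key -> value
        self.applied = 0    # number of log entries applied so far


class Tablet:
    """Toy replicated tablet: replica 0 is the leader, the rest may lag."""

    def __init__(self, n_replicas=3):
        self.log = []  # committed writes, in order
        self.replicas = [Replica() for _ in range(n_replicas)]

    def _catch_up(self, replica):
        # Replay every committed write the replica has not applied yet.
        for key, value in self.log[replica.applied:]:
            replica.rows[key] = value
        replica.applied = len(self.log)

    def insert(self, key, value):
        """Return True if the row was written, False if the key already exists."""
        leader = self.replicas[0]
        if key in leader.rows:
            return False
        self.log.append((key, value))
        self._catch_up(leader)  # only the leader applies the write right away
        return True

    def catch_up_all(self):
        for replica in self.replicas:
            self._catch_up(replica)

    def count_latest(self, replica_index):
        """READ_LATEST-like: answer from whatever this replica has applied."""
        return len(self.replicas[replica_index].rows)

    def count_at_snapshot(self, replica_index):
        """Snapshot-like: wait (here: replay the log) before answering."""
        replica = self.replicas[replica_index]
        self._catch_up(replica)
        return len(replica.rows)


if __name__ == "__main__":
    tablet = Tablet()
    for key in range(1000):
        tablet.insert(key, "payload")

    print("latest count on lagging follower:", tablet.count_latest(1))
    print("re-insert accepted:", tablet.insert(0, "payload"))
    print("snapshot count on another follower:", tablet.count_at_snapshot(2))
    tablet.catch_up_all()
    print("latest count after catch-up:", tablet.count_latest(1))
```

In this model the second ingest changes nothing; the count only becomes
correct because the follower has caught up by the time of the later scan,
which matches the explanation David suggests.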

Re: Data inconsistency after restart

Posted by "Petter von Dolwitz (Hem)" <pe...@gmail.com>.

Hi Todd,

I have not set aside time to re-create the issue again. I still aim to do
this. When I do, I will collect all relevant logs and create a JIRA.

Thank you for following up.

Br,
Petter

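
For reference, the table layout described in the original question (50
tablets per day over 90 days, 3x replication, 5 tablet servers, 8 TB after
replication) can be turned into per-server numbers. The sketch below is
plain arithmetic on those figures; it assumes replicas are spread evenly
across the tablet servers and that 1 TB = 1000 GB, neither of which is
stated in the thread.

```python
# Figures taken from the original question in this thread.
TABLETS_PER_DAY = 50
DAYS = 90
REPLICATION_FACTOR = 3
TABLET_SERVERS = 5
TOTAL_DATA_TB = 8  # size after replication

tablets = TABLETS_PER_DAY * DAYS                 # 4500
replicas = tablets * REPLICATION_FACTOR          # 13500
# Assumption: replicas are balanced evenly across tablet servers.
replicas_per_server = replicas // TABLET_SERVERS  # 2700
data_per_server_tb = TOTAL_DATA_TB / TABLET_SERVERS
# Assumption: 1 TB = 1000 GB.
avg_replica_gb = TOTAL_DATA_TB * 1000 / replicas

if __name__ == "__main__":
    print("tablets:", tablets)
    print("tablet replicas:", replicas)
    print("replicas per tablet server:", replicas_per_server)
    print("data per tablet server (TB):", data_per_server_tb)
    print("average replica size (GB):", round(avg_replica_gb, 2))
```

Under these assumptions each tablet server hosts a few thousand fairly
small replicas, which may be worth keeping in mind alongside the memory
settings when re-creating the issue.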