Posted to dev@ignite.apache.org by Denis Magda <dm...@gridgain.com> on 2019/01/23 22:55:28 UTC

Re: Ignite index corruption issue -> unrecoverable cluster

Another data/index corruption issue:
https://stackoverflow.com/questions/54295401/ignite-transaction-failure-not-recoverable-with-persistance

It's suggested to remove index.bin to be able to recover the cluster. Folks,
let's prepare a list of actions to take if a cluster becomes unrecoverable
due to a data or index corruption issue. What should we do depending on the
exception:

   - Remove index.bin if X or Y or Z
   - etc


--
Denis Magda


On Sun, Dec 30, 2018 at 10:06 AM Denis Magda <dm...@gridgain.com> wrote:

> Ignite SQL and memory experts,
>
> The following issue was reported on SO:
>
> https://stackoverflow.com/questions/53979106/ignite-corruptedtreeexception-leads-to-cluster-failure
>
> The stack trace starts with the message below, more details are in that
> forum:
>
> [SEVERE][data-streamer-stripe-2-#15][GridDhtAtomicCache] <MyCache>
> Unexpected exception during cache update
> org.h2.message.DbException: General error: "class
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
> Runtime failure on row: Row@75ab6623[ key: CacheKey [idHash=242632156,
> hash=-841684964, parentId=-8607237606486310912, hour=9,
> id=-8607237528489033728, date=2018-09-09 00:00:00.0], val: CacheValue
> [idHash=843227122, hash=-801894604, ....
>
> Let's see if it's addressed in the latest release. Also, the user asked a
> reasonable question - how to recover? Yes, it's possible to use GridGain
> snapshots if they were created beforehand, but I remember some discussions
> around a recovery tool.
>
> --
> Denis
>

Re: Ignite index corruption issue -> unrecoverable cluster

Posted by Denis Magda <dm...@gridgain.com>.
Stan, great, thanks for sharing the knowledge!

Prachi, could you please document this on readme and share the docs in the
thread?
https://issues.apache.org/jira/browse/IGNITE-11252

--
Denis Magda


In reply to Stanislav Lukyanov <st...@gmail.com>, Thu, Feb 7, 2019 at 6:33 AM.

RE: Ignite index corruption issue -> unrecoverable cluster

Posted by Stanislav Lukyanov <st...@gmail.com>.
Denis,

When an index is corrupted, you just need to remove the index.bin file of the affected cache.
After that, the node rebuilds the indexes when it starts.
SQL query performance will be low until the indexes are rebuilt, so you need to be cautious.

The main problem is recognizing that it is the indexes that are corrupted.
Usually one needs to analyze the exception stack trace to find this out,
and that requires some familiarity with the Ignite code base.
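
As a rough first filter for that stack-trace analysis, the sketch below scans log lines for the CorruptedTreeException class name that appears in the trace quoted earlier in this thread and reports the cache tag seen before it. The `<MyCache>`-style tag pattern is an assumption drawn from that single quoted log line, and the function name is hypothetical. A hit only tells you where to start reading; it does not prove that the index, rather than the data, is damaged.

```python
import re

# Exception class name taken from the stack trace quoted in this thread.
CORRUPTION_MARKER = "CorruptedTreeException"

# Assumption: the cache name follows the logger name in angle brackets,
# as in "[GridDhtAtomicCache] <MyCache>" from the quoted log line.
CACHE_TAG = re.compile(r"\]\s*<([^<>]+)>")


def suspect_caches(log_lines):
    """Return cache names whose tag was the latest one seen before a marker line."""
    suspects = []
    last_cache = None
    for line in log_lines:
        match = CACHE_TAG.search(line)
        if match:
            last_cache = match.group(1)
        if CORRUPTION_MARKER in line and last_cache and last_cache not in suspects:
            suspects.append(last_cache)
    return suspects
```

Feeding it the lines of a node log gives a short list of caches to inspect by hand.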

The TODO lists I can come up with are:

# Recovering from an index corruption
## Applicable if
It is known that an index of a cache is corrupted, but the main data (partition files and WAL) is fine.

## Steps to recover
1. Stop the node
2. Delete index.bin of the affected caches (path is db/<consistent_id>/cache-<cache_name>/index.bin)
3. Start the node
- Note: At this point the node is active in the cluster but doesn’t have indexes yet.
It still serves SQL queries, but their performance can be low.
Avoid running SQL queries on large tables until the rebuild finishes
4. Wait for message “Finished indexes rebuilding for cache <cache_name>” in the Ignite log
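
Step 2 above can be sketched as a small helper. It builds the path from the layout given in step 2 (db/<consistent_id>/cache-<cache_name>/index.bin) and, by default, only lists what it would delete. The `storage_root` parameter (the directory that contains `db/`), the function names, and the `dry_run` flag are hypothetical and not part of Ignite. Run it only while the node is stopped.

```python
from pathlib import Path


def index_file(storage_root, consistent_id, cache_name):
    """Build <storage_root>/db/<consistent_id>/cache-<cache_name>/index.bin."""
    return Path(storage_root) / "db" / consistent_id / f"cache-{cache_name}" / "index.bin"


def remove_indexes(storage_root, consistent_id, cache_names, dry_run=True):
    """List (and, unless dry_run, delete) index.bin for each affected cache.

    Caches that have no index.bin are skipped, so repeating the call is harmless.
    """
    found = []
    for name in cache_names:
        path = index_file(storage_root, consistent_id, name)
        if path.is_file():
            found.append(path)
            if not dry_run:
                path.unlink()
    return found
```

Calling it once with the default `dry_run=True` and checking the returned paths before the real run is a cheap guard against pointing it at the wrong node directory.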

# Recovering from a persistent storage corruption
## Applicable if
A part of the persistent storage (partition files, checkpoint markers or WAL) was corrupted
and there is no other way to recover it, but there are healthy copies of all data on other nodes.

## Steps to recover
1. Stop the node
2. Delete all persistence files of the node (it is best to clear the Ignite working directory, the storage directory, and the WAL and WAL archive directories)
3. Make sure consistentId is explicitly set in the configuration of the node
- If it isn’t, look up the generated consistentId using control.sh and set it explicitly in the config or via IGNITE_CONSISTENT_ID (2.8+ only)
4. Start the node
5. Wait for messages <Finished rebalancing cache> for all caches
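
The final "wait for a message per cache" step can be checked mechanically. The sketch below is a hypothetical helper, not an Ignite tool: it takes the marker text as a parameter because the exact log wording may differ between Ignite versions, and it matches cache names by plain substring, so a cache whose name is a prefix of another name can be reported as done too early.

```python
def caches_still_pending(log_lines, cache_names, marker):
    """Return the caches for which no line containing both marker and name was seen."""
    pending = set(cache_names)
    for line in log_lines:
        if marker in line:
            # Drop every cache that this completion line mentions.
            pending -= {name for name in pending if name in line}
    return sorted(pending)
```

For example, `caches_still_pending(lines, ["orders", "payments"], "Finished rebalancing cache")` returns the caches that still need to be waited for, using the marker text from step 5 as written above.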


We could have more fine-grained ways to handle data corruption once we address the issues from the
“Starting with missing PDS pieces” thread, create a WAL and/or partition file recovery tool,
allow WAL records for a missing cache (say, after we have deleted the corrupted files of a single cache), etc.

Stan

In reply to Denis Magda, sent February 7, 2019, 3:12.


Re: Ignite index corruption issue -> unrecoverable cluster

Posted by Denis Magda <dm...@apache.org>.
Stan,

Thanks for starting the "Starting with missing PDS pieces" thread, which
promises to embed usability changes into the source code. In the meantime,
could you propose a TODO list for recovering from index corruption and similar
scenarios? I know that you're experienced in that, and it would be great to
document the procedures until the code is modified.

-
Denis


In reply to Denis Magda <dm...@apache.org>, Wed, Jan 30, 2019 at 1:02 PM.

Re: Ignite index corruption issue -> unrecoverable cluster

Posted by Denis Magda <dm...@apache.org>.
Dmitry,

Thanks, the FAQ section might make sense but, as practice shows, it's
hard to get recommendations even for questions like this one :)

Ignite experts, please chime in: the project periodically fails with data
corruption, and we have to explain how to work around it until the issue is
resolved.

-
Denis


In reply to Dmitriy Pavlov <dp...@apache.org>, Wed, Jan 30, 2019 at 11:55 AM.

Re: Ignite index corruption issue -> unrecoverable cluster

Posted by Dmitriy Pavlov <dp...@apache.org>.
Denis,

BTW one case of corruption is fixed here,
https://issues.apache.org/jira/browse/IGNITE-11030

I still need a review from Ignite Native Persistence Experts. I feel it is
really important to apply such fixes.

Sincerely,
Dmitriy Pavlov

In reply to Dmitriy Pavlov <dp...@apache.org>, Thu, Jan 24, 2019 at 16:29.

Re: Ignite index corruption issue -> unrecoverable cluster

Posted by Dmitriy Pavlov <dp...@apache.org>.
Denis, what do you think about a more general idea of creating FAQs for
Ignite users?

What if experts placed each answer on a wiki page once and then
developed answers for frequent problems?

And before diving into researching each problem, experienced community
members would ask users to check the FAQ first?

Sincerely,
Dmitriy Pavlov

P.S. Here is an article that the Apache guides reference:
http://www.catb.org/~esr/faqs/smart-questions.html - one of the actions
required of users is to search for information first.

In reply to Denis Magda <dm...@gridgain.com>, Thu, Jan 24, 2019 at 01:55.