Posted to users@activemq.apache.org by Saulo Carvalho <sa...@gmail.com> on 2021/11/24 13:58:23 UTC

Re: Artemis 2.18.0 is filling all the disc with oldreplica files

Hi!

Is there any news on this issue?

https://issues.apache.org/jira/browse/ARTEMIS-3545

If you need any more info, please ask me.
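
For anyone who wants to reproduce the measurement, a minimal Python sketch along these lines could report how much space the oldreplica directories take. It assumes the moved-aside data lives in directories whose names start with "oldreplica" somewhere under the broker's data directory; the function names are hypothetical helpers, not part of Artemis.

```python
import os


def dir_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so linked data is not counted twice.
            if os.path.isfile(file_path) and not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def oldreplica_usage(data_dir, prefix="oldreplica"):
    """Map each directory whose name starts with prefix to its size in bytes.

    The "oldreplica" prefix is an assumption based on the file names
    mentioned in this thread.
    """
    usage = {}
    for root, dirs, _files in os.walk(data_dir):
        for name in list(dirs):
            if name.startswith(prefix):
                full_path = os.path.join(root, name)
                usage[full_path] = dir_size(full_path)
                # Do not descend into a directory that was already measured.
                dirs.remove(name)
    return usage
```

Calling oldreplica_usage() on the broker's data directory and comparing the result with dir_size() of the same directory should show how much of the disk is taken by old replicas rather than by live journal data.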

On Fri, Oct 29, 2021 at 12:10, Saulo Carvalho <sa...@gmail.com>
wrote:

> Thanks for your help Gary.
>
> My test was something like this:
>
> We have a primary and a replica running Artemis.
> I started an app producing messages to Artemis.
> When we had around 30k messages, and without stopping the producer, we
> killed the replica, and the primary started creating oldreplica files until
> it filled the whole disk (we tried with 20GB and 80GB disks).
> I ran this test on 2.18 and 2.19 and saw the same behavior, but on 2.17 it
> doesn't happen.
>
> Issue created:
> https://issues.apache.org/jira/browse/ARTEMIS-3545
>
> On Fri, Oct 29, 2021 at 05:19, Gary Tully <ga...@gmail.com>
> wrote:
>
>> If there is some change of behaviour between 2.17 and 2.18 then raise an
>> issue.
>> I can only think that maybe compaction is in play on the primary and
>> for some reason not on the replica, but the set of in-use journals
>> should match.
>> If there were a way to easily reproduce this in a test case, that would be
>> a great help in understanding what is going on.
>>
>> On Thu, 28 Oct 2021 at 15:05, Saulo Carvalho <sa...@gmail.com> wrote:
>> >
>> > Hi! Does anyone know what we can do in this case?
>> >
>> > On Tue, Oct 26, 2021 at 18:29, Saulo Carvalho <saulocn@gmail.com>
>> > wrote:
>> >
>> > > I ran the same test on 2.17 and there's no problem. Maybe something
>> > > changed in 2.18?
>> > >
>> > > On Tue, Oct 26, 2021 at 17:55, Saulo Carvalho <saulocn@gmail.com>
>> > > wrote:
>> > >
>> > >> Is there some way to limit it? Or maybe some way to keep it from using
>> > >> the whole disk? Artemis crashes when the disk is full...
>> > >> And also, why are the files that big?
>> > >>
>> > >> On Mon, Oct 25, 2021 at 12:03, Saulo Carvalho <saulocn@gmail.com>
>> > >> wrote:
>> > >>
>> > >>> If there is a limit to oldreplica files, we could determine the
>> size of
>> > >>> disk we should use.
>> > >>>
>> > >>> On Mon, Oct 25, 2021 at 09:49, Saulo Carvalho <saulocn@gmail.com>
>> > >>> wrote:
>> > >>>
>> > >>>> "When you say, "we had something like 110MB ~ 300MB of messages,"
>> how
>> > >>>> are
>> > >>>> you measuring this? Are you counting *all* the data in the data
>> > >>>> directory
>> > >>>> (e.g. large messages, paging, etc.)?"
>> > >>>>
>> > >>>> Yes. I'm counting all the data, because I check the whole disk. When
>> > >>>> Artemis restarts, it cleans up the oldreplica files and usage drops
>> > >>>> back to 110MB~300MB.
>> > >>>>
>> > >>>> "Also, when you say, "it generated 20 GB or 80 GB on oldreplica," how
>> > >>>> are you measuring this? Can you provide a listing of the files?"
>> > >>>>
>> > >>>> Because it fills the whole disk. We increased the disk to 80GB and it
>> > >>>> happened again.
>> > >>>>
>> > >>>> "Can you provide steps to reproduce the behavior you're observing?"
>> > >>>>
>> > >>>> Sure! We start the main and the replica and start producing. When we
>> > >>>> have something like 30k messages, and without stopping the producer,
>> > >>>> we kill the replica. When we do that, Artemis fills the whole disk
>> > >>>> with oldreplica files. When we changed
>> > >>>> max-saved-replicated-journals-size to 0 as you suggested, the files
>> > >>>> are removed on restart, but with the default value the cleanup has to
>> > >>>> be manual.
>> > >>>>
>> > >>>> Thanks for your help!
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> On Sat, Oct 23, 2021 at 09:12, Justin Bertram <
>> > >>>> jbertram@apache.org> wrote:
>> > >>>>
>> > >>>>> When you say, "we had something like 110MB ~ 300MB of messages,"
>> how
>> > >>>>> are
>> > >>>>> you measuring this? Are you counting *all* the data in the data
>> > >>>>> directory
>> > >>>>> (e.g. large messages, paging, etc.)?
>> > >>>>>
>> > >>>>> Also, when you say, "it generated 20 GB or 80 GB on oldreplica," how
>> > >>>>> are you measuring this? Can you provide a listing of the files?
>> > >>>>>
>> > >>>>> Can you provide steps to reproduce the behavior you're observing?
>> > >>>>>
>> > >>>>>
>> > >>>>> Justin
>> > >>>>>
>> > >>>>> On Fri, Oct 22, 2021 at 8:08 AM Saulo Carvalho <saulocn@gmail.com
>> >
>> > >>>>> wrote:
>> > >>>>>
>> > >>>>> > Hi! Is there an explanation for the size of this backup? Is there
>> > >>>>> > perhaps some limit?
>> > >>>>> >
>> > >>>>> > On Tue, Oct 19, 2021 at 11:31, Saulo Carvalho <saulocn@gmail.com>
>> > >>>>> > wrote:
>> > >>>>> >
>> > >>>>> > > Hi Gary!
>> > >>>>> > >
>> > >>>>> > > Thanks for the explanation. But why is it so much bigger than the
>> > >>>>> > > journal size? Sometimes we had something like 110MB ~ 300MB of
>> > >>>>> > > messages and it generated 20 GB or 80 GB in oldreplica. Is that
>> > >>>>> > > normal?
>> > >>>>> > > We decided to roll back to 2.17, where we didn't have this
>> > >>>>> > > problem, and to test 2.19.
>> > >>>>> > >
>> > >>>>> > >
>> > >>>>> > > On Tue, Oct 19, 2021 at 06:20, Gary Tully <gary.tully@gmail.com>
>> > >>>>> > > wrote:
>> > >>>>> > >
>> > >>>>> > >> Using max-saved-replicated-journals-size=0 may be your best
>> > >>>>> > >> option.
>> > >>>>> > >>
>> > >>>>> > >> First, understand the value of
>> > >>>>> > >> max-saved-replicated-journals-size > 0.
>> > >>>>> > >>
>> > >>>>> > >> It can be used to limit data loss in the case of catastrophic
>> > >>>>> > >> failure.
>> > >>>>> > >> One scenario where it can be useful is as follows:
>> > >>>>> > >> A primary is active; a backup begins to replicate and fails
>> > >>>>> > >> during initial synchronisation at, say, 90% complete.
>> > >>>>> > >> The backup restarts and saves the partial replica, then starts
>> > >>>>> > >> to replicate again (from the beginning). It gets to 10% and the
>> > >>>>> > >> primary catches fire; all data from the primary machine is lost!
>> > >>>>> > >> Now you have a backup with a 10% replica and an old replica with
>> > >>>>> > >> 90% of the data. There is less data loss if the backup is
>> > >>>>> > >> activated with the saved old 90% replica rather than the current
>> > >>>>> > >> 10% replica.
>> > >>>>> > >>
>> > >>>>> > >> It is a trade-off of disk usage against the possibility of
>> > >>>>> > >> recovering from a total failure of the primary during
>> > >>>>> > >> replication.
>> > >>>>> > >>
>> > >>>>> > >> /gary
>> > >>>>> > >>
>> > >>>>> > >> On Mon, 18 Oct 2021 at 14:17, Saulo Carvalho <
>> saulocn@gmail.com>
>> > >>>>> wrote:
>> > >>>>> > >> >
>> > >>>>> > >> > Hi Justin and folks.
>> > >>>>> > >> >
>> > >>>>> > >> > I tried this configuration and Artemis seems to be more
>> > >>>>> > >> > resilient (when it restarts, it deletes these files), but it
>> > >>>>> > >> > still creates oldreplica files and fills the whole disk.
>> > >>>>> > >> > Is there anything else that would stop it creating these
>> > >>>>> > >> > files, or limit their creation? Maybe in the new version?!
>> > >>>>> > >> >
>> > >>>>> > >> > Thanks for your help!
>> > >>>>> > >> >
>> > >>>>> > >> >
>> > >>>>> > >> > On Thu, Oct 14, 2021 at 18:34, Saulo Carvalho
>> > >>>>> > >> > <saulocn@gmail.com> wrote:
>> > >>>>> > >> >
>> > >>>>> > >> > > Thanks Justin! I will try this!
>> > >>>>> > >> > >
>> > >>>>> > >> > > On Thu, Oct 14, 2021 at 17:55, Justin Bertram
>> > >>>>> > >> > > <jbertram@apache.org> wrote:
>> > >>>>> > >> > >
>> > >>>>> > >> > >> I referred you to the max-saved-replicated-journals-size
>> > >>>>> > >> configuration
>> > >>>>> > >> > >> parameter on the Jira you opened for this [1]. Have you
>> > >>>>> adjusted
>> > >>>>> > this
>> > >>>>> > >> > >> parameter according to your use-case?
>> > >>>>> > >> > >>
>> > >>>>> > >> > >> For what it's worth, I wouldn't expect 20GB to matter
>> that
>> > >>>>> much.
>> > >>>>> > Disk
>> > >>>>> > >> > >> space
>> > >>>>> > >> > >> these days is super cheap. Perhaps you're running in
>> some
>> > >>>>> kind of
>> > >>>>> > >> > >> especially constrained environment?
>> > >>>>> > >> > >>
>> > >>>>> > >> > >>
>> > >>>>> > >> > >> Justin
>> > >>>>> > >> > >>
>> > >>>>> > >> > >> [1] https://issues.apache.org/jira/browse/ARTEMIS-3527
>> > >>>>> > >> > >>
>> > >>>>> > >> > >> On Thu, Oct 14, 2021 at 12:44 PM Saulo Carvalho <
>> > >>>>> saulocn@gmail.com
>> > >>>>> > >
>> > >>>>> > >> > >> wrote:
>> > >>>>> > >> > >>
>> > >>>>> > >> > >> > Hi!
>> > >>>>> > >> > >> > Can anyone help me with this problem?
>> > >>>>> > >> > >> >
>> > >>>>> > >> > >> > Artemis version 2.18.0 is creating oldreplica files and
>> > >>>>> > >> > >> > filling the whole disk. Is there some flag to turn this
>> > >>>>> > >> > >> > off, or some way to limit these files? We have 1 master
>> > >>>>> > >> > >> > and 1 replica in our cluster.
>> > >>>>> > >> > >> > Thanks!
>> > >>>>> > >> > >> > --
>> > >>>>> > >> > >> > Saulo de Carvalho Neto
>> > >>>>> > >> > >> > Software Engineer


-- 
Saulo de Carvalho Neto
Software Engineer