Posted to server-dev@james.apache.org by "Ilja Weis (Jira)" <se...@james.apache.org> on 2022/01/31 11:30:00 UTC

[jira] [Closed] (JAMES-3710) Restarting James while deleting using POP3 causes inconsistency

     [ https://issues.apache.org/jira/browse/JAMES-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ilja Weis closed JAMES-3710.
----------------------------
    Resolution: Won't Fix

> Restarting James while deleting using POP3 causes inconsistency
> ---------------------------------------------------------------
>
>                 Key: JAMES-3710
>                 URL: https://issues.apache.org/jira/browse/JAMES-3710
>             Project: James Server
>          Issue Type: Bug
>    Affects Versions: master
>            Reporter: Ilja Weis
>            Priority: Major
>         Attachments: logs.zip
>
>
> Running James (distributed-pop3-app) and restarting it while
> it is processing a lot of POP3 sessions that are deleting messages
> causes the tables messagev3, pop3metadata and imapuidtable to be
> inconsistent, leading to the problems discussed in JAMES-3709.
> After applying the changes from
> https://github.com/apache/james-project/pull/861, POP3 sessions that issue
> TOP for messages that no longer exist but are still listed in pop3metadata
> now get "-ERR Message (12) does not exist." as expected, and the error no
> longer destroys the whole session.
> The interesting part is why the data becomes inconsistent. This is probably
> an edge case because we're probably not going to restart James all the time,
> but perhaps there's a problem somewhere that's just more likely to happen
> when restarting?
> My setup: 4 James instances in Kubernetes, S3 storage, Cassandra
> cluster, 2 datacenters with 3 nodes each, replication DC1:3,DC2:3. Relevant
> cassandra properties:
> cassandra.consistency_level.regular=QUORUM
> cassandra.consistency_level.lightweight_transaction=SERIAL
> message.read.strong.consistency=false
> message.write.strong.consistency.unsafe=false
> mailbox.read.strong.consistency=false
> mailbox.read.repair.chance=0.00
> mailbox.counters.read.repair.chance.max=0.000
> mailbox.counters.read.repair.chance.one.hundred=0.000
> What I did:
> - Send a number of mails to 200 mailboxes. After this:
> > select count(*) from messagev3;
>  count
> -------
>  16837
> > select count(*) from imapuidtable;
>  count
> -------
>  16837
> > select count(*) from pop3metadata ;
>  count
> -------
>  16837
> Then, start deleting all those messages with 40 parallel sessions.
> There are no concurrent sessions to the same account.
> While the deletes are running, restart all the James instances.
> After a moment, we have:
> > select count(*) from messagev3;
>  count
> -------
>  14669
> > select count(*) from imapuidtable;
>  count
> -------
>  14194
> > select count(*) from pop3metadata ;
>  count
> -------
>  14669
> Not sure if messagev3 is relevant here, just adding it for completeness.
> Now if I access the mailboxes, this touches some of the
> messages that are no longer in imapuidtable, which of course leads to
> -ERR Message (12) does not exist.
> and
> 16:32:48.175 [WARN ] o.a.j.p.m.DistributedMailboxAdapter - Removing cc9b9f70-7f8d-11ec-b659-0f643f8b11e7 from 8870f760-7f87-11ec-9109-5fc36fc8b18d POP3 projection for user james-032@james.testing at it is not backed by a MailboxMessage
> Running the fixPop3Inconsistencies task in such a situation would then
> clean up all the pop3metadata messages as "stale".
> I have attached the James log while deleting/restarting and after
> restarting for all 4 instances.
> What do we make of this? Is this relevant, or is it just something
> that can happen if we restart James at the wrong time?
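
The row counts quoted in the report can be turned into a rough size for the inconsistency. The following is a minimal sketch in Python, not part of James: the dictionaries simply copy the cqlsh output above, and the helper `stale_entries` is a hypothetical illustration of what "not backed by" an imapuidtable row means, assuming the message ids of both tables were available as plain collections. Counts alone only give a lower bound and cannot say which rows are affected.

```python
# Row counts copied from the cqlsh output in the report.
before = {"messagev3": 16837, "imapuidtable": 16837, "pop3metadata": 16837}
after = {"messagev3": 14669, "imapuidtable": 14194, "pop3metadata": 14669}

# Rows removed from each table while the deletes and restarts were running.
removed = {table: before[table] - after[table] for table in before}

# Lower bound on pop3metadata rows that have no matching imapuidtable row.
orphan_lower_bound = after["pop3metadata"] - after["imapuidtable"]


def stale_entries(pop3_ids, imap_ids):
    """Hypothetical helper: ids present in the POP3 projection but
    missing from the IMAP uid table."""
    return set(pop3_ids) - set(imap_ids)


print(removed)
print(orphan_lower_bound)
```

Under these numbers, imapuidtable lost more rows than the other two tables, so at least that many pop3metadata entries point at messages that can no longer be resolved.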



--
This message was sent by Atlassian Jira
(v8.20.1#820001)