You are viewing a plain text version of this content. The canonical link for it is here.
Posted to server-dev@james.apache.org by "Ilja Weis (Jira)" <se...@james.apache.org> on 2022/01/31 11:30:00 UTC
[jira] [Closed] (JAMES-3710) Restarting James while deleting using POP3 causes inconsistency
[ https://issues.apache.org/jira/browse/JAMES-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ilja Weis closed JAMES-3710.
----------------------------
Resolution: Won't Fix
> Restarting James while deleting using POP3 causes inconsistency
> ---------------------------------------------------------------
>
> Key: JAMES-3710
> URL: https://issues.apache.org/jira/browse/JAMES-3710
> Project: James Server
> Issue Type: Bug
> Affects Versions: master
> Reporter: Ilja Weis
> Priority: Major
> Attachments: logs.zip
>
>
> Running James (distributed-pop3-app) and restarting it while
> it is processing a lot of POP3 sessions that are deleting messages
> causes the tables messagev3, pop3metadata and imapuidtable to be
> inconsistent, leading to the problems discussed in JAMES-3709.
> The effect after applying the changes from https://github.com/apache/james-project/pull/861 is that POP3 sessions TOPing messages that no longer exist
> but still are in pop3metadata now cause "-ERR Message (12) does not exist."
> as expected but no longer destroy the whole session.
> The interesting part is why the data becomes inconsistent. This is probably
> an edge case because we're probably going to restart James all the time
> but perhaps there's a problem somewhere that's just more likely to happen
> when restarting?
> My setup: 4 James instances in Kubernetes, S3 storage, Cassandra
> cluster, 2 datacenters with 3 nodes each, replication DC1:3,DC2:3. Relevant
> cassandra properties:
> cassandra.consistency_level.regular=QUORUM
> cassandra.consistency_level.lightweight_transaction=SERIAL
> message.read.strong.consistency=false
> message.write.strong.consistency.unsafe=false
> mailbox.read.strong.consistency=false
> mailbox.read.repair.chance=0.00
> mailbox.counters.read.repair.chance.max=0.000
> mailbox.counters.read.repair.chance.one.hundred=0.000
> What I did:
> - Send a number of mails to 200 mailboxes. After this:
> > select count(*) from messagev3;
> count
> -------
> 16837
> > select count(*) from imapuidtable;
> count
> -------
> 16837
> > select count(*) from pop3metadata ;
> count
> -------
> 16837
> Then, start deleting all those messages with 40 parallel sessions.
> There are no concurrent sessions to the same account.
> While the deletes are running, restart all the james instances.
> After a moment, we have:
> > select count(*) from messagev3;
> count
> -------
> 14669
> > select count(*) from imapuidtable;
> count
> -------
> 14194
> > select count(*) from pop3metadata ;
> count
> -------
> 14669
> Not sure if messagev3 is relevant here, just adding it for completeness.
> Now if I'm accessing the mailboxes, this will touch some of the
> messages that are no longer in imapuidtable which of course leads to
> -ERR Message (12) does not exist.
> and
> 16:32:48.175 [WARN ] o.a.j.p.m.DistributedMailboxAdapter - Removing cc9b9f70-7f8d-11ec-b659-0f643f8b11e7 from 8870f760-7f87-11ec-9109-5fc36fc8b18d POP3 projection for user james-032@james.testing at it is not backed by a MailboxMessage
> Running the fixPop3Inconsistencies task in such a situation would then
> clean up all the pop3metadata messages as "stale".
> I have attached the James log while deleting/restarting and after
> restarting for all 4 instances.
> What do we make of this? Is this something relevant or more like something
> that just can happen if we restart James at the wrong time?
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org