Posted to server-user@james.apache.org by Alan Gerhard <al...@GerCom.Com> on 2002/12/07 16:26:12 UTC

fault tolerance and mail loss

maxime -

>  From what I understand, the spool is like a producer/consumer queue,
> ....
>both have a small window where duplication can occur.

i have been carrying on a similar discussion over on the developers' list which
probably belongs here - thus this response and slight change of topic.

the robustness of the *Processor Pipeline* is of great importance, especially to
commercial ISPs, and from what I have followed of your discussions to date, my gut
reaction is that 'Yes, there is a window in which the messaging services of
James can fail.' but we first must review the term and concept of 'guaranteed
messaging' and how it relates to the SMTP protocol - which unfortunately can
not, does not and will not guarantee a delivery - though it does come close.

for your scenario, the window of loss is rather small and well within the
industry expected bounds of reliability - which is all any system can hope for.

your main concern is how the spool processes the mail, how it may or may not
be lost or doubled, and what the chances of that happening are. This becomes
more of an issue between the queue and the targeted repository and any
guarantees that may exist between the two - i don't believe there are any,
though i may be mistaken. Implementing a messaging service (JMS) between the
two or introducing robust code to ensure a low percent of mail-loss possibility
can be done, though i suspect discussions of this will need to take place first,
etc. etc.

anyway, how my discussion falls within yours is that i am concerned about
properly processing the SMTP protocol within the *Processor Pipeline* - where i
have a scenario in which the sender receives a positive from James when it in
fact cannot respond - positive or negative - as the db is unavailable.

different situations but similar concerns.

thanks,
alan





--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


RE: fault tolerance and mail loss

Posted by Alan Gerhard <al...@GerCom.Com>.
~I would add to the top of that list:
~
~Don't be patronizing:

it was not my intent ... 
my apologies to the list



Re: fault tolerance and mail loss

Posted by Aaron Knauf <ak...@xtra.co.nz>.
> ?????
> http://careers.usatoday.com/service/usa/national/content/news/onthejob/2001-03-29-people-skills

Alan,

I would add to the top of that list:

Don't be patronising:

People /know/ you're arrogant when you patronisingly tell them to work 
on their people skills.


The people that post to this list are not the most sweetly spoken lot,
it's true.  They do, however, make a genuine and completely voluntary 
attempt to help out with the JAMES-related problems of all and sundry.

The statements made may be blunt, but they are generally on-topic. You 
won't get very far by making offensive personal remarks.

ADK




RE: fault tolerance and mail loss

Posted by "Noel J. Bergman" <no...@devtech.com>.
> I am under some restrictions as to which version of James I am
> sanctioned to run. I am aware of the tremendous amount of work that
> has been done in getting to 2.1...

> Until then I am running 2.0a3 and have found issues that I believed
> have not been addressed

> therefore my arrogance and chest pounding about the apparently faulty
> operation of a severed db connection.

Well, maybe table pounding, but that's about it.  As far as Danny and I can
tell, the problem doesn't exist in the current code.  But you can't run the
fixed code, so what would you have us do?  Other than get v2.1 officially
released.  :-)

As I said, in the future I hope that we can fairly rapidly get out new
release versions in response to defects.

	--- Noel




RE: fault tolerance and mail loss

Posted by Alan Gerhard <al...@GerCom.Com>.
Noel -

~If you can demonstrate that James v2.1 provides a 250 return code to the
~sender when using a JDBC spool and a failed database (and where the JDBC
~driver is providing failure notification), then we have a serious defect
~to fix.

I am sorry - I cannot do that.
I am under some restrictions as to which version of James I am sanctioned to
run. I am aware of the tremendous amount of work that has been done in getting
to 2.1 ...

Unfortunately I cannot 'officially' install and run James 2.1 until it has been
'officially released' by the James Group. Until then I am running 2.0a3 and have
found issues that I believed have not been addressed - therefore my arrogance
and chest pounding about the apparently faulty operation of a severed db
connection.

it's been raised and the issues - if any - that need addressing are being
attended to.

alan







RE: fault tolerance and mail loss

Posted by "Noel J. Bergman" <no...@devtech.com>.
Alan,

I have been listening, and I certainly know your name.

Personally, I haven't used 2.0a3 since June.  By my automated count, there
have been ~600 patches made to James since 2.0a3.  I am bothered that there
have been critical fixes that have not been part of a Release Build.  It is
my goal that when a defect is located in the future, that we can quickly put
out a fixed Release, as we just did to fix a resource leak impacting Oracle
users.

If you can demonstrate that James v2.1 provides a 250 return code to the
sender when using a JDBC spool and a failed database (and where the JDBC
driver is providing failure notification), then we have a serious defect to
fix.  I cannot produce such an error.

	--- Noel




RE: fault tolerance and mail loss

Posted by Alan Gerhard <al...@GerCom.Com>.
~into the spool is ever removed from it. That happens only if a mailet sets
~the status to GHOST, which happens after the message is successfully
~transferred to another store, or a determination is made by a mailet to
~discard it.
yes, agreed, understood, no loss; duplication; two coins.



~I see nothing in your messages to james-dev that adds any additional
~information.  Are you saying that the JDBC driver you are using reports
~success even if there is a failure?
?????
http://careers.usatoday.com/service/usa/national/content/news/onthejob/2001-03-29-people-skills




RE: fault tolerance and mail loss

Posted by "Noel J. Bergman" <no...@devtech.com>.
> The latest stable version of James, as stated (20022106),
> does not notify the sender at all when the db is down and
> James cannot process the mail.

Does 20022106 translate as June 21, 2002?  What stable release build are you
talking about?  James 2.0a3?  If so, I can believe it.  I fixed quite a few
errors in the handling of exceptions.

To be sure, I just re-ran a test where I stopped the database during a
postal test.  Postal immediately started receiving 451 errors:

  Server error:451 Error processing message:
                   Exception spooling message:
                   Exception caught while storing mail
                   Container: java.sql.SQLException:
                   Communication link failure:
                   java.net.SocketException;
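
The behaviour shown in that transcript can be illustrated with a minimal sketch. This is not the actual James handler code: the `Spool` interface, the `DataCommandSketch` class and the reply texts are hypothetical names chosen for illustration. The only point it makes is the ordering: a 250 is sent only after the store has succeeded, and a storage failure reported by the JDBC driver becomes a transient 451, so the sending server keeps the mail and can retry.

```java
import java.sql.SQLException;

/** Sketch only: acknowledge a message only after it has been stored. */
final class DataCommandSketch {
    /** Returns the SMTP reply line for the end of the DATA step. */
    static String replyFor(String message, Spool spool) {
        try {
            spool.store(message);           // may fail if the database is down
            return "250 Message received";  // success is reported only after the store
        } catch (SQLException e) {
            // 4xx replies are transient failures: the sender still holds the mail
            return "451 Error processing message: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        Spool up = message -> { /* pretend the row was written */ };
        Spool down = message -> {
            throw new SQLException("Communication link failure");
        };
        System.out.println(replyFor("test", up));
        System.out.println(replyFor("test", down));
    }
}

/** Hypothetical spool abstraction; not the real James API. */
interface Spool {
    void store(String message) throws SQLException;
}
```

Note the condition stated elsewhere in this thread: this only works if the JDBC driver actually reports the failure. A driver that silently swallows the error would defeat the check.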

The 25NOV2002 code in the Milestone directory should be what gets released.
The only remaining work items for the official release of James v2.1 are
related to cleaning up the documentation, and a formal release vote.

	--- Noel




RE: fault tolerance and mail loss

Posted by Danny Angus <da...@apache.org>.
> The latest stable version of James, as stated (20022106), does not notify
> the sender at all when the db is down and James cannot process the mail.
> 
> ALL MAIL SENT TO JAMES WHEN IN THIS STATE IS LOST AND WITHOUT NOTIFICATION

According to my tests this isn't the case. Could you confirm this and send a
transcript of an SMTP session and/or logfiles, so we can investigate further?

d.




RE: fault tolerance and mail loss

Posted by Alan Gerhard <al...@GerCom.Com>.
~> In point of fact, if the database is unavailable when the handler
~> receives e-mail from the sender, it will return an error to the sender.

~> no, this is incorrect as i have pointed out in the dev list.

~As has been demonstrated to you, if the database is down, and the JDBC
~driver reports the error, a 451 error is returned to the sender.  That's
~working in James already.  What you were told is that there are planned
~improvements so that James will reacquire connections without requiring a
~restart after a db server crash.  That does not amount to a window for data
~loss.

Noel -
The latest stable version of James, as stated (20022106), does not notify the
sender at all when the db is down and James cannot process the mail.

ALL MAIL SENT TO JAMES WHEN IN THIS STATE IS LOST AND WITHOUT NOTIFICATION


alan




RE: fault tolerance and mail loss

Posted by "Noel J. Bergman" <no...@devtech.com>.
> we are not running a guaranteed message delivery service within James ...
> there is a window - however small - of either loss or duplicates;
> depending on how the transfer validation was implemented.

I disagree that loss and duplication are two heads of the same coin,
semantically speaking.

> i may be anal about this but SMTP is not a guaranteed delivery mechanism
> and neither is the internal spool/queue that we are using.

Read the code.  Line 321 of JamesSpoolManager.java is the only place in all
of James where a message that has been successfully inserted into the spool
is ever removed from it.  That happens only if a mailet sets the status to
GHOST, which happens after the message is successfully transferred to another
store, or a determination is made by a mailet to discard it.

As I have said, once the message goes into the spool it remains in the spool
until it has been successfully stored elsewhere or requested to be
destroyed.
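
A minimal sketch of that rule, written from the description above rather than from JamesSpoolManager.java itself: `SpooledMail`, `SpoolManagerSketch` and the state strings are hypothetical stand-ins, not the real James classes. A message leaves the spool in exactly one place, and only when a mailet has set its state to GHOST; a mailet that fails leaves the message where it is.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Sketch of a spool manager with a single removal point, guarded by GHOST. */
final class SpoolManagerSketch {
    final Map<String, SpooledMail> spool = new LinkedHashMap<>();

    void accept(SpooledMail mail) {
        spool.put(mail.key, mail);
    }

    /** One pass over the spool; a failing mailet leaves the message in place. */
    void processOnce(Consumer<SpooledMail> mailet) {
        // Iterate over a copy so entries can be removed from the live map
        for (SpooledMail mail : new ArrayList<>(spool.values())) {
            try {
                mailet.accept(mail);
            } catch (RuntimeException e) {
                continue; // e.g. destination store unavailable: retry on a later pass
            }
            if (SpooledMail.GHOST.equals(mail.state)) {
                spool.remove(mail.key); // the only place a message leaves the spool
            }
        }
    }

    public static void main(String[] args) {
        SpoolManagerSketch manager = new SpoolManagerSketch();
        manager.accept(new SpooledMail("mail-1"));

        manager.processOnce(mail -> { throw new IllegalStateException("store down"); });
        System.out.println("after failed pass: " + manager.spool.size());

        manager.processOnce(mail -> mail.state = SpooledMail.GHOST);
        System.out.println("after successful pass: " + manager.spool.size());
    }
}

/** Hypothetical stand-in for a spooled message; not the real James Mail class. */
final class SpooledMail {
    static final String GHOST = "ghost";
    final String key;
    String state = "root";

    SpooledMail(String key) {
        this.key = key;
    }
}
```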

> the window of loss/duplication is very small - but there IS a window.
> we can not pretend that it doesn't exist

A window for duplication, yes.  Show me the window for loss.  You will have
to posit a defect in the code, not a window, because removal from the spool
does not occur until afterwards.
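
The ordering argument can be checked with a small simulation. It is a sketch under stated assumptions, not James code: `DuplicationWindowSketch` and its in-memory lists are hypothetical, and the "crash" is just an early return between the two steps. Because the copy to the destination happens before the removal from the spool, a failure in between leaves the message in both places, and the retry produces a second copy rather than a lost one.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: store first, remove second, with a simulated crash in between. */
final class DuplicationWindowSketch {
    final List<String> spool = new ArrayList<>();
    final List<String> mailbox = new ArrayList<>();

    /** Delivers the oldest spooled message; crashBeforeRemove skips step 2. */
    void deliverFirst(boolean crashBeforeRemove) {
        if (spool.isEmpty()) {
            return;
        }
        String mail = spool.get(0);
        mailbox.add(mail);          // step 1: copy to the destination store
        if (crashBeforeRemove) {
            return;                 // simulated crash: step 2 never runs
        }
        spool.remove(0);            // step 2: remove from the spool
    }

    public static void main(String[] args) {
        DuplicationWindowSketch sketch = new DuplicationWindowSketch();
        sketch.spool.add("msg-1");

        sketch.deliverFirst(true);   // crash inside the window
        sketch.deliverFirst(false);  // retry after restart

        System.out.println("copies in mailbox: " + sketch.mailbox.size());
        System.out.println("left in spool: " + sketch.spool.size());
    }
}
```

Reversing the two steps (remove first, store second) would turn the same crash into a loss, which is why the position of the removal matters.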

> In point of fact, if the database is unavailable when the handler
> receives e-mail from the sender, it will return an error to the sender.
> no, this is incorrect as i have pointed out in the dev list.

As has been demonstrated to you, if the database is down, and the JDBC
driver reports the error, a 451 error is returned to the sender.  That's
working in James already.  What you were told is that there are planned
improvements so that James will reacquire connections without requiring a
restart after a db server crash.  That does not amount to a window for data
loss.

I see nothing in your messages to james-dev that adds any additional
information.  Are you saying that the JDBC driver you are using reports
success even if there is a failure?

	--- Noel




RE: fault tolerance and mail loss

Posted by Alan Gerhard <al...@GerCom.Com>.
~From: Noel J. Bergman
~ There is a window within which to PREVENT loss, there is the remote
~possibility of a duplicate transaction.
we are not running a guaranteed message delivery service within James ... there
is a window - however small - of either loss or duplicates; depending on how the
transfer validation was implemented.
i may be anal about this but SMTP is not a guaranteed delivery mechanism and
neither is the internal spool/queue that we are using. for all intents and
purposes, we can rest assured that the window of loss/duplication is very
small - but there IS a window. we can not pretend that it doesn't exist or that
SMTP is a guaranteed message delivery system; that's just plain foolish.

~> i have scenario that the sender receives a positive from James
~> when it in fact cannot respond - positive or negative as the
~> db is unavailable.
~In point of fact, if the database is unavailable when the
~handler receives e-mail from the sender, it will return an error to the sender.
no, this is incorrect as i have pointed out in the dev list. i will need to
upgrade to the next official release or run several unsanctioned patches to
correct this - but a commercial ISP cannot be expected to run a
development version. as far as i am concerned, this is a problem that is being
addressed - but it does exist.

alan




RE: fault tolerance and mail loss

Posted by "Noel J. Bergman" <no...@devtech.com>.
Alan,

Read the thread again.  There is no window of LOSS for Maxime.  There is a
window within which to PREVENT loss, there is the remote possibility of a
duplicate transaction.

> i have scenario that the sender receives a positive from James
> when it in fact cannot respond - positive or negative as the
> db is unavailable.

In point of fact, if the database is unavailable when the handler receives
e-mail from the sender, it will return an error to the sender.  I have had
this occur, as has Danny in his testing.

Once a message is in the spool it is not removed until it has been
successfully transferred again.  If the mail could not be delivered to a
mailbox because of a database server failure, it should remain in the spool
or be moved to the error repository.  I'd have to check that part of the
pipeline's error handling.  At the moment, I'd recommend that the error
repository use the file:// protocol.
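
One plausible reading of that recommendation is that the error repository should not depend on the same database whose failure caused the error. The following is a rough sketch of that idea only, not the James pipeline or its repository API: `Mailbox`, `ErrorRepositorySketch`, the `.eml` suffix and the directory name are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.sql.SQLException;

/** Sketch: park undeliverable mail on the file system when the database fails. */
final class ErrorRepositorySketch {
    /** Returns true if delivered to the mailbox, false if parked in errorDir. */
    static boolean deliverOrPark(String key, String message,
                                 Mailbox mailbox, Path errorDir) {
        try {
            mailbox.store(key, message);
            return true;
        } catch (SQLException e) {
            try {
                // File-based fallback: does not need the failed database.
                // Assumes key is already safe to use as a file name.
                Files.createDirectories(errorDir);
                Files.write(errorDir.resolve(key + ".eml"),
                            message.getBytes(StandardCharsets.UTF_8));
            } catch (IOException io) {
                throw new UncheckedIOException(io);
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Path errorDir = Paths.get(System.getProperty("java.io.tmpdir"), "error-repo-sketch");
        Mailbox down = (key, message) -> {
            throw new SQLException("database unavailable");
        };
        boolean delivered = deliverOrPark("mail-1", "hello", down, errorDir);
        System.out.println("delivered: " + delivered);
        System.out.println("parked: " + Files.exists(errorDir.resolve("mail-1.eml")));
    }
}

/** Hypothetical database-backed mailbox; not the real James API. */
interface Mailbox {
    void store(String key, String message) throws SQLException;
}
```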

	--- Noel

