Posted to dev@ponymail.apache.org by sebb <se...@gmail.com> on 2017/09/02 10:36:45 UTC

MTA variability (was: incubator-ponymail git commit: crop out trailing whitespace for redundant archiver)

On 2 September 2017 at 09:02,  <hu...@apache.org> wrote:
> Repository: incubator-ponymail
> Updated Branches:
>   refs/heads/master c8f4d3b7d -> df0b7ee1c
>
> This deals with spurious whitespace that can exist on
> clustered setups due to corrections inside the MTAs.

If MTAs can change trailing whitespace, will this affect the raw source?

I think it's vital that the source agrees with the input.
That will not be the case if a cluster is set up so different MTAs
behave differently with the same input.

Re: MTA variability (was: incubator-ponymail git commit: crop out trailing whitespace for redundant archiver)

Posted by sebb <se...@gmail.com>.
On 3 September 2017 at 10:28, Daniel Gruno <hu...@apache.org> wrote:
> On 09/03/2017 11:21 AM, sebb wrote:
>> On 3 September 2017 at 07:34, Daniel Gruno <hu...@apache.org> wrote:
>>> On 09/02/2017 12:36 PM, sebb wrote:
>>>> On 2 September 2017 at 09:02,  <hu...@apache.org> wrote:
>>>>> Repository: incubator-ponymail
>>>>> Updated Branches:
>>>>>   refs/heads/master c8f4d3b7d -> df0b7ee1c
>>>>>
>>>>> This deals with spurious whitespace that can exist on
>>>>> clustered setups due to corrections inside the MTAs.
>>>>
>>>> If MTAs can change trailing whitespace, will this affect the raw source?
>>>>
>>>> I think it's vital that the source agrees with the input.
>>>> That will not be the case if a cluster is set up so different MTAs
>>>> behave differently with the same input.
>>>>
>>>
>>> The raw source will be whatever is archived last. If the last copy to be
>>> archived has an adjusted newline at the end, then that's what the mbox
>>> source will then have. If both have the redundant generator enabled,
>>> they will provide the same ID for the email (regardless of any
>>> whitespace added), which is what we first and foremost want here.
>>>
>>> Excess whitespace should really not cause duplicates here; there is no
>>> sense in doing so for anyone's sake.
>>
>> But the question remains: why do the MTAs behave differently with
>> identical input?
>>
>> lists.a.o only has one subscription to each list, so can only receive
>> a single message.
>> At some point the message must be duplicated in order to feed all the
>> nodes in the cluster.
>> Either that duplication process is not preserving whitespace exactly,
>> or there are differences in the processing done by different members
>> of the cluster.
>> Neither seems right.
>> PonyMail should not need to allow for such external implementation issues.
>>
>
> In a perfect world, email should stay the same, but it doesn't. Headers
> get added, and sometimes newlines, for some reason. There's nothing
> special about the setup we use. It simply sends an email to multiple
> targets. I don't know exactly why the newline is added, how to fix it
> in the MTAs, or how many MTAs do this. The easiest path is to simply
> tell Pony Mail in these cases to ignore trailing whitespace when
> computing an ID.

Sorry, but you don't seem to understand my point.

If you send the email to multiple targets, I would expect all the
targets to get the same email with the same EOLs.
I also expect the MTAs to process identical mails in identical ways.

Why is this not happening?
Where is the difference occurring?

Note that in at least some of the duplicated messages that you said
were caused by this issue, one of the copies has an archived-at header
and the other does not.
That must be caused by something other than EOL processing issues, and
points to a problem with the cluster setup.
If this problem is fixed, maybe the need to ignore EOLs will disappear.

I am also unhappy with adding a hack to fix something whose cause is unknown.
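
One way to narrow down where the copies diverge is to compare the raw
source of two duplicates directly. The following is only a minimal
sketch, assuming both raw copies are available as bytes. The function
names and the sample messages (including the example.invalid URL) are
hypothetical and not part of Pony Mail.

```python
import email


def split_body(raw: bytes) -> bytes:
    """Return everything after the first blank line (header/body separator)."""
    candidates = []
    for sep in (b"\r\n\r\n", b"\n\n"):
        pos = raw.find(sep)
        if pos != -1:
            candidates.append((pos, len(sep)))
    if not candidates:
        return b""
    # Use whichever separator appears first in the message.
    pos, length = min(candidates)
    return raw[pos + length:]


def compare_copies(raw_a: bytes, raw_b: bytes) -> dict:
    """Report header names unique to each copy and how the bodies differ."""
    headers_a = {name.lower() for name in email.message_from_bytes(raw_a).keys()}
    headers_b = {name.lower() for name in email.message_from_bytes(raw_b).keys()}
    body_a = split_body(raw_a)
    body_b = split_body(raw_b)
    return {
        "only_in_a": sorted(headers_a - headers_b),
        "only_in_b": sorted(headers_b - headers_a),
        "bodies_identical": body_a == body_b,
        "bodies_equal_ignoring_trailing_ws": body_a.rstrip() == body_b.rstrip(),
    }


# Hypothetical sample: copy A carries an extra header, copy B an extra newline.
copy_a = b"Subject: hi\nArchived-At: <https://example.invalid/x>\n\nbody\n"
copy_b = b"Subject: hi\n\nbody\n\n"
result = compare_copies(copy_a, copy_b)
print(result)
```

If a header such as archived-at shows up in only one copy, the
difference is more than an EOL issue, which is the case described above.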

Re: MTA variability (was: incubator-ponymail git commit: crop out trailing whitespace for redundant archiver)

Posted by Daniel Gruno <hu...@apache.org>.
On 09/03/2017 11:21 AM, sebb wrote:
> On 3 September 2017 at 07:34, Daniel Gruno <hu...@apache.org> wrote:
>> On 09/02/2017 12:36 PM, sebb wrote:
>>> On 2 September 2017 at 09:02,  <hu...@apache.org> wrote:
>>>> Repository: incubator-ponymail
>>>> Updated Branches:
>>>>   refs/heads/master c8f4d3b7d -> df0b7ee1c
>>>>
>>>> This deals with spurious whitespace that can exist on
>>>> clustered setups due to corrections inside the MTAs.
>>>
>>> If MTAs can change trailing whitespace, will this affect the raw source?
>>>
>>> I think it's vital that the source agrees with the input.
>>> That will not be the case if a cluster is set up so different MTAs
>>> behave differently with the same input.
>>>
>>
>> The raw source will be whatever is archived last. If the last copy to be
>> archived has an adjusted newline at the end, then that's what the mbox
>> source will then have. If both have the redundant generator enabled,
>> they will provide the same ID for the email (regardless of any
>> whitespace added), which is what we first and foremost want here.
>>
>> Excess whitespace should really not cause duplicates here; there is no
>> sense in doing so for anyone's sake.
> 
> But the question remains: why do the MTAs behave differently with
> identical input?
> 
> lists.a.o only has one subscription to each list, so can only receive
> a single message.
> At some point the message must be duplicated in order to feed all the
> nodes in the cluster.
> Either that duplication process is not preserving whitespace exactly,
> or there are differences in the processing done by different members
> of the cluster.
> Neither seems right.
> PonyMail should not need to allow for such external implementation issues.
> 

In a perfect world, email should stay the same, but it doesn't. Headers
get added, and sometimes newlines, for some reason. There's nothing
special about the setup we use. It simply sends an email to multiple
targets. I don't know exactly why the newline is added, how to fix it
in the MTAs, or how many MTAs do this. The easiest path is to simply
tell Pony Mail in these cases to ignore trailing whitespace when
computing an ID.
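
As a rough illustration of the idea, and not the actual Pony Mail
generator code, here is a minimal sketch that strips trailing whitespace
from the body before hashing it. The function name, the use of SHA-224,
and the sample inputs are all assumptions made for the example.

```python
import hashlib


def sketch_message_id(body: bytes, list_id: str) -> str:
    """Hypothetical ID: hash of the list ID plus the body without trailing whitespace."""
    normalized = body.rstrip()  # drops trailing spaces, tabs, CR and LF
    digest = hashlib.sha224()
    digest.update(list_id.encode("utf-8"))
    digest.update(b"\n")
    digest.update(normalized)
    return digest.hexdigest()


# Two copies of the same mail; the second gained a newline in transit.
copy_a = b"Hello list,\r\n\r\nSome text.\r\n"
copy_b = b"Hello list,\r\n\r\nSome text.\r\n\r\n"
list_id = "<dev.ponymail.apache.org>"

id_a = sketch_message_id(copy_a, list_id)
id_b = sketch_message_id(copy_b, list_id)
print(id_a == id_b)
```

Only whitespace at the very end of the body is ignored here; any other
difference between the copies still yields a different ID.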

Re: MTA variability (was: incubator-ponymail git commit: crop out trailing whitespace for redundant archiver)

Posted by sebb <se...@gmail.com>.
On 3 September 2017 at 07:34, Daniel Gruno <hu...@apache.org> wrote:
> On 09/02/2017 12:36 PM, sebb wrote:
>> On 2 September 2017 at 09:02,  <hu...@apache.org> wrote:
>>> Repository: incubator-ponymail
>>> Updated Branches:
>>>   refs/heads/master c8f4d3b7d -> df0b7ee1c
>>>
>>> This deals with spurious whitespace that can exist on
>>> clustered setups due to corrections inside the MTAs.
>>
>> If MTAs can change trailing whitespace, will this affect the raw source?
>>
>> I think it's vital that the source agrees with the input.
>> That will not be the case if a cluster is set up so different MTAs
>> behave differently with the same input.
>>
>
> The raw source will be whatever is archived last. If the last copy to be
> archived has an adjusted newline at the end, then that's what the mbox
> source will then have. If both have the redundant generator enabled,
> they will provide the same ID for the email (regardless of any
> whitespace added), which is what we first and foremost want here.
>
> Excess whitespace should really not cause duplicates here; there is no
> sense in doing so for anyone's sake.

But the question remains: why do the MTAs behave differently with
identical input?

lists.a.o only has one subscription to each list, so can only receive
a single message.
At some point the message must be duplicated in order to feed all the
nodes in the cluster.
Either that duplication process is not preserving whitespace exactly,
or there are differences in the processing done by different members
of the cluster.
Neither seems right.
PonyMail should not need to allow for such external implementation issues.

Re: MTA variability (was: incubator-ponymail git commit: crop out trailing whitespace for redundant archiver)

Posted by Daniel Gruno <hu...@apache.org>.
On 09/02/2017 12:36 PM, sebb wrote:
> On 2 September 2017 at 09:02,  <hu...@apache.org> wrote:
>> Repository: incubator-ponymail
>> Updated Branches:
>>   refs/heads/master c8f4d3b7d -> df0b7ee1c
>>
>> This deals with spurious whitespace that can exist on
>> clustered setups due to corrections inside the MTAs.
> 
> If MTAs can change trailing whitespace, will this affect the raw source?
> 
> I think it's vital that the source agrees with the input.
> That will not be the case if a cluster is set up so different MTAs
> behave differently with the same input.
> 

The raw source will be whatever is archived last. If the last copy to be
archived has an adjusted newline at the end, then that's what the mbox
source will then have. If both have the redundant generator enabled,
they will provide the same ID for the email (regardless of any
whitespace added), which is what we first and foremost want here.

Excess whitespace should really not cause duplicates here; there is no
sense in doing so for anyone's sake.