You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ponymail.apache.org by sebb <se...@gmail.com> on 2017/08/19 08:46:10 UTC

Re: incubator-ponymail git commit: Make sure we have a body, or ID generation will fail.

-1

Sorry, but changing an ID generator will break some Permalinks if the
messages are ever reparsed.

It will also cause duplicate messages if the messages are ever
exported and re-imported without deleting the originals.

A possible work-round is to create a new version of the medium
generator that fixes this issue.
New installations can then use that if they wish.

But existing installations cannot just change ID generator and hope
that things will be OK


On 19 August 2017 at 07:11,  <hu...@apache.org> wrote:
> Repository: incubator-ponymail
> Updated Branches:
>   refs/heads/master be430c666 -> 58ce843d7
>
>
> Make sure we have a body, or ID generation will fail.
>
> If body is None, PM tries to encode it and fails, falling back
> on the legacy ID generator, which is not guaranteed to be
> the same across the board. Fix this by setting the body
> to an empty string if not found.
>
>
> Project: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/repo
> Commit: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/commit/58ce843d
> Tree: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/tree/58ce843d
> Diff: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/diff/58ce843d
>
> Branch: refs/heads/master
> Commit: 58ce843d79d413f3d3b49647d26f33b41ec9ff2d
> Parents: be430c6
> Author: Daniel Gruno <hu...@apache.org>
> Authored: Sat Aug 19 08:11:00 2017 +0200
> Committer: Daniel Gruno <hu...@apache.org>
> Committed: Sat Aug 19 08:11:00 2017 +0200
>
> ----------------------------------------------------------------------
>  CHANGELOG.md        | 1 +
>  tools/generators.py | 4 ++++
>  2 files changed, 5 insertions(+)
> ----------------------------------------------------------------------
>
>
> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/blob/58ce843d/CHANGELOG.md
> ----------------------------------------------------------------------
> diff --git a/CHANGELOG.md b/CHANGELOG.md
> index 05fab5d..e180827 100644
> --- a/CHANGELOG.md
> +++ b/CHANGELOG.md
> @@ -1,4 +1,5 @@
>  ## CHANGES in 0.10:
> +- Fixed an issue with ID generation where an email body does not exist
>  - Fixed an issue where favorites could contain null entries due to missing GC (#392)
>  - Added sample configs for Pony Mail (#374)
>  - Renamed ll.py to list-lists.py (#378)
>
> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/blob/58ce843d/tools/generators.py
> ----------------------------------------------------------------------
> diff --git a/tools/generators.py b/tools/generators.py
> index 2ac2ca8..d8c373d 100644
> --- a/tools/generators.py
> +++ b/tools/generators.py
> @@ -55,6 +55,8 @@ def medium(msg, body, lid, attachments):
>      attachments - list of attachments (not used)
>      """
>      # Use text body
> +    if not body: # Make sure body is not None, which will fail.
> +        body = ""
>      xbody = body if type(body) is bytes else body.encode('ascii', 'ignore')
>      # Use List ID
>      xbody += bytes(lid, encoding='ascii')
> @@ -94,6 +96,8 @@ def redundant(msg, body, lid, attachments):
>      attachments - list of attachments (uses the hashes)
>      """
>      # Use text body
> +    if not body: # Make sure body is not None, which will fail.
> +        body = ""
>      xbody = body if type(body) is bytes else body.encode('ascii', 'ignore')
>      # Use List ID
>      xbody += bytes(lid, encoding='ascii')
>

Re: incubator-ponymail git commit: Make sure we have a body, or ID generation will fail.

Posted by sebb <se...@gmail.com>.
On 19 August 2017 at 10:25, Daniel Gruno <hu...@apache.org> wrote:
> On 08/19/2017 10:46 AM, sebb wrote:
>> -1
>>
>> Sorry, but changing an ID generator will break some Permalinks if the
>> messages are ever reparsed.
>
> I'm sorry, but I don't accept this as a valid reason for a veto.

My point is that you should not force a new ID strategy on existing
installations.
By all means add a new ID generator with the fix, but don't change the
status quo.

> The present first-chance generator is horribly broken as is, as it only
> relies on 'archived-at' (which may not exist and thus will default to
> $now which will always change!) and the list ID, meaning that
> reimporting is *already broken* for these messages if archived-at is
> either missing or not the same. If the DB gets lost or corrupted, and
> you re-import, you will already end up with new permalinks should they
> fall back to this generator due to a missing body.
>
> IMHO, the first generator also needs to change or it will not be
> reliable at all. but I have not looked at that yet.
>
> We start off on line 642 with:
>             if not msg.get('archived-at'):
>                 msg.add_header('archived-at', email.utils.formatdate())
>
> then we generate a buggy ID on line 267 with:
> mid = hashlib.sha224(str("%s-%s" % (lid,
> msg_metadata['archived-at'])).encode('utf-8')).hexdigest() + "@" + (lid
> if lid else "none")
>
> This means, that if either the archived-at header differs, or if it's
> not present, you'll never get the same ID unless you reimport from the
> fixed-up mbox source from ES...which you probably don't have if you're
> trying to reimport an email. Reimporting from an external mbox file or
> other source will change the ID on every single import, as archived-at
> (if it's not present) will change to whatever time it is when you import.
>
> TL;DR: reimporting is already broken for messages without a main body, I
> don't believe there's technical merit in sticking with a broken format
> for the sake of maintaining status quo. status quo is broken, and no
> matter how we fix it, IDs may change as a result on a re-import from
> older installations. That is a consequence we should accept.
>
> With regards,
> Daniel.
>
>>
>> It will also cause duplicate messages if the messages are ever
>> exported and re-imported without deleting the originals.
>>
>> A possible work-round is to create a new version of the medium
>> generator that fixes this issue.
>> New installations can then use that if they wish.
>>
>> But existing installations cannot just change ID generator and hope
>> that things will be OK
>>
>>
>> On 19 August 2017 at 07:11,  <hu...@apache.org> wrote:
>>> Repository: incubator-ponymail
>>> Updated Branches:
>>>   refs/heads/master be430c666 -> 58ce843d7
>>>
>>>
>>> Make sure we have a body, or ID generation will fail.
>>>
>>> If body is None, PM tries to encode it and fails, falling back
>>> on the legacy ID generator, which is not guaranteed to be
>>> the same across the board. Fix this by setting the body
>>> to an empty string if not found.
>>>
>>>
>>> Project: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/repo
>>> Commit: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/commit/58ce843d
>>> Tree: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/tree/58ce843d
>>> Diff: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/diff/58ce843d
>>>
>>> Branch: refs/heads/master
>>> Commit: 58ce843d79d413f3d3b49647d26f33b41ec9ff2d
>>> Parents: be430c6
>>> Author: Daniel Gruno <hu...@apache.org>
>>> Authored: Sat Aug 19 08:11:00 2017 +0200
>>> Committer: Daniel Gruno <hu...@apache.org>
>>> Committed: Sat Aug 19 08:11:00 2017 +0200
>>>
>>> ----------------------------------------------------------------------
>>>  CHANGELOG.md        | 1 +
>>>  tools/generators.py | 4 ++++
>>>  2 files changed, 5 insertions(+)
>>> ----------------------------------------------------------------------
>>>
>>>
>>> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/blob/58ce843d/CHANGELOG.md
>>> ----------------------------------------------------------------------
>>> diff --git a/CHANGELOG.md b/CHANGELOG.md
>>> index 05fab5d..e180827 100644
>>> --- a/CHANGELOG.md
>>> +++ b/CHANGELOG.md
>>> @@ -1,4 +1,5 @@
>>>  ## CHANGES in 0.10:
>>> +- Fixed an issue with ID generation where an email body does not exist
>>>  - Fixed an issue where favorites could contain null entries due to missing GC (#392)
>>>  - Added sample configs for Pony Mail (#374)
>>>  - Renamed ll.py to list-lists.py (#378)
>>>
>>> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/blob/58ce843d/tools/generators.py
>>> ----------------------------------------------------------------------
>>> diff --git a/tools/generators.py b/tools/generators.py
>>> index 2ac2ca8..d8c373d 100644
>>> --- a/tools/generators.py
>>> +++ b/tools/generators.py
>>> @@ -55,6 +55,8 @@ def medium(msg, body, lid, attachments):
>>>      attachments - list of attachments (not used)
>>>      """
>>>      # Use text body
>>> +    if not body: # Make sure body is not None, which will fail.
>>> +        body = ""
>>>      xbody = body if type(body) is bytes else body.encode('ascii', 'ignore')
>>>      # Use List ID
>>>      xbody += bytes(lid, encoding='ascii')
>>> @@ -94,6 +96,8 @@ def redundant(msg, body, lid, attachments):
>>>      attachments - list of attachments (uses the hashes)
>>>      """
>>>      # Use text body
>>> +    if not body: # Make sure body is not None, which will fail.
>>> +        body = ""
>>>      xbody = body if type(body) is bytes else body.encode('ascii', 'ignore')
>>>      # Use List ID
>>>      xbody += bytes(lid, encoding='ascii')
>>>
>

Re: incubator-ponymail git commit: Make sure we have a body, or ID generation will fail.

Posted by Francesco Chicchiriccò <il...@apache.org>.
Hi Daniel,
see my replies below.

Regards.

On 19-ago-17, at 12:00, Daniel Gruno humbedooh@apache.org wrote:

> FWIW, I'd be interested in hearing what others have to say.
> 
> We broke backwards compatibility in 0.9 WRT IDs, so it's not set in
> stone that every version of the archiver must be fully compatible with
> the previous one.
> 
> What we could possibly do is have a vote/discussion on whether or not
> breaking that sort of compat is permissible in cases of an error (of
> sufficient magnitude) in the generator itself.
> 
> Alternately, we set out some clear definitions of each generator and set
> a policy that those must remain fixed, then create a new generator
> whenever a bug is found and leave the old one unpatched.

It's more a new generator whenever a bug is found that makes ID generation incompatible with the current.

> This, however,
> has the potential of creating a myriad of new generators.

Even considering the risk of "myriad" new generators, I'd like the idea to leave the choice of which generator to use to each given deployment.
Possibly, having stable permanent URLs is not necessarily a must for each deployment.

> We have to weigh future operability against the potential risk of having
> N% of emails have their permalinks change on a reimport (in this case, I
> believe it to be less than 0.1% of emails affected).
> 
> In the real-world cases I deal with, the medium generator was busted.
> One has to change away from it to avoid duplicates. This means the
> archive cannot be fully imported using neither the old nor the new
> generator, even if medium was kept as is and a new generator was chosen
> for future email. A decision has to be made to break the mold and live
> with it.
> 
> What do people think?
> 
> With regards,
> Daniel.
> 
> On 08/19/2017 11:25 AM, Daniel Gruno wrote:
>> On 08/19/2017 10:46 AM, sebb wrote:
>>> -1
>>>
>>> Sorry, but changing an ID generator will break some Permalinks if the
>>> messages are ever reparsed.
>> 
>> I'm sorry, but I don't accept this as a valid reason for a veto.
>> 
>> The present first-chance generator is horribly broken as is, as it only
>> relies on 'archived-at' (which may not exist and thus will default to
>> $now which will always change!) and the list ID, meaning that
>> reimporting is *already broken* for these messages if archived-at is
>> either missing or not the same. If the DB gets lost or corrupted, and
>> you re-import, you will already end up with new permalinks should they
>> fall back to this generator due to a missing body.
>> 
>> IMHO, the first generator also needs to change or it will not be
>> reliable at all. but I have not looked at that yet.
>> 
>> We start off on line 642 with:
>>             if not msg.get('archived-at'):
>>                 msg.add_header('archived-at', email.utils.formatdate())
>> 
>> then we generate a buggy ID on line 267 with:
>> mid = hashlib.sha224(str("%s-%s" % (lid,
>> msg_metadata['archived-at'])).encode('utf-8')).hexdigest() + "@" + (lid
>> if lid else "none")
>> 
>> This means, that if either the archived-at header differs, or if it's
>> not present, you'll never get the same ID unless you reimport from the
>> fixed-up mbox source from ES...which you probably don't have if you're
>> trying to reimport an email. Reimporting from an external mbox file or
>> other source will change the ID on every single import, as archived-at
>> (if it's not present) will change to whatever time it is when you import.
>> 
>> TL;DR: reimporting is already broken for messages without a main body, I
>> don't believe there's technical merit in sticking with a broken format
>> for the sake of maintaining status quo. status quo is broken, and no
>> matter how we fix it, IDs may change as a result on a re-import from
>> older installations. That is a consequence we should accept.
>> 
>> With regards,
>> Daniel.
>> 
>>>
>>> It will also cause duplicate messages if the messages are ever
>>> exported and re-imported without deleting the originals.
>>>
>>> A possible work-round is to create a new version of the medium
>>> generator that fixes this issue.
>>> New installations can then use that if they wish.
>>>
>>> But existing installations cannot just change ID generator and hope
>>> that things will be OK
>>>
>>>
>>> On 19 August 2017 at 07:11,  <hu...@apache.org> wrote:
>>>> Repository: incubator-ponymail
>>>> Updated Branches:
>>>>   refs/heads/master be430c666 -> 58ce843d7
>>>>
>>>>
>>>> Make sure we have a body, or ID generation will fail.
>>>>
>>>> If body is None, PM tries to encode it and fails, falling back
>>>> on the legacy ID generator, which is not guaranteed to be
>>>> the same across the board. Fix this by setting the body
>>>> to an empty string if not found.
>>>>
>>>>
>>>> Project: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/repo
>>>> Commit:
>>>> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/commit/58ce843d
>>>> Tree: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/tree/58ce843d
>>>> Diff: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/diff/58ce843d
>>>>
>>>> Branch: refs/heads/master
>>>> Commit: 58ce843d79d413f3d3b49647d26f33b41ec9ff2d
>>>> Parents: be430c6
>>>> Author: Daniel Gruno <hu...@apache.org>
>>>> Authored: Sat Aug 19 08:11:00 2017 +0200
>>>> Committer: Daniel Gruno <hu...@apache.org>
>>>> Committed: Sat Aug 19 08:11:00 2017 +0200
>>>>
>>>> ----------------------------------------------------------------------
>>>>  CHANGELOG.md        | 1 +
>>>>  tools/generators.py | 4 ++++
>>>>  2 files changed, 5 insertions(+)
>>>> ----------------------------------------------------------------------
>>>>
>>>>
>>>> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/blob/58ce843d/CHANGELOG.md
>>>> ----------------------------------------------------------------------
>>>> diff --git a/CHANGELOG.md b/CHANGELOG.md
>>>> index 05fab5d..e180827 100644
>>>> --- a/CHANGELOG.md
>>>> +++ b/CHANGELOG.md
>>>> @@ -1,4 +1,5 @@
>>>>  ## CHANGES in 0.10:
>>>> +- Fixed an issue with ID generation where an email body does not exist
>>>>  - Fixed an issue where favorites could contain null entries due to missing GC
>>>>  (#392)
>>>>  - Added sample configs for Pony Mail (#374)
>>>>  - Renamed ll.py to list-lists.py (#378)
>>>>
>>>> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/blob/58ce843d/tools/generators.py
>>>> ----------------------------------------------------------------------
>>>> diff --git a/tools/generators.py b/tools/generators.py
>>>> index 2ac2ca8..d8c373d 100644
>>>> --- a/tools/generators.py
>>>> +++ b/tools/generators.py
>>>> @@ -55,6 +55,8 @@ def medium(msg, body, lid, attachments):
>>>>      attachments - list of attachments (not used)
>>>>      """
>>>>      # Use text body
>>>> +    if not body: # Make sure body is not None, which will fail.
>>>> +        body = ""
>>>>      xbody = body if type(body) is bytes else body.encode('ascii', 'ignore')
>>>>      # Use List ID
>>>>      xbody += bytes(lid, encoding='ascii')
>>>> @@ -94,6 +96,8 @@ def redundant(msg, body, lid, attachments):
>>>>      attachments - list of attachments (uses the hashes)
>>>>      """
>>>>      # Use text body
>>>> +    if not body: # Make sure body is not None, which will fail.
>>>> +        body = ""
>>>>      xbody = body if type(body) is bytes else body.encode('ascii', 'ignore')
>>>>      # Use List ID
>>>>      xbody += bytes(lid, encoding='ascii')
>>>>

-- 
Francesco Chicchiriccò

Tirasa - Open Source Excellence
http://www.tirasa.net/

Member at The Apache Software Foundation
Syncope, Cocoon, Olingo, CXF, OpenJPA, PonyMail
http://home.apache.org/~ilgrosso/

Re: incubator-ponymail git commit: Make sure we have a body, or ID generation will fail.

Posted by Jacques Le Roux <ja...@les7arts.com>.
Hi Daniel, Sebastian,

I guess of course all people would want to keep their old links.

But if the old generator is anyway not reliable in time, then a new one should exist.
The project still being in incubator, so I guess not too much used outside, that should not be a biggie outside, but maybe at the ASF?

I also like the idea of letting the user decide on which generator/s generation/s she wants to rely on.
Not sure how it would be possible to use several generations (for the user, not the code) and what it eventually means... Seems complicated, picking 
one should not.

Jacques


Le 19/08/2017 à 12:00, Daniel Gruno a écrit :
> FWIW, I'd be interested in hearing what others have to say.
>
> We broke backwards compatibility in 0.9 WRT IDs, so it's not set in
> stone that every version of the archiver must be fully compatible with
> the previous one.
>
> What we could possibly do is have a vote/discussion on whether or not
> breaking that sort of compat is permissible in cases of an error (of
> sufficient magnitude) in the generator itself.
>
> Alternately, we set out some clear definitions of each generator and set
> a policy that those must remain fixed, then create a new generator
> whenever a bug is found and leave the old one unpatched. This, however,
> has the potential of creating a myriad of new generators.
>
> We have to weigh future operability against the potential risk of having
> N% of emails have their permalinks change on a reimport (in this case, I
> believe it to be less than 0.1% of emails affected).
>
> In the real-world cases I deal with, the medium generator was busted.
> One has to change away from it to avoid duplicates. This means the
> archive cannot be fully imported using neither the old nor the new
> generator, even if medium was kept as is and a new generator was chosen
> for future email. A decision has to be made to break the mold and live
> with it.
>
> What do people think?
>
> With regards,
> Daniel.
>
> On 08/19/2017 11:25 AM, Daniel Gruno wrote:
>> On 08/19/2017 10:46 AM, sebb wrote:
>>> -1
>>>
>>> Sorry, but changing an ID generator will break some Permalinks if the
>>> messages are ever reparsed.
>> I'm sorry, but I don't accept this as a valid reason for a veto.
>>
>> The present first-chance generator is horribly broken as is, as it only
>> relies on 'archived-at' (which may not exist and thus will default to
>> $now which will always change!) and the list ID, meaning that
>> reimporting is *already broken* for these messages if archived-at is
>> either missing or not the same. If the DB gets lost or corrupted, and
>> you re-import, you will already end up with new permalinks should they
>> fall back to this generator due to a missing body.
>>
>> IMHO, the first generator also needs to change or it will not be
>> reliable at all. but I have not looked at that yet.
>>
>> We start off on line 642 with:
>>              if not msg.get('archived-at'):
>>                  msg.add_header('archived-at', email.utils.formatdate())
>>
>> then we generate a buggy ID on line 267 with:
>> mid = hashlib.sha224(str("%s-%s" % (lid,
>> msg_metadata['archived-at'])).encode('utf-8')).hexdigest() + "@" + (lid
>> if lid else "none")
>>
>> This means, that if either the archived-at header differs, or if it's
>> not present, you'll never get the same ID unless you reimport from the
>> fixed-up mbox source from ES...which you probably don't have if you're
>> trying to reimport an email. Reimporting from an external mbox file or
>> other source will change the ID on every single import, as archived-at
>> (if it's not present) will change to whatever time it is when you import.
>>
>> TL;DR: reimporting is already broken for messages without a main body, I
>> don't believe there's technical merit in sticking with a broken format
>> for the sake of maintaining status quo. status quo is broken, and no
>> matter how we fix it, IDs may change as a result on a re-import from
>> older installations. That is a consequence we should accept.
>>
>> With regards,
>> Daniel.
>>
>>> It will also cause duplicate messages if the messages are ever
>>> exported and re-imported without deleting the originals.
>>>
>>> A possible work-round is to create a new version of the medium
>>> generator that fixes this issue.
>>> New installations can then use that if they wish.
>>>
>>> But existing installations cannot just change ID generator and hope
>>> that things will be OK
>>>
>>>
>>> On 19 August 2017 at 07:11,  <hu...@apache.org> wrote:
>>>> Repository: incubator-ponymail
>>>> Updated Branches:
>>>>    refs/heads/master be430c666 -> 58ce843d7
>>>>
>>>>
>>>> Make sure we have a body, or ID generation will fail.
>>>>
>>>> If body is None, PM tries to encode it and fails, falling back
>>>> on the legacy ID generator, which is not guaranteed to be
>>>> the same across the board. Fix this by setting the body
>>>> to an empty string if not found.
>>>>
>>>>
>>>> Project: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/repo
>>>> Commit: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/commit/58ce843d
>>>> Tree: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/tree/58ce843d
>>>> Diff: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/diff/58ce843d
>>>>
>>>> Branch: refs/heads/master
>>>> Commit: 58ce843d79d413f3d3b49647d26f33b41ec9ff2d
>>>> Parents: be430c6
>>>> Author: Daniel Gruno <hu...@apache.org>
>>>> Authored: Sat Aug 19 08:11:00 2017 +0200
>>>> Committer: Daniel Gruno <hu...@apache.org>
>>>> Committed: Sat Aug 19 08:11:00 2017 +0200
>>>>
>>>> ----------------------------------------------------------------------
>>>>   CHANGELOG.md        | 1 +
>>>>   tools/generators.py | 4 ++++
>>>>   2 files changed, 5 insertions(+)
>>>> ----------------------------------------------------------------------
>>>>
>>>>
>>>> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/blob/58ce843d/CHANGELOG.md
>>>> ----------------------------------------------------------------------
>>>> diff --git a/CHANGELOG.md b/CHANGELOG.md
>>>> index 05fab5d..e180827 100644
>>>> --- a/CHANGELOG.md
>>>> +++ b/CHANGELOG.md
>>>> @@ -1,4 +1,5 @@
>>>>   ## CHANGES in 0.10:
>>>> +- Fixed an issue with ID generation where an email body does not exist
>>>>   - Fixed an issue where favorites could contain null entries due to missing GC (#392)
>>>>   - Added sample configs for Pony Mail (#374)
>>>>   - Renamed ll.py to list-lists.py (#378)
>>>>
>>>> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/blob/58ce843d/tools/generators.py
>>>> ----------------------------------------------------------------------
>>>> diff --git a/tools/generators.py b/tools/generators.py
>>>> index 2ac2ca8..d8c373d 100644
>>>> --- a/tools/generators.py
>>>> +++ b/tools/generators.py
>>>> @@ -55,6 +55,8 @@ def medium(msg, body, lid, attachments):
>>>>       attachments - list of attachments (not used)
>>>>       """
>>>>       # Use text body
>>>> +    if not body: # Make sure body is not None, which will fail.
>>>> +        body = ""
>>>>       xbody = body if type(body) is bytes else body.encode('ascii', 'ignore')
>>>>       # Use List ID
>>>>       xbody += bytes(lid, encoding='ascii')
>>>> @@ -94,6 +96,8 @@ def redundant(msg, body, lid, attachments):
>>>>       attachments - list of attachments (uses the hashes)
>>>>       """
>>>>       # Use text body
>>>> +    if not body: # Make sure body is not None, which will fail.
>>>> +        body = ""
>>>>       xbody = body if type(body) is bytes else body.encode('ascii', 'ignore')
>>>>       # Use List ID
>>>>       xbody += bytes(lid, encoding='ascii')
>>>>
>


Re: incubator-ponymail git commit: Make sure we have a body, or ID generation will fail.

Posted by Daniel Gruno <hu...@apache.org>.
FWIW, I'd be interested in hearing what others have to say.

We broke backwards compatibility in 0.9 WRT IDs, so it's not set in
stone that every version of the archiver must be fully compatible with
the previous one.

What we could possibly do is have a vote/discussion on whether or not
breaking that sort of compat is permissible in cases of an error (of
sufficient magnitude) in the generator itself.

Alternately, we set out some clear definitions of each generator and set
a policy that those must remain fixed, then create a new generator
whenever a bug is found and leave the old one unpatched. This, however,
has the potential of creating a myriad of new generators.

We have to weigh future operability against the potential risk of having
N% of emails have their permalinks change on a reimport (in this case, I
believe it to be less than 0.1% of emails affected).

In the real-world cases I deal with, the medium generator was busted.
One has to change away from it to avoid duplicates. This means the
archive cannot be fully imported using neither the old nor the new
generator, even if medium was kept as is and a new generator was chosen
for future email. A decision has to be made to break the mold and live
with it.

What do people think?

With regards,
Daniel.

On 08/19/2017 11:25 AM, Daniel Gruno wrote:
> On 08/19/2017 10:46 AM, sebb wrote:
>> -1
>>
>> Sorry, but changing an ID generator will break some Permalinks if the
>> messages are ever reparsed.
> 
> I'm sorry, but I don't accept this as a valid reason for a veto.
> 
> The present first-chance generator is horribly broken as is, as it only
> relies on 'archived-at' (which may not exist and thus will default to
> $now which will always change!) and the list ID, meaning that
> reimporting is *already broken* for these messages if archived-at is
> either missing or not the same. If the DB gets lost or corrupted, and
> you re-import, you will already end up with new permalinks should they
> fall back to this generator due to a missing body.
> 
> IMHO, the first generator also needs to change or it will not be
> reliable at all. but I have not looked at that yet.
> 
> We start off on line 642 with:
>             if not msg.get('archived-at'):
>                 msg.add_header('archived-at', email.utils.formatdate())
> 
> then we generate a buggy ID on line 267 with:
> mid = hashlib.sha224(str("%s-%s" % (lid,
> msg_metadata['archived-at'])).encode('utf-8')).hexdigest() + "@" + (lid
> if lid else "none")
> 
> This means, that if either the archived-at header differs, or if it's
> not present, you'll never get the same ID unless you reimport from the
> fixed-up mbox source from ES...which you probably don't have if you're
> trying to reimport an email. Reimporting from an external mbox file or
> other source will change the ID on every single import, as archived-at
> (if it's not present) will change to whatever time it is when you import.
> 
> TL;DR: reimporting is already broken for messages without a main body, I
> don't believe there's technical merit in sticking with a broken format
> for the sake of maintaining status quo. status quo is broken, and no
> matter how we fix it, IDs may change as a result on a re-import from
> older installations. That is a consequence we should accept.
> 
> With regards,
> Daniel.
> 
>>
>> It will also cause duplicate messages if the messages are ever
>> exported and re-imported without deleting the originals.
>>
>> A possible work-round is to create a new version of the medium
>> generator that fixes this issue.
>> New installations can then use that if they wish.
>>
>> But existing installations cannot just change ID generator and hope
>> that things will be OK
>>
>>
>> On 19 August 2017 at 07:11,  <hu...@apache.org> wrote:
>>> Repository: incubator-ponymail
>>> Updated Branches:
>>>   refs/heads/master be430c666 -> 58ce843d7
>>>
>>>
>>> Make sure we have a body, or ID generation will fail.
>>>
>>> If body is None, PM tries to encode it and fails, falling back
>>> on the legacy ID generator, which is not guaranteed to be
>>> the same across the board. Fix this by setting the body
>>> to an empty string if not found.
>>>
>>>
>>> Project: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/repo
>>> Commit: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/commit/58ce843d
>>> Tree: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/tree/58ce843d
>>> Diff: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/diff/58ce843d
>>>
>>> Branch: refs/heads/master
>>> Commit: 58ce843d79d413f3d3b49647d26f33b41ec9ff2d
>>> Parents: be430c6
>>> Author: Daniel Gruno <hu...@apache.org>
>>> Authored: Sat Aug 19 08:11:00 2017 +0200
>>> Committer: Daniel Gruno <hu...@apache.org>
>>> Committed: Sat Aug 19 08:11:00 2017 +0200
>>>
>>> ----------------------------------------------------------------------
>>>  CHANGELOG.md        | 1 +
>>>  tools/generators.py | 4 ++++
>>>  2 files changed, 5 insertions(+)
>>> ----------------------------------------------------------------------
>>>
>>>
>>> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/blob/58ce843d/CHANGELOG.md
>>> ----------------------------------------------------------------------
>>> diff --git a/CHANGELOG.md b/CHANGELOG.md
>>> index 05fab5d..e180827 100644
>>> --- a/CHANGELOG.md
>>> +++ b/CHANGELOG.md
>>> @@ -1,4 +1,5 @@
>>>  ## CHANGES in 0.10:
>>> +- Fixed an issue with ID generation where an email body does not exist
>>>  - Fixed an issue where favorites could contain null entries due to missing GC (#392)
>>>  - Added sample configs for Pony Mail (#374)
>>>  - Renamed ll.py to list-lists.py (#378)
>>>
>>> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/blob/58ce843d/tools/generators.py
>>> ----------------------------------------------------------------------
>>> diff --git a/tools/generators.py b/tools/generators.py
>>> index 2ac2ca8..d8c373d 100644
>>> --- a/tools/generators.py
>>> +++ b/tools/generators.py
>>> @@ -55,6 +55,8 @@ def medium(msg, body, lid, attachments):
>>>      attachments - list of attachments (not used)
>>>      """
>>>      # Use text body
>>> +    if not body: # Make sure body is not None, which will fail.
>>> +        body = ""
>>>      xbody = body if type(body) is bytes else body.encode('ascii', 'ignore')
>>>      # Use List ID
>>>      xbody += bytes(lid, encoding='ascii')
>>> @@ -94,6 +96,8 @@ def redundant(msg, body, lid, attachments):
>>>      attachments - list of attachments (uses the hashes)
>>>      """
>>>      # Use text body
>>> +    if not body: # Make sure body is not None, which will fail.
>>> +        body = ""
>>>      xbody = body if type(body) is bytes else body.encode('ascii', 'ignore')
>>>      # Use List ID
>>>      xbody += bytes(lid, encoding='ascii')
>>>
> 


Re: incubator-ponymail git commit: Make sure we have a body, or ID generation will fail.

Posted by Daniel Gruno <hu...@apache.org>.
On 08/19/2017 10:46 AM, sebb wrote:
> -1
> 
> Sorry, but changing an ID generator will break some Permalinks if the
> messages are ever reparsed.

I'm sorry, but I don't accept this as a valid reason for a veto.

The present first-chance generator is horribly broken as is, as it only
relies on 'archived-at' (which may not exist and thus will default to
$now which will always change!) and the list ID, meaning that
reimporting is *already broken* for these messages if archived-at is
either missing or not the same. If the DB gets lost or corrupted, and
you re-import, you will already end up with new permalinks should they
fall back to this generator due to a missing body.

IMHO, the first generator also needs to change or it will not be
reliable at all. but I have not looked at that yet.

We start off on line 642 with:
            if not msg.get('archived-at'):
                msg.add_header('archived-at', email.utils.formatdate())

then we generate a buggy ID on line 267 with:
mid = hashlib.sha224(str("%s-%s" % (lid,
msg_metadata['archived-at'])).encode('utf-8')).hexdigest() + "@" + (lid
if lid else "none")

This means, that if either the archived-at header differs, or if it's
not present, you'll never get the same ID unless you reimport from the
fixed-up mbox source from ES...which you probably don't have if you're
trying to reimport an email. Reimporting from an external mbox file or
other source will change the ID on every single import, as archived-at
(if it's not present) will change to whatever time it is when you import.

TL;DR: reimporting is already broken for messages without a main body, I
don't believe there's technical merit in sticking with a broken format
for the sake of maintaining status quo. status quo is broken, and no
matter how we fix it, IDs may change as a result on a re-import from
older installations. That is a consequence we should accept.

With regards,
Daniel.

> 
> It will also cause duplicate messages if the messages are ever
> exported and re-imported without deleting the originals.
> 
> A possible work-round is to create a new version of the medium
> generator that fixes this issue.
> New installations can then use that if they wish.
> 
> But existing installations cannot just change ID generator and hope
> that things will be OK
> 
> 
> On 19 August 2017 at 07:11,  <hu...@apache.org> wrote:
>> Repository: incubator-ponymail
>> Updated Branches:
>>   refs/heads/master be430c666 -> 58ce843d7
>>
>>
>> Make sure we have a body, or ID generation will fail.
>>
>> If body is None, PM tries to encode it and fails, falling back
>> on the legacy ID generator, which is not guaranteed to be
>> the same across the board. Fix this by setting the body
>> to an empty string if not found.
>>
>>
>> Project: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/repo
>> Commit: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/commit/58ce843d
>> Tree: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/tree/58ce843d
>> Diff: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/diff/58ce843d
>>
>> Branch: refs/heads/master
>> Commit: 58ce843d79d413f3d3b49647d26f33b41ec9ff2d
>> Parents: be430c6
>> Author: Daniel Gruno <hu...@apache.org>
>> Authored: Sat Aug 19 08:11:00 2017 +0200
>> Committer: Daniel Gruno <hu...@apache.org>
>> Committed: Sat Aug 19 08:11:00 2017 +0200
>>
>> ----------------------------------------------------------------------
>>  CHANGELOG.md        | 1 +
>>  tools/generators.py | 4 ++++
>>  2 files changed, 5 insertions(+)
>> ----------------------------------------------------------------------
>>
>>
>> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/blob/58ce843d/CHANGELOG.md
>> ----------------------------------------------------------------------
>> diff --git a/CHANGELOG.md b/CHANGELOG.md
>> index 05fab5d..e180827 100644
>> --- a/CHANGELOG.md
>> +++ b/CHANGELOG.md
>> @@ -1,4 +1,5 @@
>>  ## CHANGES in 0.10:
>> +- Fixed an issue with ID generation where an email body does not exist
>>  - Fixed an issue where favorites could contain null entries due to missing GC (#392)
>>  - Added sample configs for Pony Mail (#374)
>>  - Renamed ll.py to list-lists.py (#378)
>>
>> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/blob/58ce843d/tools/generators.py
>> ----------------------------------------------------------------------
>> diff --git a/tools/generators.py b/tools/generators.py
>> index 2ac2ca8..d8c373d 100644
>> --- a/tools/generators.py
>> +++ b/tools/generators.py
>> @@ -55,6 +55,8 @@ def medium(msg, body, lid, attachments):
>>      attachments - list of attachments (not used)
>>      """
>>      # Use text body
>> +    if not body: # Make sure body is not None, which will fail.
>> +        body = ""
>>      xbody = body if type(body) is bytes else body.encode('ascii', 'ignore')
>>      # Use List ID
>>      xbody += bytes(lid, encoding='ascii')
>> @@ -94,6 +96,8 @@ def redundant(msg, body, lid, attachments):
>>      attachments - list of attachments (uses the hashes)
>>      """
>>      # Use text body
>> +    if not body: # Make sure body is not None, which will fail.
>> +        body = ""
>>      xbody = body if type(body) is bytes else body.encode('ascii', 'ignore')
>>      # Use List ID
>>      xbody += bytes(lid, encoding='ascii')
>>