Posted to commits@ponymail.apache.org by hu...@apache.org on 2020/09/04 11:48:57 UTC

[incubator-ponymail-foal] branch master updated: Allow for multiple ID generators to be run per email

This is an automated email from the ASF dual-hosted git repository.

humbedooh pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-ponymail-foal.git


The following commit(s) were added to refs/heads/master by this push:
     new 178b729  Allow for multiple ID generators to be run per email
178b729 is described below

commit 178b729b9084a83034c0a87f150f23fd2ca48291
Author: Daniel Gruno <hu...@apache.org>
AuthorDate: Fri Sep 4 13:48:39 2020 +0200

    Allow for multiple ID generators to be run per email
    
    If needed, this allows custom multi-id generation for emails by
    specifying more than one generator (with a space between each).
    This allows for seamless switching between short and long links.
    The first generator will be used for the document ID, and all
    generated IDs will be present in the 'permalinks' array.
---
 tools/archiver.py   | 35 +++++++++++++++++++++--------------
 tools/mappings.yaml |  2 +-
 2 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/tools/archiver.py b/tools/archiver.py
index d447c62..6a64775 100755
--- a/tools/archiver.py
+++ b/tools/archiver.py
@@ -400,19 +400,23 @@ class Archiver(object):  # N.B. Also used by import-mbox.py
 
         if body is not None or attachments:
             pmid = mid
-            try:
-                mid = plugins.generators.generate(
-                    self.generator, msg, body, lid, attachments, raw_msg
-                )
-            except Exception as err:
-                if logger:
-                    # N.B. use .get just in case there is no message-id
-                    logger.info(
-                        "Could not generate MID: %s. MSGID: %s",
-                        err,
-                        msg_metadata.get("message-id", "?").strip(),
-                    )
-                mid = pmid
+            all_mids = set()  # Use a set to avoid duplicates
+            for generator in self.generator.split(" "):
+                if generator:
+                    try:
+                        mid = plugins.generators.generate(
+                            generator, msg, body, lid, attachments, raw_msg
+                        )
+                    except Exception as err:
+                        if logger:
+                            # N.B. use .get just in case there is no message-id
+                            logger.info(
+                                "Could not generate MID: %s. MSGID: %s",
+                                err,
+                                msg_metadata.get("message-id", "?").strip(),
+                            )
+                        mid = pmid
+                    all_mids.add(mid)
 
             if "in-reply-to" in msg_metadata:
                 try:
@@ -425,13 +429,16 @@ class Archiver(object):  # N.B. Also used by import-mbox.py
                         irt = irt.strip()
                 except ValueError:
                     irt = ""
+            all_mids = list(all_mids)  # Convert back to list
+            document_id = all_mids[0]
             output_json = {
                 "from_raw": msg_metadata["from"],
                 "from": msg_metadata["from"],
                 "to": msg_metadata["to"],
                 "subject": msg_metadata["subject"],
                 "message-id": msg_metadata["message-id"],
-                "mid": mid,
+                "mid": document_id,
+                "permalinks": all_mids,
                 "dbid": hashlib.sha3_256(raw_msg).hexdigest(),
                 "cc": msg_metadata.get("cc"),
                 "epoch": epoch,
diff --git a/tools/mappings.yaml b/tools/mappings.yaml
index 54298d4..6743a07 100644
--- a/tools/mappings.yaml
+++ b/tools/mappings.yaml
@@ -78,7 +78,7 @@ mbox:
       type: keyword
     mid:
       type: keyword
-    permalink:
+    permalinks:
       type: keyword
     private:
       type: boolean
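The sketch below shows the behaviour the commit message describes. It is not Foal's code: the generator signature (raw bytes in, ID string out) and the generator names are hypothetical simplifications. It also addresses two problems in the diff above. The diff collects IDs in a set and then takes `list(all_mids)[0]` as the document ID, but Python sets do not preserve insertion order, so when generators disagree that element is not guaranteed to come from the first generator. In addition, an empty generator setting would leave the set empty and raise IndexError. An insertion-ordered de-duplication avoids both problems:

```python
from __future__ import annotations

import hashlib
from typing import Callable

# Hypothetical generator signature for this sketch: raw message bytes -> ID string.
# The real plugins.generators.generate() takes more arguments (msg, body, lid, ...).
IdGenerator = Callable[[bytes], str]


def generate_ids(
    config: str,
    generators: dict[str, IdGenerator],
    raw_msg: bytes,
    fallback_mid: str,
) -> tuple[str, list[str]]:
    """Run each generator named in a space-separated config string.

    Returns (document_id, permalinks). document_id comes from the first
    generator listed; permalinks holds every distinct ID in config order.
    """
    ids: list[str] = []
    for name in config.split():  # no-arg split() also ignores repeated spaces
        try:
            mid = generators[name](raw_msg)
        except Exception:
            # Any failure, including an unknown generator name, falls back
            # to the pre-existing ID, mirroring the diff's use of pmid.
            mid = fallback_mid
        ids.append(mid)
    if not ids:
        raise ValueError("no ID generators configured")
    # dict.fromkeys() drops duplicates but keeps first-seen order,
    # unlike list(set(...)), whose order is arbitrary.
    permalinks = list(dict.fromkeys(ids))
    return permalinks[0], permalinks


# Two illustrative generators (hypothetical names, not Foal's real ones).
EXAMPLE_GENERATORS: dict[str, IdGenerator] = {
    "long": lambda raw: hashlib.sha3_256(raw).hexdigest(),
    "short": lambda raw: hashlib.sha3_256(raw).hexdigest()[:12],
}

doc_id, links = generate_ids("short long", EXAMPLE_GENERATORS, b"hello", "fallback")
```

With "short long", the document ID is the 12-character ID and both IDs end up in permalinks. Listing a generator twice does not duplicate its ID.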


Re: [incubator-ponymail-foal] branch master updated: Allow for multiple ID generators to be run per email

Posted by sebb <se...@gmail.com>.
On Fri, 4 Sep 2020 at 14:52, Daniel Gruno <hu...@apache.org> wrote:
>
> On 04/09/2020 15.48, sebb wrote:
> > On Fri, 4 Sep 2020 at 12:56, Daniel Gruno <hu...@apache.org> wrote:
> >>
> >> As an FYI, this is meant as an advanced feature generally for systems
> >> where you previously had one set of generators, but you want to use a
> >> different permalink for them all, while still retaining the old document
> >> IDs.
> >>
> >> It's a stepping stone towards flexibility in your choice of permalink
> >> length with backwards compatibility, and is not complete until we have
> >> the UI backend ready. But it is an important step.
> >
> > Regardless of the UI backend, this will only work if the foal versions
> > of historic generators produce the same results as they did in
> > ponymail.
> >
> > That is not the case currently, because of changes to the parsing and
> > en/decoding.
> >
> > As I already wrote, maintaining compatibility is going to be tricky.
>
> It is indeed tricky, this is going to be something for the very advanced
> user that knows how to utilize it. But it is something I want to work on
> for those specific users :). This will be mixed with a separate tool for
> "re-indexing" older documents from old pony mail installations later
> next week probably. There will probably be something like an advanced
> users manual for this as well.

This can only be a useful feature if the Foal code generates the same
IDs as the Pony code.

That is not the case at the moment.

If it were, the Foal code would generate the same IDs for the
yaml/set01 tests, and the Pony code would work with the yaml/set02 tests.

There's no point enabling multiple generators if individual generators
produce incorrect results.
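sebb's criterion can be expressed as a regression check. Run a Foal generator over messages whose Pony Mail IDs are already known, and report every disagreement. This is only a sketch: the Case record, find_mismatches and the toy generator are hypothetical, and the real yaml/set01 and yaml/set02 test files may be structured differently.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple


class Case(NamedTuple):
    """Hypothetical fixture record for one archived message."""

    name: str  # label for the case, e.g. a fixture file name
    raw_msg: bytes  # the message exactly as it was originally archived
    expected_id: str  # ID the original Pony Mail generator produced


def find_mismatches(
    generate: Callable[[bytes], str], cases: Iterable[Case]
) -> List[Tuple[str, str, str]]:
    """Return (name, expected, actual) for every case whose generated ID differs."""
    mismatches = []
    for case in cases:
        actual = generate(case.raw_msg)
        if actual != case.expected_id:
            mismatches.append((case.name, case.expected_id, actual))
    return mismatches


# Example with a toy generator: only the second case disagrees.
toy_cases = [Case("ascii", b"abc", "ABC"), Case("changed", b"xyz", "xyz")]
print(find_mismatches(lambda raw: raw.decode().upper(), toy_cases))
# [('changed', 'xyz', 'XYZ')]
```

An empty result for every historic generator would be the precondition for running them alongside new ones.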

Re: [incubator-ponymail-foal] branch master updated: Allow for multiple ID generators to be run per email

Posted by Daniel Gruno <hu...@apache.org>.
On 04/09/2020 15.48, sebb wrote:
> On Fri, 4 Sep 2020 at 12:56, Daniel Gruno <hu...@apache.org> wrote:
>>
>> As an FYI, this is meant as an advanced feature generally for systems
>> where you previously had one set of generators, but you want to use a
>> different permalink for them all, while still retaining the old document
>> IDs.
>>
>> It's a stepping stone towards flexibility in your choice of permalink
>> length with backwards compatibility, and is not complete until we have
>> the UI backend ready. But it is an important step.
> 
> Regardless of the UI backend, this will only work if the foal versions
> of historic generators produce the same results as they did in
> ponymail.
> 
> That is not the case currently, because of changes to the parsing and
> en/decoding.
> 
> As I already wrote, maintaining compatibility is going to be tricky.

It is indeed tricky; this is going to be something for the very advanced
user who knows how to utilize it. But it is something I want to work on
for those specific users :). This will be combined with a separate tool for
"re-indexing" older documents from old Pony Mail installations, probably
later next week. There will probably be something like an advanced
users' manual for this as well.


Re: [incubator-ponymail-foal] branch master updated: Allow for multiple ID generators to be run per email

Posted by sebb <se...@gmail.com>.
On Fri, 4 Sep 2020 at 12:56, Daniel Gruno <hu...@apache.org> wrote:
>
> As an FYI, this is meant as an advanced feature generally for systems
> where you previously had one set of generators, but you want to use a
> different permalink for them all, while still retaining the old document
> IDs.
>
> It's a stepping stone towards flexibility in your choice of permalink
> length with backwards compatibility, and is not complete until we have
> the UI backend ready. But it is an important step.

Regardless of the UI backend, this will only work if the foal versions
of historic generators produce the same results as they did in
ponymail.

That is not the case currently, because of changes to the parsing and
en/decoding.

As I already wrote, maintaining compatibility is going to be tricky.
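The following example illustrates the point about parsing and en/decoding. It assumes a generator that hashes the decoded body text. This is a deliberate simplification, because the real generators combine several message fields. If two versions of the archiver decode the same stored bytes differently, the hash input changes. The ID then changes as well, even though the message itself has not.

```python
import hashlib


def body_id(body_text: str) -> str:
    """Hypothetical, simplified ID generator: hash of the decoded body text."""
    return hashlib.sha256(body_text.encode("utf-8")).hexdigest()[:16]


raw_body = "Grüße".encode("utf-8")  # the bytes actually stored in the mbox

# One parser honours the declared charset; another falls back to Latin-1.
as_utf8 = raw_body.decode("utf-8")  # 'Grüße'
as_latin1 = raw_body.decode("latin-1")  # 'GrÃ¼Ã\x9fe' (mojibake)

print(body_id(as_utf8) == body_id(as_latin1))  # False: same message, different ID

# Pure-ASCII bodies decode identically either way, so they can hide the problem.
print(b"hello".decode("utf-8") == b"hello".decode("latin-1"))  # True
```

Test sets that contain only ASCII messages can therefore pass even when decoding has diverged.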


Re: [incubator-ponymail-foal] branch master updated: Allow for multiple ID generators to be run per email

Posted by Daniel Gruno <hu...@apache.org>.
As an FYI, this is meant as an advanced feature, generally for systems
where you previously had one set of generators but want to use a
different permalink for them all, while still retaining the old document
IDs.

It's a stepping stone towards flexibility in your choice of permalink
length with backwards compatibility, and it is not complete until we have
the UI backend ready. But it is an important step.
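The `permalinks` field is mapped as a `keyword` field and stores an array. An Elasticsearch `term` query against such a field matches a document when any one element equals the requested value. A single lookup can therefore serve both old-style and new-style links. The sketch below builds such a request body. The clause on `mid` is an assumption added to cover documents indexed before `permalinks` was populated. The sketch is not a description of how the Foal UI backend actually queries.

```python
import json


def permalink_query(link_id: str) -> dict:
    """Build a _search request body that finds a message by any of its IDs."""
    return {
        "query": {
            "bool": {
                # Either clause may match. A term query on a keyword array field
                # matches if any element equals link_id exactly (no analysis).
                "should": [
                    {"term": {"permalinks": link_id}},
                    # Assumption: also try "mid" for documents indexed before
                    # the "permalinks" field was populated.
                    {"term": {"mid": link_id}},
                ],
                "minimum_should_match": 1,
            }
        },
        "size": 1,  # a permalink lookup only needs the best match
    }


print(json.dumps(permalink_query("abc123"), indent=2))
```

The resulting dict can be sent as the body of a `_search` request against the archive's index.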

