Posted to dev@ponymail.apache.org by Daniel Gruno <hu...@apache.org> on 2020/08/10 17:51:04 UTC

Intent to set up a "ponymail 2" repository

Hi folks,
with the incoming new UI and with ElasticSearch moving far away from 
what Pony Mail supports currently, I think it's also time to get started 
on a "next generation" of Pony Mail, that supports the new structures 
laid out in ES 7.x and above, and while we're at it... I think we should
ditch Lua and use a pure Python implementation instead, to increase the
number of potential contributors (and make use of Python's excellent 
libraries).

So, with all that said, I'll be setting up a new repository for this
'next generation' of Pony Mail, so as to not start mixing the old and 
the new too much. I think this is justified, as it's a very major shift 
from the old code-base, and the two would be internally incompatible 
(except for the JSON API, I believe that should remain as is.)

My plan is to:

- create a new repository
- import the UIX code-base into this
- import the tools we have from pony v1, tweak those later on
- get started on a pure Python back-end for the UI (I have some
semblance of a prototype that I'm hacking on currently)
- finally, write a migration script for migrating from the old DBs to 
the new format.

Comments/feedback are always welcome as usual :)

With regards,
Daniel.

Re: Intent to set up a "ponymail 2" repository

Posted by sebb <se...@gmail.com>.

On Tue, 11 Aug 2020 at 10:27, Daniel Gruno <hu...@apache.org> wrote:
>
> On 11/08/2020 01.05, sebb wrote:
> > [snip]
> > *Any* changes need to be tested, especially since bugs here can have
> > permanent consequences.
> >
>
> I am sort of stumped with regards to what needs to be tested here, and
> what concerns you have about adopting the ES7.x standard layout.
>
> I think we can all agree that more testing is better, but we need some
> semblance of an idea of what/how to test, or we'll be at this impasse
> for a long while :)
>
> The change I am proposing (upgrading the archiver to support ES6/7) does
> not alter the document IDs, it does not change the sources, and it's not
> meant to be used with the old system at all, as that wouldn't work with
> ES7 anyway. It's a very literal "instead of using database named A, we
> use database named B" operation.

Yes, I understand the change you are proposing.

However, this requires code changes, and code changes need to be tested.

If there is to be a hope of cleanly migrating a database from the old
to the new, there needs to be consistency of Permalinks between the
old and new software.

Suppose the new software sometimes creates a different Permalink from
the old software.
It would no longer be possible to do a parallel run without
introducing discrepancies.
Likewise, if the old and new software behave differently wrt
Permalinks when importing existing mailboxes, there would be a
problem.
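
As a rough illustration of the kind of parity check meant here, the
sketch below runs two ID generators over the same sample of raw messages
and reports any message for which they disagree. The names
old_generator, new_generator and find_mismatches are hypothetical
stand-ins, and the hashing scheme is an assumption chosen only so the
example runs; it is not the actual Pony Mail generator.

```python
import hashlib


def old_generator(raw: bytes) -> str:
    # Hypothetical stand-in for the existing ID generator.
    return hashlib.sha256(raw).hexdigest()[:32]


def new_generator(raw: bytes) -> str:
    # Hypothetical stand-in for the generator in the changed code.
    return hashlib.sha256(raw).hexdigest()[:32]


def find_mismatches(corpus, old_gen, new_gen):
    """Return (message, old_id, new_id) for every message whose IDs differ."""
    mismatches = []
    for raw in corpus:
        old_id, new_id = old_gen(raw), new_gen(raw)
        if old_id != new_id:
            mismatches.append((raw, old_id, new_id))
    return mismatches


if __name__ == "__main__":
    sample = [b"From: a@example.org\n\nhello", b"From: b@example.org\n\nworld"]
    # An empty list means both generators agree on every sample message.
    print(find_mismatches(sample, old_generator, new_generator))
```

A check along these lines could be run against existing mailboxes before
any parallel run, so that a discrepancy shows up as a failing test rather
than as a changed Permalink.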

The other aspect that needs to be thoroughly tested is privacy.
There have been several instances of inadvertent leaks in the past.
Any change in the code needs to be analysed and tested for such issues.

>

Re: Intent to set up a "ponymail 2" repository

Posted by Daniel Gruno <hu...@apache.org>.

On 11/08/2020 01.05, sebb wrote:
> [snip]
> *Any* changes need to be tested, especially since bugs here can have
> permanent consequences.
> 

I am sort of stumped with regards to what needs to be tested here, and 
what concerns you have about adopting the ES7.x standard layout.

I think we can all agree that more testing is better, but we need some 
semblance of an idea of what/how to test, or we'll be at this impasse 
for a long while :)

The change I am proposing (upgrading the archiver to support ES6/7) does 
not alter the document IDs, it does not change the sources, and it's not 
meant to be used with the old system at all, as that wouldn't work with 
ES7 anyway. It's a very literal "instead of using database named A, we 
use database named B" operation.

Re: Intent to set up a "ponymail 2" repository

Posted by sebb <se...@gmail.com>.

On Mon, 10 Aug 2020 at 22:30, Daniel Gruno <hu...@apache.org> wrote:
>
> On 10/08/2020 23.23, sebb wrote:
> > On Mon, 10 Aug 2020 at 22:10, Daniel Gruno <hu...@apache.org> wrote:
> >>
> >> On 10/08/2020 22.56, sebb wrote:
> >>> On Mon, 10 Aug 2020 at 21:49, Daniel Gruno <hu...@apache.org> wrote:
> >>>>
> >>>> On 10/08/2020 22.46, sebb wrote:
> >>>>> On Mon, 10 Aug 2020 at 18:51, Daniel Gruno <hu...@apache.org> wrote:
> >>>>>>
> >>>>>> Hi folks,
> >>>>>> with the incoming new UI and with ElasticSearch moving far away from
> >>>>>> what Pony Mail supports currently, I think it's also time to get started
> >>>>>> on a "next generation" of Pony Mail, that supports the new structures
> >>>>>> laid out in ES 7.x and above, and while we're at it....I think we should
> >>>>>> ditch Lua and use a pure python implementation instead, to increase the
> >>>>>> number of potential contributors (and make use of Python's excellent
> >>>>>> libraries).
> >>>>>>
> >>>>>> SO, with all that said, I'll be setting up a new repository for this
> >>>>>> 'next generation' of Pony Mail, so as to not start mixing the old and
> >>>>>> the new too much. I think this is justified, as it's a very major shift
> >>>>>> from the old code-base, and the two would be internally incompatible
> >>>>>> (except for the JSON API, I believe that should remain as is.)
> >>>>>>
> >>>>>> My plan is to:
> >>>>>>
> >>>>>> - create a new repository
> >>>>>> - import the UIX code-base into this
> >>>>>> - import the tools we have from pony v1, tweak those later on
> >>>>>> - get started on a pure python back-end for the UI (I have some
> >>>>>> semblance of a prototype that I'm hacking on currently)
> >>>>>> - finally, write a migration script for migrating from the old DBs to
> >>>>>> the new format.
> >>>>>
> >>>>> Before one can even think about migrating a database, there needs to
> >>>>> be a full test suite.
> >>>>> In particular, there need to be exhaustive tests to show that the same
> >>>>> Permalinks will be generated.
> >>>>
> >>>> I think perhaps you misunderstand me here :)
> >>>> The database *contents* would be migrated verbatim, just with the new ES
> >>>> structure, where each doctype in the old setup is its own index with
> >>>> doctype _doc. The IDs stay the same as they were before.
> >>>
> >>> I realise that.
> >>>
> >>> However, I don't think this can be done without changing the backend
> >>> code which is responsible for generating the IDs going forward.
> >>> There have already been unplanned changes to the generators which
> >>> affect the output.
> >>> Such changes need to be caught before they cause compatibility issues.
> >>
> >> There have not been changes to what's allowed as a document ID in ES.
> >>
> >> 'generating the IDs' does not play into this at all, I'm not talking
> >> about re-importing, but copying the database content including document
> >> IDs verbatim, just to a new "directory structure". If something has ID
> >> 1234foo in the old database, it would still be 1234foo in the new one,
> >> but it would be present in the index ponymail-mbox instead of ponymail.
> >
> > I think you are missing the point.
> >
> > The move to a different database structure will necessarily involve
> > changes to the backend Python code.
> > It is vital any changes don't affect the Permalinks, so there needs to
> > be a proper test suite.
>
> Are you talking about the archiver.py code? Or the not-yet-done idea of
> moving to 100% python for the UI?

I am referring mainly to the Python archiver code.

> For the archiver code, it's a very simple change: instead of
> .index(index='ponymail', doctype='mbox', id=generated_id, ...)
> you have:
> .index(index='ponymail-mbox', id=generated_id)
>
> Changing the archiver should in no way affect permalinks, and certainly

*Any* changes need to be tested, especially since bugs here can have
permanent consequences.

> not the old permalinks, as they would have been copied verbatim.

> For the idea of 100% python, it's different as we'd be ditching the Lua
> backend entirely - I'd be happy to look at compatibility and unit
> testing for that, though it's not a simple ask.

It would be good to ensure that no functionality is lost.
However, bugs in the UI have no long term consequences.

> >
> >>>
> >>>> thus, the ponymail database (or whatever you have called it) would get
> >>>> split into several databases instead:
> >>>> - ponymail-mbox
> >>>> - ponymail-attachments
> >>>> - ponymail-sources
> >>>> - ponymail-sessions
> >>>> etc etc
> >>>>
> >>>> With regards,
> >>>> Daniel.
> >>>>
> >>>>>
> >>>>>> Comments/feedback is always welcome as usual :)
> >>>>>>
> >>>>>> With regards,
> >>>>>> Daniel.
> >>>>
> >>
>

Re: Intent to set up a "ponymail 2" repository

Posted by Daniel Gruno <hu...@apache.org>.

On 10/08/2020 23.23, sebb wrote:
> On Mon, 10 Aug 2020 at 22:10, Daniel Gruno <hu...@apache.org> wrote:
>>
>> On 10/08/2020 22.56, sebb wrote:
>>> On Mon, 10 Aug 2020 at 21:49, Daniel Gruno <hu...@apache.org> wrote:
>>>>
>>>> On 10/08/2020 22.46, sebb wrote:
>>>>> On Mon, 10 Aug 2020 at 18:51, Daniel Gruno <hu...@apache.org> wrote:
>>>>>>
>>>>>> Hi folks,
>>>>>> with the incoming new UI and with ElasticSearch moving far away from
>>>>>> what Pony Mail supports currently, I think it's also time to get started
>>>>>> on a "next generation" of Pony Mail, that supports the new structures
>>>>>> laid out in ES 7.x and above, and while we're at it....I think we should
>>>>>> ditch Lua and use a pure python implementation instead, to increase the
>>>>>> number of potential contributors (and make use of Python's excellent
>>>>>> libraries).
>>>>>>
>>>>>> SO, with all that said, I'll be setting up a new repository for this
>>>>>> 'next generation' of Pony Mail, so as to not start mixing the old and
>>>>>> the new too much. I think this is justified, as it's a very major shift
>>>>>> from the old code-base, and the two would be internally incompatible
>>>>>> (except for the JSON API, I believe that should remain as is.)
>>>>>>
>>>>>> My plan is to:
>>>>>>
>>>>>> - create a new repository
>>>>>> - import the UIX code-base into this
>>>>>> - import the tools we have from pony v1, tweak those later on
>>>>>> - get started on a pure python back-end for the UI (I have some
>>>>>> semblance of a prototype that I'm hacking on currently)
>>>>>> - finally, write a migration script for migrating from the old DBs to
>>>>>> the new format.
>>>>>
>>>>> Before one can even think about migrating a database, there needs to
>>>>> be a full test suite.
>>>>> In particular, there need to be exhaustive tests to show that the same
>>>>> Permalinks will be generated.
>>>>
>>>> I think perhaps you misunderstand me here :)
>>>> The database *contents* would be migrated verbatim, just with the new ES
>>>> structure, where each doctype in the old setup is its own index with
>>>> doctype _doc. The IDs stay the same as they were before.
>>>
>>> I realise that.
>>>
>>> However, I don't think this can be done without changing the backend
>>> code which is responsible for generating the IDs going forward.
>>> There have already been unplanned changes to the generators which
>>> affect the output.
>>> Such changes need to be caught before they cause compatibility issues.
>>
>> There have not been changes to what's allowed as a document ID in ES.
>>
>> 'generating the IDs' does not play into this at all, I'm not talking
>> about re-importing, but copying the database content including document
>> IDs verbatim, just to a new "directory structure". If something has ID
>> 1234foo in the old database, it would still be 1234foo in the new one,
>> but it would be present in the index ponymail-mbox instead of ponymail.
> 
> I think you are missing the point.
> 
> The move to a different database structure will necessarily involve
> changes to the backend Python code.
> It is vital any changes don't affect the Permalinks, so there needs to
> be a proper test suite.

Are you talking about the archiver.py code? Or the not-yet-done idea of 
moving to 100% python for the UI?

For the archiver code, it's a very simple change: instead of
.index(index='ponymail', doctype='mbox', id=generated_id, ...)
you have:
.index(index='ponymail-mbox', id=generated_id)

Changing the archiver should in no way affect permalinks, and certainly 
not the old permalinks, as they would have been copied verbatim.
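
To make the shape of that change concrete, here is a minimal sketch of a
helper that builds the arguments for either layout. The function name
index_kwargs and the base/legacy parameters are hypothetical, and the
dictionary keys simply mirror the two calls quoted above rather than the
exact signature of any particular client version.

```python
def index_kwargs(doctype, doc_id, base="ponymail", legacy=False):
    """Build the arguments for an index call (hypothetical helper)."""
    if legacy:
        # Old layout: a single index holding several document types.
        return {"index": base, "doctype": doctype, "id": doc_id}
    # New layout: one index per former document type; the ID is passed
    # through unchanged.
    return {"index": f"{base}-{doctype}", "id": doc_id}


if __name__ == "__main__":
    print(index_kwargs("mbox", "1234foo", legacy=True))
    print(index_kwargs("mbox", "1234foo"))
```

In this sketch the generated ID is only ever passed through, which is the
property a test would want to pin down.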

For the idea of 100% python, it's different as we'd be ditching the Lua 
backend entirely - I'd be happy to look at compatibility and unit 
testing for that, though it's not a simple ask.

> 
>>>
>>>> thus, the ponymail database (or whatever you have called it) would get
>>>> split into several databases instead:
>>>> - ponymail-mbox
>>>> - ponymail-attachments
>>>> - ponymail-sources
>>>> - ponymail-sessions
>>>> etc etc
>>>>
>>>> With regards,
>>>> Daniel.
>>>>
>>>>>
>>>>>> Comments/feedback is always welcome as usual :)
>>>>>>
>>>>>> With regards,
>>>>>> Daniel.
>>>>
>>

Re: Intent to set up a "ponymail 2" repository

Posted by sebb <se...@gmail.com>.

On Mon, 10 Aug 2020 at 22:10, Daniel Gruno <hu...@apache.org> wrote:
>
> On 10/08/2020 22.56, sebb wrote:
> > On Mon, 10 Aug 2020 at 21:49, Daniel Gruno <hu...@apache.org> wrote:
> >>
> >> On 10/08/2020 22.46, sebb wrote:
> >>> On Mon, 10 Aug 2020 at 18:51, Daniel Gruno <hu...@apache.org> wrote:
> >>>>
> >>>> Hi folks,
> >>>> with the incoming new UI and with ElasticSearch moving far away from
> >>>> what Pony Mail supports currently, I think it's also time to get started
> >>>> on a "next generation" of Pony Mail, that supports the new structures
> >>>> laid out in ES 7.x and above, and while we're at it....I think we should
> >>>> ditch Lua and use a pure python implementation instead, to increase the
> >>>> number of potential contributors (and make use of Python's excellent
> >>>> libraries).
> >>>>
> >>>> SO, with all that said, I'll be setting up a new repository for this
> >>>> 'next generation' of Pony Mail, so as to not start mixing the old and
> >>>> the new too much. I think this is justified, as it's a very major shift
> >>>> from the old code-base, and the two would be internally incompatible
> >>>> (except for the JSON API, I believe that should remain as is.)
> >>>>
> >>>> My plan is to:
> >>>>
> >>>> - create a new repository
> >>>> - import the UIX code-base into this
> >>>> - import the tools we have from pony v1, tweak those later on
> >>>> - get started on a pure python back-end for the UI (I have some
> >>>> semblance of a prototype that I'm hacking on currently)
> >>>> - finally, write a migration script for migrating from the old DBs to
> >>>> the new format.
> >>>
> >>> Before one can even think about migrating a database, there needs to
> >>> be a full test suite.
> >>> In particular, there need to be exhaustive tests to show that the same
> >>> Permalinks will be generated.
> >>
> >> I think perhaps you misunderstand me here :)
> >> The database *contents* would be migrated verbatim, just with the new ES
> >> structure, where each doctype in the old setup is its own index with
> >> doctype _doc. The IDs stay the same as they were before.
> >
> > I realise that.
> >
> > However, I don't think this can be done without changing the backend
> > code which is responsible for generating the IDs going forward.
> > There have already been unplanned changes to the generators which
> > affect the output.
> > Such changes need to be caught before they cause compatibility issues.
>
> There have not been changes to what's allowed as a document ID in ES.
>
> 'generating the IDs' does not play into this at all, I'm not talking
> about re-importing, but copying the database content including document
> IDs verbatim, just to a new "directory structure". If something has ID
> 1234foo in the old database, it would still be 1234foo in the new one,
> but it would be present in the index ponymail-mbox instead of ponymail.

I think you are missing the point.

The move to a different database structure will necessarily involve
changes to the backend Python code.
It is vital any changes don't affect the Permalinks, so there needs to
be a proper test suite.

> >
> >> thus, the ponymail database (or whatever you have called it) would get
> >> split into several databases instead:
> >> - ponymail-mbox
> >> - ponymail-attachments
> >> - ponymail-sources
> >> - ponymail-sessions
> >> etc etc
> >>
> >> With regards,
> >> Daniel.
> >>
> >>>
> >>>> Comments/feedback is always welcome as usual :)
> >>>>
> >>>> With regards,
> >>>> Daniel.
> >>
>

Re: Intent to set up a "ponymail 2" repository

Posted by Daniel Gruno <hu...@apache.org>.

On 10/08/2020 22.56, sebb wrote:
> On Mon, 10 Aug 2020 at 21:49, Daniel Gruno <hu...@apache.org> wrote:
>>
>> On 10/08/2020 22.46, sebb wrote:
>>> On Mon, 10 Aug 2020 at 18:51, Daniel Gruno <hu...@apache.org> wrote:
>>>>
>>>> Hi folks,
>>>> with the incoming new UI and with ElasticSearch moving far away from
>>>> what Pony Mail supports currently, I think it's also time to get started
>>>> on a "next generation" of Pony Mail, that supports the new structures
>>>> laid out in ES 7.x and above, and while we're at it....I think we should
>>>> ditch Lua and use a pure python implementation instead, to increase the
>>>> number of potential contributors (and make use of Python's excellent
>>>> libraries).
>>>>
>>>> SO, with all that said, I'll be setting up a new repository for this
>>>> 'next generation' of Pony Mail, so as to not start mixing the old and
>>>> the new too much. I think this is justified, as it's a very major shift
>>>> from the old code-base, and the two would be internally incompatible
>>>> (except for the JSON API, I believe that should remain as is.)
>>>>
>>>> My plan is to:
>>>>
>>>> - create a new repository
>>>> - import the UIX code-base into this
>>>> - import the tools we have from pony v1, tweak those later on
>>>> - get started on a pure python back-end for the UI (I have some
>>>> semblance of a prototype that I'm hacking on currently)
>>>> - finally, write a migration script for migrating from the old DBs to
>>>> the new format.
>>>
>>> Before one can even think about migrating a database, there needs to
>>> be a full test suite.
>>> In particular, there need to be exhaustive tests to show that the same
>>> Permalinks will be generated.
>>
>> I think perhaps you misunderstand me here :)
>> The database *contents* would be migrated verbatim, just with the new ES
>> structure, where each doctype in the old setup is its own index with
>> doctype _doc. The IDs stay the same as they were before.
> 
> I realise that.
> 
> However, I don't think this can be done without changing the backend
> code which is responsible for generating the IDs going forward.
> There have already been unplanned changes to the generators which
> affect the output.
> Such changes need to be caught before they cause compatibility issues.

There have not been changes to what's allowed as a document ID in ES.

'generating the IDs' does not play into this at all, I'm not talking 
about re-importing, but copying the database content including document 
IDs verbatim, just to a new "directory structure". If something has ID 
1234foo in the old database, it would still be 1234foo in the new one, 
but it would be present in the index ponymail-mbox instead of ponymail.

> 
>> thus, the ponymail database (or whatever you have called it) would get
>> split into several databases instead:
>> - ponymail-mbox
>> - ponymail-attachments
>> - ponymail-sources
>> - ponymail-sessions
>> etc etc
>>
>> With regards,
>> Daniel.
>>
>>>
>>>> Comments/feedback is always welcome as usual :)
>>>>
>>>> With regards,
>>>> Daniel.
>>

Re: Intent to set up a "ponymail 2" repository

Posted by sebb <se...@gmail.com>.

On Mon, 10 Aug 2020 at 21:49, Daniel Gruno <hu...@apache.org> wrote:
>
> On 10/08/2020 22.46, sebb wrote:
> > On Mon, 10 Aug 2020 at 18:51, Daniel Gruno <hu...@apache.org> wrote:
> >>
> >> Hi folks,
> >> with the incoming new UI and with ElasticSearch moving far away from
> >> what Pony Mail supports currently, I think it's also time to get started
> >> on a "next generation" of Pony Mail, that supports the new structures
> >> laid out in ES 7.x and above, and while we're at it....I think we should
> >> ditch Lua and use a pure python implementation instead, to increase the
> >> number of potential contributors (and make use of Python's excellent
> >> libraries).
> >>
> >> SO, with all that said, I'll be setting up a new repository for this
> >> 'next generation' of Pony Mail, so as to not start mixing the old and
> >> the new too much. I think this is justified, as it's a very major shift
> >> from the old code-base, and the two would be internally incompatible
> >> (except for the JSON API, I believe that should remain as is.)
> >>
> >> My plan is to:
> >>
> >> - create a new repository
> >> - import the UIX code-base into this
> >> - import the tools we have from pony v1, tweak those later on
> >> - get started on a pure python back-end for the UI (I have some
> >> semblance of a prototype that I'm hacking on currently)
> >> - finally, write a migration script for migrating from the old DBs to
> >> the new format.
> >
> > Before one can even think about migrating a database, there needs to
> > be a full test suite.
> > In particular, there need to be exhaustive tests to show that the same
> > Permalinks will be generated.
>
> I think perhaps you misunderstand me here :)
> The database *contents* would be migrated verbatim, just with the new ES
> structure, where each doctype in the old setup is its own index with
> doctype _doc. The IDs stay the same as they were before.

I realise that.

However, I don't think this can be done without changing the backend
code which is responsible for generating the IDs going forward.
There have already been unplanned changes to the generators which
affect the output.
Such changes need to be caught before they cause compatibility issues.

> thus, the ponymail database (or whatever you have called it) would get
> split into several databases instead:
> - ponymail-mbox
> - ponymail-attachments
> - ponymail-sources
> - ponymail-sessions
> etc etc
>
> With regards,
> Daniel.
>
> >
> >> Comments/feedback is always welcome as usual :)
> >>
> >> With regards,
> >> Daniel.
>

Re: Intent to set up a "ponymail 2" repository

Posted by Daniel Gruno <hu...@apache.org>.

On 10/08/2020 22.46, sebb wrote:
> On Mon, 10 Aug 2020 at 18:51, Daniel Gruno <hu...@apache.org> wrote:
>>
>> Hi folks,
>> with the incoming new UI and with ElasticSearch moving far away from
>> what Pony Mail supports currently, I think it's also time to get started
>> on a "next generation" of Pony Mail, that supports the new structures
>> laid out in ES 7.x and above, and while we're at it....I think we should
>> ditch Lua and use a pure python implementation instead, to increase the
>> number of potential contributors (and make use of Python's excellent
>> libraries).
>>
>> SO, with all that said, I'll be setting up a new repository for this
>> 'next generation' of Pony Mail, so as to not start mixing the old and
>> the new too much. I think this is justified, as it's a very major shift
>> from the old code-base, and the two would be internally incompatible
>> (except for the JSON API, I believe that should remain as is.)
>>
>> My plan is to:
>>
>> - create a new repository
>> - import the UIX code-base into this
>> - import the tools we have from pony v1, tweak those later on
>> - get started on a pure python back-end for the UI (I have some
>> semblance of a prototype that I'm hacking on currently)
>> - finally, write a migration script for migrating from the old DBs to
>> the new format.
> 
> Before one can even think about migrating a database, there needs to
> be a full test suite.
> In particular, there need to be exhaustive tests to show that the same
> Permalinks will be generated.

I think perhaps you misunderstand me here :)
The database *contents* would be migrated verbatim, just with the new ES 
structure, where each doctype in the old setup is its own index with 
doctype _doc. The IDs stay the same as they were before.

thus, the ponymail database (or whatever you have called it) would get 
split into several databases instead:
- ponymail-mbox
- ponymail-attachments
- ponymail-sources
- ponymail-sessions
etc etc
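
A minimal sketch of that verbatim copy, assuming each old document
arrives as a search hit carrying _type, _id and _source fields. The
function name migrate_hit and the keys of DOCTYPE_TO_INDEX (the old
doctype names) are assumptions made for illustration; only the target
index names come from the list above.

```python
# Old doctype name -> new index name. The keys are assumed for
# illustration; the values are the index names listed above.
DOCTYPE_TO_INDEX = {
    "mbox": "ponymail-mbox",
    "attachment": "ponymail-attachments",
    "mbox_source": "ponymail-sources",
    "session": "ponymail-sessions",
}


def migrate_hit(hit):
    """Map one old-layout document to its new index, keeping ID and body."""
    return {
        "_index": DOCTYPE_TO_INDEX[hit["_type"]],
        "_id": hit["_id"],          # copied verbatim, never regenerated
        "_source": hit["_source"],  # copied verbatim
    }


if __name__ == "__main__":
    old_hit = {
        "_index": "ponymail",
        "_type": "mbox",
        "_id": "1234foo",
        "_source": {"subject": "hello"},
    }
    print(migrate_hit(old_hit))
```

An unknown doctype raises KeyError here on purpose, so a document is
never silently dropped or filed under a guessed index.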

With regards,
Daniel.

> 
>> Comments/feedback is always welcome as usual :)
>>
>> With regards,
>> Daniel.

Re: Intent to set up a "ponymail 2" repository

Posted by sebb <se...@gmail.com>.

On Mon, 10 Aug 2020 at 18:51, Daniel Gruno <hu...@apache.org> wrote:
>
> Hi folks,
> with the incoming new UI and with ElasticSearch moving far away from
> what Pony Mail supports currently, I think it's also time to get started
> on a "next generation" of Pony Mail, that supports the new structures
> laid out in ES 7.x and above, and while we're at it....I think we should
> ditch Lua and use a pure python implementation instead, to increase the
> number of potential contributors (and make use of Python's excellent
> libraries).
>
> SO, with all that said, I'll be setting up a new repository for this
> 'next generation' of Pony Mail, so as to not start mixing the old and
> the new too much. I think this is justified, as it's a very major shift
> from the old code-base, and the two would be internally incompatible
> (except for the JSON API, I believe that should remain as is.)
>
> My plan is to:
>
> - create a new repository
> - import the UIX code-base into this
> - import the tools we have from pony v1, tweak those later on
> - get started on a pure python back-end for the UI (I have some
> semblance of a prototype that I'm hacking on currently)
> - finally, write a migration script for migrating from the old DBs to
> the new format.

Before one can even think about migrating a database, there needs to
be a full test suite.
In particular, there need to be exhaustive tests to show that the same
Permalinks will be generated.

> Comments/feedback is always welcome as usual :)
>
> With regards,
> Daniel.