Posted to server-dev@james.apache.org by Jerry Malcolm <te...@malcolms.com> on 2019/10/06 19:40:30 UTC

3.4.0 AWS S3 Storage

This is a chain of several offline email exchanges between Matthieu and 
me regarding the S3 blob storage project.  I'm bringing it into this 
forum to include anyone else who might be interested in this topic.   
There are still a few open questions listed below.  So if anyone can 
assist with those, jump on in.

Jerry


Matthieu,

On 10/2/2019 4:33 AM, Matthieu Baechler wrote:

> Hi,
>
> On Tue, 2019-10-01 at 11:41 -0500, Jerry Malcolm wrote:
>
> [...]
>
>>>> Two initial questions:
>>>>
>>>>        1) is this enabled simply by adding blob.properties into
>>>> the conf directory?
>>> When using cassandra+guice, yes.
>>>
>>> With Spring, everything is much more complex.
>>> You probably need to rework the Spring configuration to inject the
>>> right blob-store implementation.
>> I knew sooner or later I was going to have to learn something about
>> Spring....   If you can tell me the cassandra-guice classes that
>> perform the same functions, I'll start looking into making Spring do
>> the same thing.
> Honestly, I don't see the value in learning Spring 3, which has been
> obsolete for years. If you plan to work more frequently on James, I can
> only advise you to learn Guice, which is way easier to work with.
Since I am not live yet on my 3.4.x installation, now is the right time 
to move to all of the recommended 'future direction' configurations.  So 
I'll figure out what I need to do to move from Spring to Guice and begin 
with that base on my 3.4.x installation.
>
>>> More important: JPA is currently not able to use a blob-store at
>>> all.
>>>
>>> That means, if you want that feature, that you should:
>>>
>>> * implement blob-api by moving some code out of mailbox-jpa to a
>>> blob-jpa module
>>> * allow mailbox-jpa to use any blob-api implementation
>> I'm not using JPA at all.  I'm using direct JDBC to MySQL currently.
>> So JPA will be on the back burner initially for me.
> Could you explain to me the exact setup you have right now? It's not
> clear to me what you have, so it's impossible for me to describe
> what you should do starting from there.
>
> [...]

It's a good idea for me to tell you exactly what I have now and go 
from there:

Big picture... I have had a hosting company since 2002 or 2003 with a 
dedicated server (on Pier1, I think, but they keep changing names).   My 
background at IBM was OS/2 and Windows-based.  So my dedicated server OS 
is Windows Server 2016.  However, I don't use Windows internet servers.  
I'm all open source.  Apache HTTP, Tomcat, MySQL, James, ISC BIND.  In 
July I decided it was time to move from dedicated server to AWS.  
Currently, all of my web services, DNS, and web database are fully 
migrated to AWS.  I host a video company with huge web galleries.  So in 
the migration process, I moved the galleries to S3, which gave me the 
opportunity to learn the S3 APIs.  James is my lone holdout on getting 
to AWS and killing my dedicated server.  I've had some struggles.  But 
I'm getting very close now to throwing the switch to AWS for James as well.

I started with James in, I believe, 2002 or 2003 with some version of 2.x 
(SAR_INF, etc).  For my entire career, I've always looked for ways to 
make a program do something I needed that it didn't do. James' 
matcher/mailet architecture was a dream come true for me. I did a couple 
of minor version upgrades to James 2.x over the years, but my main 
focus was my custom matchers/mailets, and I never tried building 
James.  In 2014 I really wanted the IMAP and fast-fail support that 3.x 
had, so I began the migration process.  A few customizations I needed 
forced some tweaks to base James, so I figured out how to build it.  
Then, after getting 3.0b5 up and moving all clients to IMAP, I began 
writing IMAP utilities to maintain the IMAP accounts, such as pruning 
spam folders, archiving mail to archive folders, etc.

My current configuration is AWS EC2 Linux2, James 3.3.something (built 
from the 3.3 branch, but I'll get it to the latest master branch 
shortly).  I use straight JDBC to an AWS RDS MySQL database (tried 
Aurora, but had issues... so reverted, at least temporarily, back to 
MySQL). I've added my mailets, but I haven't changed any base config, 
so I am assuming I'm using Spring.

Summary: Currently James 3.3.? Spring JDBC MySQL, with a bunch of 
separate IMAP utilities.

After building James, I copied server/app/target/appassembler/lib/* to 
my James lib folder.  I'm assuming there is a different target lib 
folder that I should copy to get the Guice version (??).

I see many references to Cassandra-Guice-JPA almost as a single entity 
configuration.  Are they tightly coupled?  Do you enable each 
independently?  Is there a build target folder for all combinations?  Or 
does it not matter?  Should I just go with Cassandra-Guice-JPA as the 
future direction and move on?

>> My only concern with 2 databases and an imapsync utility is how long
>> the migration might take (in large db cases like I have) and having
>> to keep both db's with the absolute latest mail entries.
> Moving a large database between two systems is going to take some time
> anyway.
>
> First, with imapsync, keeping two servers in sync is quite easy and
> efficient. You can run it first to move most of the data, then another
> time one day before the server switch, and finally run it during the
> switch with minimal downtime.
>
> [...]
I was not aware of an actual imapsync utility until now.  I thought you 
were just referring to writing a utility that 'synced imap'.  If the 
imapsync utility I found on Google is reliable, it is obviously the best 
answer.
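
For anyone following along, the three-pass approach described above could 
be scripted roughly as follows.  This is only a minimal sketch, not a 
tested migration script: the host names, account name, and password-file 
paths are hypothetical placeholders, and it assumes imapsync is installed 
and on the PATH.  The only imapsync options used are --host1, --user1, 
--passfile1, --host2, --user2, --passfile2, and --dry.

```python
from typing import List, Tuple

# The three passes: bulk copy, catch-up, and final sync during the switch.
PASSES = [
    "bulk copy, well before the switch",
    "catch-up run, about a day before the switch",
    "final run, during the switch (old server closed to new mail)",
]


def imapsync_command(src_host: str, dst_host: str, user: str,
                     src_passfile: str, dst_passfile: str,
                     dry_run: bool = False) -> List[str]:
    """Build the imapsync argument list for one account."""
    cmd = [
        "imapsync",
        "--host1", src_host, "--user1", user, "--passfile1", src_passfile,
        "--host2", dst_host, "--user2", user, "--passfile2", dst_passfile,
    ]
    if dry_run:
        # --dry makes imapsync report what it would do without copying.
        cmd.append("--dry")
    return cmd


def migration_plan(src_host: str, dst_host: str, user: str,
                   src_passfile: str, dst_passfile: str
                   ) -> List[Tuple[str, List[str]]]:
    """Return (label, command) pairs; the same command is simply re-run,
    since each pass only has to transfer what is not yet on the target."""
    cmd = imapsync_command(src_host, dst_host, user,
                           src_passfile, dst_passfile)
    return [(label, list(cmd)) for label in PASSES]


if __name__ == "__main__":
    # Hypothetical hosts, account, and secret files, for illustration only.
    plan = migration_plan("old.example.com", "new.example.com",
                          "someuser@example.com",
                          "/etc/secret/old.pass", "/etc/secret/new.pass")
    for label, cmd in plan:
        print(label + ": " + " ".join(cmd))
        # To actually run a pass: subprocess.run(cmd, check=True)
```

Running imapsync_command(..., dry_run=True) first would be a cheap way to 
see what a pass intends to copy before committing to it.
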
>> Is there some process I need to go through to get approved as a
>> contributor?  I'm ready.  (Also have a couple of base mailet
>> enhancements I've added that others might be interested in).
> The first step is to propose pull requests on GitHub.
>
> People gain contributor status based on their previous contributions
> (it's a kind of meritocracy).
>
> You could start by proposing a pull request for each mailet you have?
Many of my mailets are specific to certain clients.  But I do have a 
couple of fairly minor code changes I'll create pull requests for just 
to get my feet wet.
>
>> Also, I mentioned that I'm still just learning the specifics of Git.
>> I went to Git expecting to find a 3.4.x branch.  There's not one
>> (yet).  I thought maybe 'master' would work.  But I got tons of build
>> errors when I cloned master.  So right now I've got the 3.3.x branch
>> and it builds.  What branch should I be working with in order to
>> guarantee I'm working with the latest code base?
> The master branch. It should be working. We changed the requirement for
> the build phase: you now need JDK 11 to build it. We will soon require
> JDK 11 to run it, too.
JDK 11 could definitely be a problem.  I was on 9, and moved back to 8 
thinking there were problems with 9.  Obviously I was moving in the wrong 
direction.  I'll install 11 and see if the master branch acts nicer.
>
>
> Finally, it's important we go back to the mailing list for our
> discussion, as it is of general interest. We can keep the part about
> your current setup private, but everything else should be handled
> publicly.
I subscribed to the james-dev list, got my 'confirm-request' email and 
replied.  But no response as yet.  I'll post this entire chain when I 
get authorized.  (I have no confidentiality issues with my configuration 
/ setup / etc.)
Cheers,