Posted to server-dev@james.apache.org by Kevin Hunt <Ke...@centropy.com> on 2003/10/17 03:49:32 UTC

Fetchmail features I'm working on, CVS patch help (Long message)

Hello all,

Since I've never posted to the list, a brief introduction... I'm brand 
new to community-style development, but I've been a hobbyist, 
OS-supporter, and semi-professional Java developer for several years.  
"Semi-professional" meaning I write Java applications and servers at 
work, but my main focus is in tech support (escalations).  I'm currently 
porting a custom mail-processing application over to James using 
mailets.  However, I need a few additional features, most of which I'm 
comfortable doing.  I would love feedback on these:

1) Cron-scheduling in Steve Brewin's new fetchmail stuff.  This way I 
can schedule fetchmail tasks to only run during business hours, MON-FRI.

My patch adds an optional <cronSchedule> to the <fetchmail> config, and 
schedules the fetchmail scheduler accordingly:
            <!-- A cron-style way to specify when this fetchmail task should run -->
            <!-- Value of -1 indicates EVERY, e.g., hour="-1" means "EVERY hour" -->
            <!-- Use multiple <cronSchedule>s OR one <interval> for scheduling -->
            <!-- See org.apache.avalon.cornerstone.services.scheduler.CronTimeTrigger -->
            <cronSchedule minute="-1" hour="-1" month="-1" year="-1"
                day="-1" isDayOfYear="false"/>
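
To illustrate the -1 wildcard convention and the business-hours goal, here 
is a minimal standalone sketch.  It does not use CronTimeTrigger; the class 
CronFieldSketch, its methods, and the 09:00-17:00 window are all 
assumptions made for illustration, and it relies on java.time (Java 8 or 
later).

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;

/** Hypothetical helper for illustration; not part of James or Avalon. */
final class CronFieldSketch {
    /** -1 is the wildcard ("EVERY"), as in the cronSchedule comments. */
    static final int EVERY = -1;

    /** A field matches when it is the wildcard or equals the actual value. */
    static boolean fieldMatches(int configured, int actual) {
        return configured == EVERY || configured == actual;
    }

    /** True when both the minute and hour fields match the given time. */
    static boolean matches(int minute, int hour, LocalDateTime when) {
        return fieldMatches(minute, when.getMinute())
            && fieldMatches(hour, when.getHour());
    }

    /** Assumed business-hours rule: Monday to Friday, 09:00 up to 17:00. */
    static boolean isBusinessHours(LocalDateTime when) {
        DayOfWeek day = when.getDayOfWeek();
        boolean weekday = day != DayOfWeek.SATURDAY && day != DayOfWeek.SUNDAY;
        return weekday && when.getHour() >= 9 && when.getHour() < 17;
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.now();
        // minute=0 with hour=-1 means "at the top of every hour"
        System.out.println("matches schedule: " + matches(0, EVERY, now));
        System.out.println("business hours:   " + isBusinessHours(now));
    }
}
```

In the patch itself the equivalent restriction would be expressed through 
the schedule attributes rather than a separate check like isBusinessHours.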

2) FetchMail search terms.  An optional <fetchfilter> may be specified 
in each fetch task to tell fetchmail which emails to include and 
exclude.  Fetchmail then fetches only email that matches the search 
terms, and simply leaves other messages alone on the server.  
<fetchfilter> would effectively obsolete the <fetchall> option.  It is 
documented/configured like this:
            <!-- Filter the emails which are retrieved. -->
            <!-- Messages that do not meet the filter conditions -->
            <!-- are left unmodified on the server and are not -->
            <!-- passed on to the JAMES queue -->
            <!-- Search terms may be grouped with the logical -->
            <!-- operators <and> and <or> -->
            <!-- Terms can be negated using <not> -->
            <!-- Terms used here are adapted from and implemented by -->
            <!-- the javax.mail.search.SearchTerm subclasses -->
            <!-- Available terms: -->
            <!--    <body pattern="regex to find in body"/>              -->
            <!--    <subject pattern="regex to find in subject"/>        -->
            <!--    <flags set="true/false" answered="true" deleted="true" draft="true"
                           flagged="true" recent="true" seen="true" user="true"/>    -->
            <!--    <fromstring pattern="regex From header">        -->
            <!--    <from address="email address">        -->
            <!--    <header name="header name" pattern="regex in header value">        -->
            <!--    <messageid id="RFC822 message id">        -->
            <!--    <messagenumber number="message number">        -->
            <!--    <receiveddate comparison="GT/GE/LT/LE/EQ/NE" date="Simple date">        -->
            <!--    <recipientstring pattern="regex To header">        -->
            <!--    <recipient type="TO/CC/BCC" address="email address">        -->
            <!--    <sentdate comparison="GT/GE/LT/LE/EQ/NE" date="Simple date">        -->
            <!--    <size comparison="GT/GE/LT/LE/EQ/NE" bytes="integer">        -->
            <!-- The example filter below will not fetch mails with 'Autoreply' -->
            <!-- in the subject nor mail with Autoreply in the body -->
            <!-- Finally, there must be only ONE term that is a child -->
            <!-- of fetchfilter.  Use <and> and <or> for more than one -->
            <!-- condition.  -->
            <fetchfilter>
                <and>
                    <not><subject pattern="Autoreply"/></not>
                    <not><body pattern="Autoreply"/></not>
                </and>
            </fetchfilter>
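
The way the example filter composes can be sketched in plain Java.  This 
is only an illustration: it uses java.util.function.Predicate instead of 
the javax.mail.search classes so that it runs standalone, MailSketch and 
FetchFilterSketch are hypothetical names, and it assumes plain substring 
matching rather than regex matching.

```java
import java.util.function.Predicate;

/** Hypothetical stand-in for a message; real code would use javax.mail.Message. */
final class MailSketch {
    final String subject;
    final String body;

    MailSketch(String subject, String body) {
        this.subject = subject;
        this.body = body;
    }
}

/** Hypothetical model of <fetchfilter> terms built from predicates. */
final class FetchFilterSketch {
    /** Models <subject pattern="..."/> as a substring test (an assumption). */
    static Predicate<MailSketch> subject(String pattern) {
        return m -> m.subject.contains(pattern);
    }

    /** Models <body pattern="..."/> as a substring test (an assumption). */
    static Predicate<MailSketch> body(String pattern) {
        return m -> m.body.contains(pattern);
    }

    /** Mirrors <and><not><subject/></not><not><body/></not></and>. */
    static Predicate<MailSketch> noAutoreply() {
        return subject("Autoreply").negate().and(body("Autoreply").negate());
    }

    public static void main(String[] args) {
        Predicate<MailSketch> filter = noAutoreply();
        MailSketch request = new MailSketch("Help needed", "My login fails");
        MailSketch reply = new MailSketch("Autoreply: out of office", "Back Monday");
        System.out.println("fetch request: " + filter.test(request));
        System.out.println("fetch reply:   " + filter.test(reply));
    }
}
```

A message is fetched only when the single top-level term evaluates to 
true, which is why more than one condition has to be wrapped in <and> or 
<or>.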

This sort of SearchTerm stuff is extremely flexible, and this could be 
easily adapted to a Matcher if the mailet spec changes to allow Matcher 
configuration of more than just a String -- which I hope is still being 
considered.

3) "Restart on config change" feature.  When running  in production, my 
app runs in a secured DMZ as an NT service where I don't have access to 
stop/restart it -- but I do have access to the files.  My current 
(non-James) app monitors the config file, and when it notices a change, 
it shuts down and restarts.  I added this to JAMES, but it's ugly (calls 
System.exit(199), which causes my modified run.bat to simply relaunch 
the container.  I know, I know...)

There's an additional complexity because of the NT service wrapper: if 
the JVM is running in the wrapper, a simple call to 
WrapperManager.restart(); takes care of this.
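
Roughly, the monitoring side looks like the sketch below.  It is a 
simplified standalone illustration rather than the code I added to JAMES: 
ConfigChangeSketch is a hypothetical name, the five-second polling 
interval is arbitrary, and it uses java.nio.file (Java 7 or later).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;

/** Hypothetical helper for illustration; not part of James. */
final class ConfigChangeSketch {
    private final Path config;
    private FileTime lastSeen;

    ConfigChangeSketch(Path config) throws IOException {
        this.config = config;
        this.lastSeen = Files.getLastModifiedTime(config);
    }

    /** Returns true once for each change of the file's last-modified time. */
    boolean hasChanged() throws IOException {
        FileTime now = Files.getLastModifiedTime(config);
        if (now.equals(lastSeen)) {
            return false;
        }
        lastSeen = now;
        return true;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: ConfigChangeSketch <config-file>");
            return;
        }
        ConfigChangeSketch watcher = new ConfigChangeSketch(Paths.get(args[0]));
        while (!watcher.hasChanged()) {
            Thread.sleep(5000); // arbitrary polling interval
        }
        // Exit with the status that the relaunching script looks for.
        // Under the service wrapper, WrapperManager.restart() would go here instead.
        System.exit(199);
    }
}
```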

I know there's got to be a better way to do this, independent of the way 
JAMES started up, so I'd like to hear people's ideas.  I'm thinking JMX 
might be able to achieve this, but I don't know much about it.  Plus, it 
would be kinda cool to be able to restart JAMES just by sending a 
command through email.  A 'RestartServer' mailet could handle that 
(after checking credentials of course)...

Lastly, I'm a bit naive when it comes to CVS and creating diffs for 
patches.  What I'm doing now is:
1) Make changes
2) Re-update to the label I'm writing the patch for (either HEAD or 
branch_2_1_fcs)
3) Go around to each directory where I've modified a file and run:
cvs diff -u -w -b -rbranch_2_1_fcs>>c:\patch-xyz.diff

If anyone can let me know if that's the best way to assemble a patch, 
I'll have the above patches out soon.

Oh, and my future messages to the list won't be so darn long ;)

Kevin Hunt





---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org


RE: Fetchmail features I'm working on, CVS patch help (Long message)

Posted by Steve Brewin <sb...@synsys.com>.
Kevin,

The Javadoc for StringTerm says...

"This class implements the match method for Strings. The current
implementation provides only for substring matching. We could add
comparisons (like strcmp ...). ".

...So, there is no support for searching using regular expressions on
Addresses, the "text/*" parts of the message body, headers, subjects, etc.
contrary to your expectations. Also, a JavaMail MimeMessage does not support
the received date so you cannot match on it.

Processing in the mailet chain already meets your minimum requirements, as
you can perform SubjectIs and SubjectStartsWith matching with current James
matchers.

Of the other meaningful searches, only sent date and message body search
lack equivalent matchers. Both would be simple to write and generally
useful. By having one matcher per test rather than one matcher supporting
several tests, the current matcher syntax will suffice and you do not need
to worry about "the <and><or> logic".

It seems the real driver for wanting to do this in fetchmail rather than in
the mailet chain is to avoid fetching messages you have reposted to the
message store. But, when you say "I must NOT fetch any emails that were
sent to the mailbox by my application" do you really mean "I must NOT
reprocess"? If so there isn't really an issue with dealing with these in
mailets. If you really mean "NOT fetch", then I assume that you are
expecting...

a) A message sent from James to a local domain to end up back in the
MessageStore. It won't unless you hack the mailet chain to divert
(selected?) local mail to remote delivery.

b) Something else to read the messages, otherwise there is no point
reposting them. But what? How will he/she/it know to only read messages with
certain subject prefixes?

Why can't the processed messages simply be sent to another mailbox so that
there is no need to filter at all?

In any event, you can't currently repost fetched emails back to the same
message store. fetchmail has loop detection built in which will bounce
messages that have already passed through (they will have the
X-fetched-from: header attached, which is used to check for looping
messages). You would need to write a mailet that removes the X-Header. A
RemoveHeader mailet would be generally useful if you do choose to write one.
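
As a rough standalone sketch of the idea (not the fetchmail or mailet code
itself), the check and the header removal could look like this.
LoopHeaderSketch and its methods are hypothetical names, headers are
modelled as plain "Name: value" strings rather than a MimeMessage, and
folded (multi-line) headers are ignored for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of James. */
final class LoopHeaderSketch {
    /** Header name as mentioned above; compared case-insensitively. */
    static final String FETCHED_FROM = "X-fetched-from";

    /** True when any "Name: value" line carries the loop-detection header. */
    static boolean hasPassedThrough(List<String> headerLines) {
        for (String line : headerLines) {
            if (nameOf(line).equalsIgnoreCase(FETCHED_FROM)) {
                return true;
            }
        }
        return false;
    }

    /** Returns a copy of the header lines without the named header. */
    static List<String> removeHeader(List<String> headerLines, String name) {
        List<String> kept = new ArrayList<>();
        for (String line : headerLines) {
            if (!nameOf(line).equalsIgnoreCase(name)) {
                kept.add(line);
            }
        }
        return kept;
    }

    /** Extracts the header name, i.e. the text before the first colon. */
    private static String nameOf(String line) {
        int colon = line.indexOf(':');
        return colon < 0 ? line.trim() : line.substring(0, colon).trim();
    }

    public static void main(String[] args) {
        List<String> headers = Arrays.asList(
            "Subject: Status report",
            "X-fetched-from: pop.example.com");
        System.out.println("looping before: " + hasPassedThrough(headers));
        List<String> cleaned = removeHeader(headers, FETCHED_FROM);
        System.out.println("looping after:  " + hasPassedThrough(cleaned));
    }
}
```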

I'm sure your business process could be achieved far more flexibly in
the mailet chain. You just need to rework your implementation of what is
required a little.

-- Steve




Re: Fetchmail features I'm working on, CVS patch help (Long message)

Posted by Kevin Hunt <Ke...@Centropy.com>.
Steve Brewin wrote:

>Hi Kevin,
>
>It's great that you are interested in contributing. Here's some quick
>feedback which I hope helps. Remember, it's not my fetchmail stuff, it's the
>community's, and these are just the views of one member of that community.
>  
>
Thanks Steve, yes, of course it's the community's.

>I'm not sure why you wouldn't want to fetch mail outside of business hours,
>but technically this is a good idea. Personally I would tag it so everything
>was kept together like this...
>  
>
<snipped>

Mainly it's a business requirement.  I fetch email sent to our support 
mailboxes, and one of the actions my processing takes is to send a 
response back to the customer.  The email mentions resolution times, 
which are related to support contracts, which means I can't send emails 
if no one is going to look at the customer's request immediately.

Anyway, your points make sense and I'll take them into account.

>>2) FetchMail search terms.
>>    
>>
>
><snipped>
>  
>
>><fetchfilter> would effectively obsolete the <fetchall> option.
>>    
>>
>Not without provision for changing the mail state to deleted or seen,
>otherwise there is no way to avoid processing the same messages over and
>over.
>
>  
>
I realized my wording was incorrect.  It wouldn't obsolete <fetchall>, 
but this would alter the behavior of fetchall because the filter would 
change the subset of messages that is later subject to the <fetchall> 
setting.

>In most use-cases, wouldn't "Simple date" need to be dynamic or expressed as
>an offset such as 30 days ago?
>
>  
>
Ah yes, the power of someone else looking at your work--I went crazily 
overboard with my fetchfilters.  I adapted every single subclass of 
JavaMail SearchTerm without thinking if they'd be useful.  Some of these 
are useless.

>Personally, I feel that generalised mail processing of the type you propose
>should be left to James' mailet chain. Otherwise, where do you stop? Before
>we know it, someone will realise that however many filters we add, there is
>always a new requirement and propose that we add a pluggable message
>processing structure into fetchmail, ie: mailets! As we already have this in
>James we don't need it elsewhere.
>
>  
>
I did realize that at one point, and I was thinking of providing 
FetchMail a list of Matchers to allow customization to that detail.  But 
beyond the fact that that was a bad idea on several levels ;), in the 
end, my only requirement is that I must NOT fetch any emails that were 
sent to the mailbox by my application (it sends an email back to 
"itself" which specifies the status and result of the customer's 
email).  These emails are identified with a known subject prefix.  And 
so a simple subject regex in fetchmail would suffice.  The other fields 
and the <and><or> logic were fairly easy to implement, so I did it, in 
case someone needs to filter by a header.  But as mentioned above, some 
are superfluous.

<snipped>

>In fact, next up in my to do list is to switch to
>using an InputStream when the message delivered into James is created so
>that we don't EVER touch the message body in fetchmail. This will be
>defeated if we start checking the contents or size of the message body.
>
>  
>
The filter would only be applied if present.  If not, there's no hit to 
performance and none of the msg body is fetched unnecessarily.  When 
applied, it's up to the JavaMail Folder.search() implementation to find 
matching messages.  For example, if we are using an IMAP server and 
filtering on subject, only the message envelopes would be retrieved from 
the server.

>As you have probably gathered, I'm personally not taken by the idea of
>adding this to fetchmail. It would make a useful matcher though.
>
>  
>
Hopefully I've explained why this can't be a matcher... but it could be 
scaled down to a smaller set of search terms... namely: subject, header, 
recipientstring, recipient, from, fromstring.  Or just a subject filter 
if the other filters would have no use in the 'real' world.

>It is already possible to pass search terms such as this to a matcher, if
>somewhat cumbersome. The scripting support added by ScriptedMatcher does
>this. Unfortunately, it hasn't made its way into head yet, but I would be
>happy to send you the code to examine.
>
>  
>
Hadn't heard about this yet... I'll take a look if you have it handy.

<snipped>

>>Lastly, I'm a bit naive when it comes to CVS and creating diffs for
>>patches.
>>    
>>
>
><snipped>
>
>I develop in Eclipse and use its Patch creation facility. This works well
>except for a bug when including new files in new directories in the patch
>(so don't). Be sure to configure the Eclipse editors to use spaces instead
>of tabs and Unix line delimiters. You might want to post this, and the
>restart question separately so you get the views of others who have no
>interest in fetchmail.
>
>  
>
I use IntelliJ IDEA Aurora, which doesn't appear to have that ability.  
Looks like I'll stick to the command line for now...

>I hope this helps.
>
>Cheers,
>
>-- Steve
>
>  
>
Immensely.  Thanks for your time!
Kevin





RE: Fetchmail features I'm working on, CVS patch help (Long message)

Posted by Steve Brewin <sb...@synsys.com>.
Kevin Hunt wrote:

Hi Kevin,

It's great that you are interested in contributing. Here's some quick
feedback which I hope helps. Remember, it's not my fetchmail stuff, it's the
community's, and these are just the views of one member of that community.

> 1) Cron-scheduling in Steve Brewin's new fetchmail stuff.  This way I
> can schedule fetchmail tasks to only run during business hours, MON-FRI.

<snipped>

I'm not sure why you wouldn't want to fetch mail outside of business hours,
but technically this is a good idea. Personally I would tag it so everything
was kept together like this...
<schedule>
    <cron minute="13" hour="-1" month="-1" year="-1" day="-1" isDayOfYear="false"/>
    <cron minute="28" hour="-1" month="-1" year="-1" day="-1" isDayOfYear="false"/>
    <cron minute="43" hour="-1" month="-1" year="-1" day="-1" isDayOfYear="false"/>
    <cron minute="58" hour="-1" month="-1" year="-1" day="-1" isDayOfYear="false"/>
</schedule>
...and while I can't see why you would want to, there is no technical reason
why <schedule> and <interval> need to be mutually exclusive. Only one fetch
task will run at a time; if a fetch is triggered while one is running, it's
ignored.
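
The "ignore a trigger while a fetch is running" behaviour can be pictured
with a small guard like the one below. This is only a sketch of the
technique, not the actual fetchmail code: SingleFetchSketch is a
hypothetical name, and a compare-and-set flag is just one way to do it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical guard for illustration; not the actual fetchmail code. */
final class SingleFetchSketch {
    private final AtomicBoolean running = new AtomicBoolean(false);

    /**
     * Runs the task unless a fetch is already in progress.
     * Returns true if the task ran, false if the trigger was ignored.
     */
    boolean trigger(Runnable fetchTask) {
        if (!running.compareAndSet(false, true)) {
            return false; // a fetch is already running, so ignore this trigger
        }
        try {
            fetchTask.run();
            return true;
        } finally {
            running.set(false); // allow the next trigger to start a fetch
        }
    }

    public static void main(String[] args) {
        SingleFetchSketch fetcher = new SingleFetchSketch();
        // A second trigger fired from inside a running fetch is ignored.
        boolean ran = fetcher.trigger(() ->
            System.out.println("overlapping trigger ran: " + fetcher.trigger(() -> { })));
        System.out.println("first trigger ran: " + ran);
    }
}
```

With a guard of this kind it does not matter whether a trigger comes from a
<cron> entry or from <interval>; whichever fires while a fetch is in
progress is simply dropped.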

> 2) FetchMail search terms.

<snipped>

> <fetchfilter> would effectively obsolete the <fetchall> option.
Not without provision for changing the mail state to deleted or seen,
otherwise there is no way to avoid processing the same messages over and
over.

<snipped>

>            <!--    <receiveddate comparison="GT/GE/LT/LE/EQ/NE" date="Simple date">        -->
>            <!--    <sentdate comparison="GT/GE/LT/LE/EQ/NE" date="Simple date">        -->

In most use-cases, wouldn't "Simple date" need to be dynamic or expressed as
an offset such as 30 days ago?

<snipped>

>             <!--    <messageid id="RFC822 message id">        -->
>             <!--    <messagenumber number="message number">        -->

How do you know the messageid and number beforehand to match on it?

<snipped>

My rationale for including some simple filters in fetchmail was:

1) To avoid injecting mail into James that had been willfully (or
accidentally) sent to the ISP hosting the mail server in the hope of it
getting picked up by a mail client or relayed.

2) To handle situations unique to messages fetched from a message store,
such as being unable to identify the intended recipient.

3) To avoid the cost of fetching the message bodies for mail that is going
to be rejected.

Personally, I feel that generalised mail processing of the type you propose
should be left to James' mailet chain. Otherwise, where do you stop? Before
we know it, someone will realise that however many filters we add, there is
always a new requirement and propose that we add a pluggable message
processing structure into fetchmail, ie: mailets! As we already have this in
James we don't need it elsewhere.

Many of the criteria you specify would equally apply to mail injected from
all sources, not just fetchmail, so it's best to define them just once, in
James' mailet chain.

As touched on in (3) above, fetchmail is written to avoid the overhead of
touching the message body. In fact, next up in my to do list is to switch to
using an InputStream when the message delivered into James is created so
that we don't EVER touch the message body in fetchmail. This will be
defeated if we start checking the contents or size of the message body.

As you have probably gathered, I'm personally not taken by the idea of
adding this to fetchmail. It would make a useful matcher though.

> This sort of SearchTerm stuff is extremely flexible, and this could be
> easily adapted to a Matcher if the mailet spec changes to allow Matcher
> configuration of more than just a String -- which I hope is still being
> considered.

It is already possible to pass search terms such as this to a matcher, if
somewhat cumbersome. The scripting support added by ScriptedMatcher does
this. Unfortunately, it hasn't made its way into head yet, but I would be
happy to send you the code to examine.

> 3) "Restart on config change" feature.

<snipped>

If JMX is enabled, you can use a JMX console to restart James. You should
also be able to programmatically ask Phoenix to restart James directly or via
JMX, but I haven't done so.

> Lastly, I'm a bit naive when it comes to CVS and creating diffs for
> patches.

<snipped>

I develop in Eclipse and use its Patch creation facility. This works well
except for a bug when including new files in new directories in the patch
(so don't). Be sure to configure the Eclipse editors to use spaces instead
of tabs and Unix line delimiters. You might want to post this, and the
restart question separately so you get the views of others who have no
interest in fetchmail.

I hope this helps.

Cheers,

-- Steve

