Posted to server-dev@james.apache.org by Hontvari Jozsef <ho...@solware.com> on 2003/05/18 20:09:02 UTC

[PATCH] GenericListserv.patch

This is a one-line patch that fixes the GenericListserv autoBracket feature.
It now generates "[xxx] " instead of "[xxx]" (notice the space), which I
think was the original intent.
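
A minimal sketch of the behaviour described above. The class and method names (ListPrefix, buildPrefix) are hypothetical and are not taken from the GenericListserv source; the sketch only illustrates the intended output with and without autoBracket.

```java
final class ListPrefix {
    // Hypothetical helper, not the actual GenericListserv code.
    // With autoBracket on, the prefix is wrapped in brackets and followed
    // by a single space; with it off, the configured text is used verbatim.
    static String buildPrefix(String configuredPrefix, boolean autoBracket) {
        if (configuredPrefix == null) {
            return "";
        }
        return autoBracket ? "[" + configuredPrefix + "] " : configuredPrefix;
    }

    public static void main(String[] args) {
        // Quotes are printed so that the trailing space is visible.
        System.out.println("'" + buildPrefix("xxx", true) + "'");
        System.out.println("'" + buildPrefix("{foobar}", false) + "'");
    }
}
```

With autoBracket switched off, the administrator keeps full control over brackets and spacing, so nothing is lost by adding the space in the bracketed case.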

Unfortunately there is still another problem with non-ASCII characters in the
subject. The code uses the server's default encoding, which may or may not
suit the message. I don't think there is an easy way to solve this. A fixed
UTF-8 encoding would work for any non-ASCII subject, but some email clients
still don't understand UTF-8.


 GenericListserv.patch

Re: [PATCH] subject normalization in GenericListserv

Posted by Hontvari Jozsef <ho...@solware.com>.
Enhancement ideas (just for the record):
- Better heuristic: if the original charset cannot be determined from the raw
header, the code could also look at the body charset, if it is available.
- (Clarification) If the _decoded_ header value is ASCII only, then the
charset is definitely ASCII. However, this doesn't mean that the new value
will contain only ASCII chars (the prefix can contain anything), so some
charset heuristic may still be useful.
- Under Java 1.4, java.nio lets the system determine whether the original
charset fits the new value (that is, whether there are untranslatable
characters). If the charset doesn't fit the new value, it can fall back to
UTF-8.
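
The last idea can be sketched with java.nio.charset as follows. This is only an illustration under assumptions: the names CharsetFallback and chooseCharset are made up for the example, and StandardCharsets requires Java 7 or later, which is newer than the JDK 1.4 discussed in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

final class CharsetFallback {
    // Hypothetical helper: returns the preferred charset if it can represent
    // every character of the text, otherwise UTF-8.
    static Charset chooseCharset(String text, String preferredName) {
        try {
            Charset preferred = Charset.forName(preferredName);
            if (preferred.canEncode()) {
                CharsetEncoder encoder = preferred.newEncoder();
                if (encoder.canEncode(text)) {
                    return preferred;
                }
            }
        } catch (IllegalArgumentException e) {
            // Covers null, illegal and unsupported charset names:
            // fall through to UTF-8.
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(chooseCharset("[list] hello", "ISO-8859-1"));
        System.out.println(chooseCharset("[list] \u4f60\u597d", "ISO-8859-1"));
    }
}
```

CharsetEncoder.canEncode checks the whole amended value, prefix included, so a non-ASCII prefix is covered by the same test.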



---------------------------------------------------------------------
To unsubscribe, e-mail: james-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: james-dev-help@jakarta.apache.org


RE: [PATCH] subject normalization in GenericListserv / 3

Posted by "Noel J. Bergman" <no...@devtech.com>.
ROFL!  Nice job, sir.  :-)   I am laughing because I was just going to post
to you almost word for word the same code for doing the JDK 1.4 detection:

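	// InetAddress.getByAddress(byte[]) was added in JDK 1.4, so finding it
	// via reflection tells us whether we are running on JDK 1.4 or later.
	// The lookup runs once, when the class is initialized.
	private static java.lang.reflect.Method getByAddress = null;

	static {
		try {
			Class inetAddressClass = Class.forName("java.net.InetAddress");
			Class[] parameterTypes = { byte[].class };
			getByAddress = inetAddressClass.getMethod("getByAddress", parameterTypes);
		} catch (Exception e) {
			// Pre-1.4 runtime: the method does not exist.
			getByAddress = null;
		}
	}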

That is in some code I'm going to post later tonight.

	--- Noel




[PATCH] subject normalization in GenericListserv / 3

Posted by Hontvari Jozsef <ho...@solware.com>.
I have attached a patch which ensures that the reflection is executed only
once, at class initialization time.

It also improves the heuristic a bit: it now handles the case where the first
words of the subject are not encoded and only the later ones are.
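
A rough sketch of that kind of heuristic, not the attached patch itself: it scans the raw header for the first RFC 2047 encoded-word, wherever it appears, and returns its charset label. The names SubjectCharsetSniffer and firstEncodedWordCharset are hypothetical, and the regular expression is a simplification of the RFC grammar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SubjectCharsetSniffer {
    // Simplified encoded-word pattern: =?charset?B-or-Q?encoded-text?=
    // An optional "*language" suffix after the charset is skipped.
    private static final Pattern ENCODED_WORD = Pattern.compile(
            "=\\?([^?*\\s]+)(?:\\*[^?\\s]*)?\\?[bBqQ]\\?[^?\\s]*\\?=");

    // Returns the charset label of the first encoded-word found anywhere in
    // the raw header value, or null if the header has no encoded-word.
    static String firstEncodedWordCharset(String rawSubject) {
        if (rawSubject == null) {
            return null;
        }
        Matcher m = ENCODED_WORD.matcher(rawSubject);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The leading "Re:" is plain ASCII; only the second word is encoded.
        System.out.println(firstEncodedWordCharset("Re: =?ISO-8859-2?Q?sz=F6veg?="));
        System.out.println(firstEncodedWordCharset("plain ASCII subject"));
    }
}
```

Using find() rather than matching from the start of the string is what lets an unencoded first word such as "Re:" be skipped.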



RE: [PATCH] subject normalization in GenericListserv

Posted by "Noel J. Bergman" <no...@devtech.com>.
I've merged this patch into the CVS, both HEAD and version 2.

Regarding your thoughts for the future:

> enhancement ideas (just for the record):
>   - better heuristic: if it cannot determine the original
>     charset from the raw header, it can also look for the
>     body charset if it is available.

I would not presume to use the encoding from another body part.

>   - (clarification) if the _decoded_ header value is ASCII
>     only then the charset is definitely ASCII. However this
>     doesn't mean that the new value will only contain ASCII
>     chars (the prefix can contain anything). So some charset
>     heuristic may be still useful.

Unless we come up with a scheme to provide prefix encoding, you probably
need to assume that the prefix is ASCII.

>   - under java 1.4 java.nio the system can determine if the
>     original charset fits or not (if there is untranslatable
>     characters). If the charset doesn't fit the new value, it
>     can fall back to UTF-8.

OK, but would you look at something?  The code currently determines JDK 1.4
or not at runtime for each operation.  Since it is, shall we say, "highly
unlikely" that the JDK would change during execution, perhaps we should
cache the JDK status once, and then use conditional logic?  Having to
reflect each time, and catch the exception, is time consuming.

FWIW, when compiling for release, I always compile the code with JDK 1.3.  I
use JDK 1.4 for the javadocs because JDK 1.3 can't compile them, but the
binaries are built with JDK 1.3 to ensure compatibility.

	--- Noel




Re: [PATCH] subject normalization in GenericListserv

Posted by Hontvari Jozsef <ho...@solware.com>.
This new patch (attached) also includes my previous one-line patch (which
adds the space after the bracket).

This patch attempts to prevent the loss of charset information during the
normalization of the email subject. I think that until all email clients
support UTF-8 it is impossible to do this correctly, but this patch should
work almost always in practice. It takes the raw subject and attempts to
determine the original charset in which the sender encoded the header. If it
succeeds, it uses the same charset when it encodes the amended subject. If it
cannot determine the original charset, it uses the server's default charset.
In rare cases, when the text is non-ASCII but the server JRE doesn't support
the charset, it falls back to UTF-8.

However, in accordance with the RFCs, James still never encodes ASCII-only
text.
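
The selection order described above could look roughly like this. It is a sketch under assumptions, not the code in the attached patch: SubjectCharsetPolicy and pickCharset are invented names, and the charset detected from the raw header is passed in as a plain string (null when detection failed).

```java
import java.nio.charset.Charset;

final class SubjectCharsetPolicy {
    // True if every character is 7-bit ASCII.
    static boolean isAsciiOnly(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 127) {
                return false;
            }
        }
        return true;
    }

    // True if the running JRE knows the named charset.
    static boolean isSupported(String name) {
        try {
            return name != null && Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            return false; // syntactically illegal charset name
        }
    }

    // Returns the charset name to encode the amended subject with,
    // or null when the subject is ASCII only and needs no encoding.
    static String pickCharset(String amendedSubject, String detected, String serverDefault) {
        if (isAsciiOnly(amendedSubject)) {
            return null;
        }
        if (detected == null) {
            return serverDefault; // original charset could not be determined
        }
        return isSupported(detected) ? detected : "UTF-8";
    }

    public static void main(String[] args) {
        System.out.println(pickCharset("[list] hello", "ISO-8859-2", "ISO-8859-1"));
        System.out.println(pickCharset("[list] \u00e9t\u00e9", null, "ISO-8859-1"));
        System.out.println(pickCharset("[list] \u00e9t\u00e9", "x-no-such-charset", "ISO-8859-1"));
    }
}
```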



RE: [PATCH] GenericListserv.patch

Posted by "Noel J. Bergman" <no...@devtech.com>.
> you are right in that subject, but my path was related to a different
> feature

Fair enough.  :-)

> i.e. the autoBracket switch. The latter feature adds a bracket pair
> to the supplied prefix (and with the patch it adds a space also). I am
> almost sure that this was the intent, because if you read the
> AvalonListserver javadoc, it is explicitly mentioned there.

Historically, the mailing list manager always autobracketed the prefix and
included a space.  The autoBracket patch allowed the administrator to disable
that feature.  The ' ' was never part of the prefix, but was implicit in the
code referenced earlier.  And there was no way to remove it (hence the
change).

I agree with you that a ' ' could be added after the ']' in the generated
prefix without breaking the ability to not have it.  Sorry.  I did not
realize that was the particular change you were proposing.

> By the way, don't apply the patch because I have a better, which attempts
> to solve the second problem.

OK, I'll wait.  :-)

> Regarding the non-latin characters I meant that the incoming mail is
> correctly encoded, escaped, etc., but the subject transformation
> loses the original charset and applies the server's default charset, which
> is not necessarily a good idea.

Ah!  OK.  :-)  I look forward to your fix of that problem.  :-)

	--- Noel




Re: [PATCH] GenericListserv.patch

Posted by Hontvari Jozsef <ho...@solware.com>.
Hello Noel,

you are right on that subject, but my patch was related to a different
feature, i.e. the autoBracket switch. That feature adds a bracket pair
to the supplied prefix (and with the patch it also adds a space). I am
almost sure that this was the intent, because if you read the
AvalonListserver javadoc, it is explicitly mentioned there. Indeed, if you
don't apply the fix, then there is no way to add the space after the bracket
(if you are using autoBracket, of course) :-). And of course I think the
space is added everywhere. On the other hand, you can add any prefix (with
or without a space) if you switch off autoBracket, so nothing is lost (in
contrast to the source in the diff you linked to).

By the way, don't apply the patch, because I have a better one which attempts
to solve the second problem.

Regarding the non-Latin characters, I meant that the incoming mail is
correctly encoded, escaped, etc., but the subject transformation loses the
original charset and applies the server's default charset, which is not
necessarily a good idea. E.g. an email with a Chinese subject will not look
too good after being encoded with, let's say, latin-1, if that is the default
on the server.
I mean that if you call s=getSubject() and then setSubject(s), you will lose
information. It is awful, but true.
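
The loss can be reproduced in isolation with plain JDK calls. This sketch does not go through the JavaMail getSubject()/setSubject() path; it only shows what happens to Chinese text when it is pushed through a latin-1 encoder and back. The names SubjectRoundTrip and roundTrip are made up for the example, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class SubjectRoundTrip {
    // Encodes the text with the given charset and decodes it again.
    // Characters the charset cannot represent are replaced during encoding,
    // so the original text cannot be recovered afterwards.
    static String roundTrip(String subject, Charset charset) {
        return new String(subject.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String chinese = "\u4f60\u597d"; // two Chinese characters
        System.out.println(roundTrip(chinese, StandardCharsets.ISO_8859_1).equals(chinese)); // false
        System.out.println(roundTrip(chinese, StandardCharsets.UTF_8).equals(chinese));      // true
    }
}
```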




RE: [PATCH] GenericListserv.patch

Posted by "Noel J. Bergman" <no...@devtech.com>.
> > I support the addition of  a space character after the closing ]
> > It makes the subject neater and I can't see any reason why it
> > should be contentious at all.

> I agree with Danny.  It's a standard convention to have "[blah] "

So do I.  I hadn't looked carefully enough at where he was making the
change, and as I replied to him earlier, I have no problem making the
change, although Hontvari apparently has a new patch coming.

> while it does prevent people from using "{foobar}" (no space, and not [])

No problem.  This is just default behavior.  If they turn off autoBracket,
they can use whatever they want, with or without spaces.  All of my own
lists have the ' '.

	--- Noel




Re: [PATCH] GenericListserv.patch

Posted by Serge Knystautas <se...@lokitech.com>.
Danny Angus wrote:
> I support the addition of  a space character after the closing ] 
> It makes the subject neater and I can't see any reason why it should be contentious at all.

I agree with Danny.  It's a standard convention to have "[blah] ", and
while it does prevent people from using "{foobar}" (no space, and not
[]), I don't see a reason to support that.

-- 
Serge Knystautas
President
Lokitech >> software . strategy . design >> http://www.lokitech.com/
p. 1.301.656.5501
e. sergek@lokitech.com





RE: [PATCH] GenericListserv.patch

Posted by Danny Angus <da...@apache.org>.
I support the addition of  a space character after the closing ] 
It makes the subject neater and I can't see any reason why it should be contentious at all.

d.



RE: [PATCH] GenericListserv.patch

Posted by "Noel J. Bergman" <no...@devtech.com>.
> This is a one line patch, it fixes the GenericListserv autoBracket feature,
> now it generates "[xxx] " instead of "[xxx]" (notice the space), which was
> the original intent, I think.

Thank you for the patch, but this patch is rejected.  We have, in fact, been
over this territory before, as you can see from the CVS:


http://cvs.apache.org/viewcvs/jakarta-james/src/java/org/apache/james/transport/mailets/GenericListserv.java.diff?r1=1.10&r2=1.11&diff_format=h

The reason for rejecting the patch is that it is possible to include the
space character in the configuration, but not to remove one.  You simply
specify xml:space="preserve" when defining the text.  Were we to accept this
patch, there would be no way to NOT have that space.

> Unfortunately it still has another problem with non-ascii characters in the
> subject. It uses the server's encoding which is either good or bad.

According to the RFC, the subject may not have such invalid characters
without proper escaping.

	--- Noel

