Posted to server-dev@james.apache.org by robert burrell donkin <ro...@gmail.com> on 2007/05/07 08:42:39 UTC

[jSieve] Script Encoding [WAS Re: i am not getting subject content in utf-8 format]

On 5/7/07, robert burrell donkin <ro...@gmail.com> wrote:
> On 5/7/07, ketanbparekh <ta...@yahoo.com> wrote:
> >
> > I am running on Windows XP Professional.
>
> windows has a difficult default platform encoding so this may well be
> the problem

i've taken a look at the code in SieveToMultiMailbox and SieveFactory.
i think that we have encoding issues. the current code will use the
default platform encoding. when using windoz, this will result in
UTF-8 and UTF-16 encoded files being decoded incorrectly when (some)
non-ASCII characters are present.

to fix these issues, an encoding charset needs to be specified
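A minimal Java sketch of the failure described above. It is written for illustration and is not taken from the JAMES or jSieve code, and it assumes Java 7 or later for StandardCharsets. ISO-8859-1 stands in for a non-UTF-8 platform default so that the result is the same on every machine; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeMismatchDemo {

    // Turns raw script bytes into text using whichever charset the caller picks.
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "Caf" followed by e-acute (U+00E9), saved to disk as UTF-8.
        String original = "Caf\u00e9";
        byte[] onDisk = original.getBytes(StandardCharsets.UTF_8); // 5 bytes

        // ISO-8859-1 stands in for a non-UTF-8 platform default: the
        // two-byte UTF-8 sequence C3 A9 becomes two separate characters.
        String wrong = decode(onDisk, StandardCharsets.ISO_8859_1);
        // Naming the charset the file was actually written in recovers the text.
        String right = decode(onDisk, StandardCharsets.UTF_8);

        System.out.println(wrong.length() + " " + wrong.equals(original)); // 5 false
        System.out.println(right.length() + " " + right.equals(original)); // 4 true
    }
}
```

The mis-decoded string has gained a character and no longer matches the original, which is the kind of corruption reported for non-ASCII subjects.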

i can think of a couple of options (hopefully people will jump in with
any i've missed):

1 JAMES should support a single, hard-coded charset (probably UTF-8)

2 we allow charset to be injected through configuration; defaulting to:
  2a UTF-8
  2b platform
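Option 2 with default 2a could look roughly like the following Java sketch. The class name, the method name and the "platform" keyword (standing in for 2b) are hypothetical, chosen for illustration only; they are not existing JAMES configuration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: names and the "platform" keyword are illustrative,
// not part of JAMES or jSieve.
public class ScriptCharsetOption {

    public static Charset resolve(String configured) {
        // Option 2a: fall back to UTF-8 when nothing is configured.
        if (configured == null || configured.trim().isEmpty()) {
            return StandardCharsets.UTF_8;
        }
        // Option 2b, opt-in only: use the JVM's platform default.
        if (configured.trim().equalsIgnoreCase("platform")) {
            return Charset.defaultCharset();
        }
        // Otherwise accept any charset name the JVM knows. An unknown name
        // throws UnsupportedCharsetException and an illegal one throws
        // IllegalCharsetNameException, so misconfiguration fails fast.
        return Charset.forName(configured.trim());
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));         // UTF-8
        System.out.println(resolve("ISO-8859-1")); // ISO-8859-1
        System.out.println(resolve("platform"));   // depends on the host
    }
}
```

Failing at start-up on a bad charset name seems preferable to silently falling back, since a silent fallback would reintroduce exactly the platform-dependent behaviour being fixed.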

opinions?

- robert

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org


Re: [jSieve] Script Encoding [WAS Re: i am not getting subject content in utf-8 format]

Posted by robert burrell donkin <ro...@gmail.com>.
On 5/8/07, sbrewin@synergy.demon.co.uk <sb...@synergy.demon.co.uk> wrote:
> As the spec. prescribes UTF-8, I rather think that this should be the default.

+1

(i should have checked the specification before posting)

> As we are running in a VM, whether or not the underlying OS supports UTF-8
> isn't an issue. The VM will. We just need to make sure this is what we specify
> when dealing with character-set-sensitive code.

unless anyone objects or beats me to it, i'll patch SieveFactory so
that UTF-8 is forced by default and add a FAQ about this
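A rough Java sketch of forcing UTF-8 when a script is read, assuming the script arrives as an InputStream and Java 7 or later. readScript is a hypothetical name, not SieveFactory's actual API. It relies on the point made above: UTF-8 is one of the charsets every Java platform implementation is required to support, so naming it explicitly works regardless of the host OS.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the real SieveFactory code.
public class Utf8ScriptReader {

    public static String readScript(InputStream in) throws IOException {
        StringBuilder script = new StringBuilder();
        // The charset is named explicitly, so the platform default is never
        // consulted. Closing the reader also closes the underlying stream.
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                script.append(buffer, 0, n);
            }
        }
        return script.toString();
    }

    public static void main(String[] args) throws IOException {
        // A one-line Sieve test whose string contains e-acute (U+00E9).
        String source = "if header :contains \"subject\" \"caf\u00e9\" { keep; }";
        InputStream in = new ByteArrayInputStream(source.getBytes(StandardCharsets.UTF_8));
        System.out.println(readScript(in).equals(source)); // true
    }
}
```

An InputStreamReader built from a Charset silently replaces malformed input with U+FFFD. To reject a script that is not valid UTF-8, it could instead be built from a CharsetDecoder configured with CodingErrorAction.REPORT.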

- robert



Re: [jSieve] Script Encoding [WAS Re: i am not getting subject content in utf-8 format]

Posted by sb...@synergy.demon.co.uk.
norman@apache.org wrote:
> robert burrell donkin wrote:
> > i've taken a look at the code in SieveToMultiMailbox and SieveFactory.
> > i think that we have encoding issues. the current code will use the
> > default platform encoding. when using windoz, this will result in
> > UTF-8 and UTF-16 encoded files being decoded incorrectly when (some)
> > non-ASCII characters are present.

Yes, I rather suspected this :(

> > to fix these issues, an encoding charset needs to be specified
> >
> > i can think of a couple of options (hopefully people will jump in with
> > any i've missed):
> >
> > 1 JAMES should support a single, hard-coded charset (probably UTF-8)
> >
> > 2 we allow charset to be injected through configuration; defaulting to:
> >  2a UTF-8
> >  2b platform
> >
> > opinions?
> >
> > - robert 
> 
> Hi Robert,
> 
> I think UTF-8 as a default is not a good choice because some OSes do not
> support UTF-8 by default. Maybe ISO-8859-1 is a better choice. A
> configuration option would be cool too, for sure ;-)
> 
> bye
> Norman

 
As the spec prescribes UTF-8, I rather think that this should be the default. As we are running in a VM, whether or not the underlying OS supports UTF-8 isn't an issue. The VM will. We just need to make sure this is what we specify when dealing with character-set-sensitive code.

Cheers

Steve




Re: [jSieve] Script Encoding [WAS Re: i am not getting subject content in utf-8 format]

Posted by Norman Maurer <no...@apache.org>.
robert burrell donkin wrote:
> i've taken a look at the code in SieveToMultiMailbox and SieveFactory.
> i think that we have encoding issues. the current code will use the
> default platform encoding. when using windoz, this will result in
> UTF-8 and UTF-16 encoded files being decoded incorrectly when (some)
> non-ASCII characters are present.
>
> to fix these issues, an encoding charset needs to be specified
>
> i can think of a couple of options (hopefully people will jump in with
> any i've missed):
>
> 1 JAMES should support a single, hard-coded charset (probably UTF-8)
>
> 2 we allow charset to be injected through configuration; defaulting to:
>  2a UTF-8
>  2b platform
>
> opinions?
>
> - robert 

Hi Robert,

I think UTF-8 as a default is not a good choice because some OSes do not
support UTF-8 by default. Maybe ISO-8859-1 is a better choice. A
configuration option would be cool too, for sure ;-)

bye
Norman

