Posted to derby-dev@db.apache.org by Daniel John Debrunner <dj...@apache.org> on 2006/07/28 00:06:09 UTC

Encoding for test master files.

I have a new version of my old ijRunner JUnit test class that runs ij
scripts within JUnit and compares the output etc. I'll commit in a short
while, though it won't be in use yet, it's still a work in progress.

One area I need to address is the encoding of the output: what is the
default encoding for the master files? I think, looking at DERBY-683, work
was done to allow test-specific encoding, but I'm just currently looking
at handling the default.

Any ideas?
Dan.




Re: Encoding for test master files.

Posted by Daniel John Debrunner <dj...@apache.org>.
Myrna van Lunteren wrote:

> On 7/27/06, Daniel John Debrunner <dj...@apache.org> wrote:
> 
>> I have a new version of my old ijRunner JUnit test class that runs ij
>> scripts within JUnit and compares the output etc. I'll commit in a short
>> while, though it won't be in use yet, it's still a work in progress.
>>
>> One area I need to address is the encoding of the output: what is the
>> default encoding for the master files? I think, looking at DERBY-683, work
>> was done to allow test-specific encoding, but I'm just currently looking
>> at handling the default.
>>
>> Any ideas?
>> Dan.
>>
>>
>>
>>
> I am getting terribly confused on this as I'm trying to answer, so bear
> with me.
> I think the answer is UTF-8.
> DERBY-244 is relevant in this area and maybe DERBY-658.
> 
> With DERBY-658 we made the tests create output in local encoding. We
> read in the master in UTF-8 but have the test convert this to a local
> encoded copy in the .tmpmstr file, which is the one used in the
> diffing.
> So, I think this means the masters are expected to be UTF-8.
> 
> If there's a difference in a test creator's machine-dependent output
> and UTF-8 output, the harness has a flag to create a .utf8out file
> that's to be the actual master.
> For most tests and the more common systems, assuming we're running
> (and that's forced now in the harness) in US-English locale, the
> output with the system's default encoding (i.e. ISO-8859-1 on Unix,
> Cp1252 on Windows) is close enough to UTF-8 that we don't need to
> worry.
> 
> I haven't looked at Knut Anders' patch for DERBY-244 in detail yet,
> planned to do it today but other things took longer - but looks like
> it's going to force i18n tests to UTF-8 encoding.
> 
> Does this help? Is there a specific test that causes concern?

Yes, this helps, I'll look at mimicking this.

No specific test, I just want to ensure the encoding (in the simple case
at least) is handled up front.

Thanks,
Dan.


Re: Encoding for test master files.

Posted by Myrna van Lunteren <m....@gmail.com>.
On 7/27/06, Daniel John Debrunner <dj...@apache.org> wrote:
> I have a new version of my old ijRunner JUnit test class that runs ij
> scripts within JUnit and compares the output etc. I'll commit in a short
> while, though it won't be in use yet, it's still a work in progress.
>
> One area I need to address is the encoding of the output: what is the
> default encoding for the master files? I think, looking at DERBY-683, work
> was done to allow test-specific encoding, but I'm just currently looking
> at handling the default.
>
> Any ideas?
> Dan.
>
>
>
>
I am getting terribly confused on this as I'm trying to answer, so bear with me.
I think the answer is UTF-8.
DERBY-244 is relevant in this area and maybe DERBY-658.

With DERBY-658 we made the tests create output in local encoding. We
read in the master in UTF-8 but have the test convert this to a local
encoded copy in the .tmpmstr file, which is the one used in the
diffing.
So, I think this means the masters are expected to be UTF-8.
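
For illustration, the conversion step described above (read the master as
UTF-8, write a copy in the local encoding for the diff) could be sketched in
Java roughly as follows. This is not the harness code: the class and method
names are made up, and it uses java.nio.file APIs that are newer than this
thread.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the Derby test harness.
final class MasterCopySketch {

    /**
     * Reads the master file as UTF-8 and writes a line-by-line copy in the
     * target charset (for example Charset.defaultCharset()).
     * Line endings in the copy use the platform line separator.
     */
    static void copyWithEncoding(Path master, Path localCopy, Charset target)
            throws IOException {
        try (BufferedReader in = Files.newBufferedReader(master, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(localCopy, target)) {
            String line;
            while ((line = in.readLine()) != null) {
                // A character the target charset cannot represent makes the
                // writer fail with an IOException rather than writing '?'.
                out.write(line);
                out.newLine();
            }
        }
    }
}
```

A caller would pass the master, the path of the temporary copy, and
Charset.defaultCharset(), then diff the test output against the copy.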

If there's a difference in a test creator's machine-dependent output
and UTF-8 output, the harness has a flag to create a .utf8out file
that's to be the actual master.
For most tests and the more common systems, assuming we're running
(and that's forced now in the harness) in US-English locale, the
output with the system's default encoding (i.e. ISO-8859-1 on Unix,
Cp1252 on Windows) is close enough to UTF-8 that we don't need to
worry.

I haven't looked at Knut Anders' patch for DERBY-244 in detail yet,
planned to do it today but other things took longer - but looks like
it's going to force i18n tests to UTF-8 encoding.

Does this help? Is there a specific test that causes concern?

Myrna

Re: Encoding for test master files.

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
Myrna van Lunteren <m....@gmail.com> writes:

> On 7/28/06, Knut Anders Hatlen <Kn...@sun.com> wrote:
>> All of the master files are 7-bit ASCII files, but they are read as
>> UTF-8 (which works since ASCII characters have the same encoding
>> in UTF-8 as in US-ASCII). Some of the tests in i18n/* output non-ASCII
>> characters, but Sed.java replaces them with >EnC charcode<, so they
>> are also 7-bit ASCII.
>>
>> --
>> Knut Anders
>>
> Thx for correcting my earlier pronouncements...

Elaborating, not correcting. :)

I just wanted to emphasize that even though they sometimes are treated
as UTF-8, ISO-8859-1 or Cp1252, they in fact don't contain any
characters outside US-ASCII, and therefore it doesn't matter which of
these encodings we use.
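
A small Java sketch of that point, assuming nothing about the harness itself:
bytes in the 7-bit range decode to the same string under UTF-8, ISO-8859-1
and Cp1252, while a byte of 0x80 or above does not. The class and method
names are invented for the example, and windows-1252 is looked up by name
because it is not one of the StandardCharsets constants.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class AsciiEncodingSketch {

    /** True if every byte is in the 7-bit range 0x00..0x7F. */
    static boolean isSevenBitAscii(byte[] data) {
        for (byte b : data) {
            // Java bytes are signed, so 0x80..0xFF show up as negative values.
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    /** True if the bytes decode to the same string under all three charsets. */
    static boolean decodesIdentically(byte[] data) {
        String utf8 = new String(data, StandardCharsets.UTF_8);
        String latin1 = new String(data, StandardCharsets.ISO_8859_1);
        String cp1252 = new String(data, Charset.forName("windows-1252"));
        return utf8.equals(latin1) && latin1.equals(cp1252);
    }
}
```

For a pure ASCII master both methods return true; a lone byte such as 0xE9
makes both return false, which is the case the masters avoid.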

-- 
Knut Anders

Re: Encoding for test master files.

Posted by Myrna van Lunteren <m....@gmail.com>.
On 7/28/06, Knut Anders Hatlen <Kn...@sun.com> wrote:
> All of the master files are 7-bit ASCII files, but they are read as
> UTF-8 (which works since ASCII characters have the same encoding
> in UTF-8 as in US-ASCII). Some of the tests in i18n/* output non-ASCII
> characters, but Sed.java replaces them with >EnC charcode<, so they
> are also 7-bit ASCII.
>
> --
> Knut Anders
>
Thx for correcting my earlier pronouncements...
Myrna

Re: Encoding for test master files.

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
Daniel John Debrunner <dj...@apache.org> writes:

> I have a new version of my old ijRunner JUnit test class that runs ij
> scripts within JUnit and compares the output etc. I'll commit in a short
> while, though it won't be in use yet, it's still a work in progress.
>
> One area I need to address is the encoding of the output: what is the
> default encoding for the master files? I think, looking at DERBY-683, work
> was done to allow test-specific encoding, but I'm just currently looking
> at handling the default.

All of the master files are 7-bit ASCII files, but they are read as
UTF-8 (which works since ASCII characters have the same encoding
in UTF-8 as in US-ASCII). Some of the tests in i18n/* output non-ASCII
characters, but Sed.java replaces them with >EnC charcode<, so they
are also 7-bit ASCII.
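
As a rough idea of what such a replacement looks like, here is a minimal Java
sketch. It is not the Sed.java code: the class and method names are invented,
and the exact placeholder layout (decimal code unit between ">EnC " and "<")
is an assumption made only for this example.

```java
// Hypothetical illustration; not taken from Sed.java.
final class NonAsciiMaskSketch {

    /** Replaces every char above 0x7F with an ASCII-only placeholder. */
    static String maskNonAscii(String line) {
        StringBuilder sb = new StringBuilder(line.length());
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c <= 0x7F) {
                sb.append(c);
            } else {
                // Assumed format: the decimal value of the UTF-16 code unit.
                sb.append(">EnC ").append((int) c).append('<');
            }
        }
        return sb.toString();
    }
}
```

Whatever the real format is, the effect is the same: the filtered output
contains only 7-bit characters, so it compares cleanly against an ASCII
master.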

-- 
Knut Anders