Posted to dev@mahout.apache.org by Sebastian Schelter <ss...@apache.org> on 2010/08/18 00:18:20 UTC

Test failure

The tests
org.apache.mahout.vectors.WordLikeValueEncoderTest.testAsString() and
org.apache.mahout.vectors.TextValueEncoderTest.testAsString() break for
me. They expect a dot when formatting, but the code produces a comma. I
think that's caused by a call to String.format() somewhere, which might
be locale-specific.

org.junit.ComparisonFailure:
Expected :word:w1:1.0000
Actual   :word:w1:1,0000
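
The symptom can be reproduced outside Mahout in a few lines. This is only a
minimal sketch, not the encoder's actual code: the class and method names are
hypothetical, and it assumes the value is rendered with a "%.4f"-style pattern.

```java
import java.util.Locale;

// Hypothetical stand-in for the encoder's asString(); not Mahout code.
final class LocaleFormatDemo {

  // Formats with an explicit locale so the decimal separator is predictable.
  static String asString(Locale locale, String name, String word, double weight) {
    return String.format(locale, "%s:%s:%.4f", name, word, weight);
  }

  public static void main(String[] args) {
    // Locale.GERMAN uses a comma as the decimal separator: word:w1:1,0000
    System.out.println(asString(Locale.GERMAN, "word", "w1", 1.0));
    // Locale.ENGLISH uses a dot: word:w1:1.0000
    System.out.println(asString(Locale.ENGLISH, "word", "w1", 1.0));
  }
}
```

Calling String.format() without a Locale argument picks up the JVM's default
locale, which would explain why the same test passes on one machine and fails
on another.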

--sebastian

Re: Test failure

Posted by Ted Dunning <te...@gmail.com>.
Fine by me.

On Mon, Aug 30, 2010 at 7:04 AM, Sean Owen <sr...@gmail.com> wrote:

> So do you mind if I force Locale.ENGLISH here for the output?
>
> On Wed, Aug 25, 2010 at 8:38 AM, Isabel Drost <is...@apache.org> wrote:
> > On Thu, 19 Aug 2010 Ted Dunning <te...@gmail.com> wrote:
> >> Sean has one more point in his favor.
> >>
> >> Isabel?  Olivier?  Any other Europeans?
> >
> > All my machines are set to an English locale anyway, so no problem
> > there. I'd rather opt for consistency of program output than
> > localisation when it comes to log messages and such: For me it's
> > kind-of annoying when analysis scripts stop working just because
> > someone switched the locale setting.
> >
> > An area where localisation makes perfect sense though is in user-facing
> > front-end parts of the software: So happy that I could give my mom a
> > machine running Ubuntu that has most user interfaces translated to
> > German ;)
> >
> >
> > Cheers,
> > Isabel
> >
> >
> > PS: Sorry for the late reply - was at FrOSCon last weekend.
> >
> >
>

Re: Test failure

Posted by Sean Owen <sr...@gmail.com>.
So do you mind if I force Locale.ENGLISH here for the output?

On Wed, Aug 25, 2010 at 8:38 AM, Isabel Drost <is...@apache.org> wrote:
> On Thu, 19 Aug 2010 Ted Dunning <te...@gmail.com> wrote:
>> Sean has one more point in his favor.
>>
>> Isabel?  Olivier?  Any other Europeans?
>
> All my machines are set to an English locale anyway, so no problem
> there. I'd rather opt for consistency of program output than
> localisation when it comes to log messages and such: For me it's
> kind-of annoying when analysis scripts stop working just because
> someone switched the locale setting.
>
> An area where localisation makes perfect sense though is in user-facing
> front-end parts of the software: So happy that I could give my mom a
> machine running Ubuntu that has most user interfaces translated to
> German ;)
>
>
> Cheers,
> Isabel
>
>
> PS: Sorry for the late reply - was at FrOSCon last weekend.
>
>

Re: Test failure

Posted by Isabel Drost <is...@apache.org>.
On Thu, 19 Aug 2010 Ted Dunning <te...@gmail.com> wrote:
> Sean has one more point in his favor.
> 
> Isabel?  Olivier?  Any other Europeans?

All my machines are set to an English locale anyway, so no problem
there. I'd rather opt for consistency of program output than
localisation when it comes to log messages and such: For me it's
kind-of annoying when analysis scripts stop working just because
someone switched the locale setting.

An area where localisation makes perfect sense though is in user-facing
front-end parts of the software: So happy that I could give my mom a
machine running Ubuntu that has most user interfaces translated to
German ;)


Cheers,
Isabel


PS: Sorry for the late reply - was at FrOSCon last weekend.


Re: Test failure

Posted by Ted Dunning <te...@gmail.com>.
Sean has one more point in his favor.

Isabel?  Olivier?  Any other Europeans?

On Thu, Aug 19, 2010 at 10:04 AM, Sebastian Schelter <ss...@apache.org> wrote:

> I'd be fine with having Locale.ENGLISH as default.
>

Re: Test failure

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Ted,

A point instead of a comma as decimal separator is perfectly OK in my
eyes, as it is also the default if you use toString(). I'd say it's also
what people would expect here in Germany. I did some testing and
couldn't see a specific thousands separator. In which cases would I see it?

I ran the following tests and the only difference I saw was the comma
when using %f for formatting the number.

System.out.println(10000.987);
System.out.println(String.format(Locale.GERMAN, "%f", 10000.987));
System.out.println(String.format(Locale.ENGLISH, "%f", 10000.987));
System.out.println(String.format(Locale.GERMAN, "%e", 10000.987));
System.out.println(String.format(Locale.ENGLISH, "%e", 10000.987));
System.out.println(String.format(Locale.GERMAN, "%g", 10000.987));
System.out.println(String.format(Locale.ENGLISH, "%g", 10000.987));

10000.987
10000,987000
10000.987000
1.000099e+04
1.000099e+04
10001.0
10001.0
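
On the thousands-separator question: java.util.Formatter only inserts a
grouping separator when the format string asks for one with the ',' flag,
which none of the patterns above do. A minimal sketch (the class and method
names are hypothetical):

```java
import java.util.Locale;

// Hypothetical demo class; shows the ',' grouping flag of java.util.Formatter.
final class GroupingDemo {

  // The ',' flag requests locale-specific grouping separators.
  static String grouped(Locale locale, double value) {
    return String.format(locale, "%,.3f", value);
  }

  public static void main(String[] args) {
    // German: dot for grouping, comma for decimals -> 10.000,987
    System.out.println(grouped(Locale.GERMAN, 10000.987));
    // English: comma for grouping, dot for decimals -> 10,000.987
    System.out.println(grouped(Locale.ENGLISH, 10000.987));
  }
}
```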

I'd be fine with having Locale.ENGLISH as default.

--sebastian


On 19.08.2010 17:34, Ted Dunning wrote:
> Sebastian, Olivier, Isabel, I will need your input at the end of this email
> as non-English users of Mahout.
>
> Sean,
>
> On Wed, Aug 18, 2010 at 3:38 PM, Sean Owen <sr...@gmail.com> wrote:
>
>   
>> I think we might be battling inadvertently in SVN -- I think you undid
>> my last change and a bit more. Not a problem per se but let's discuss
>> what the thing to do is.
>>
>>     
> Thanks for popping up.  The other issue was fat-fingering on my part due
> to inexperience with git connected to svn.
>
> We do have a bit of a disagreement, however.  I think we can resolve it
> easily.
>
>
>   
>> I think the output of the jobs ought to not depend on the Locale.
>> While it may be "internally consistent" to use the system default
>> Locale in all instances, since input and output will match for one
>> user, it means that it won't match when two users trade data and are
>> using different Locales.
>>     
>
> I agree half of the time.
>
> The output of the system as intended to be consumed by machines should be,
> as you say, invariant across locale changes.
>
> On the other hand, output of the system as intended to be consumed by humans
> should be locale specific.
>
>> I went a bit overboard in the name of consistency and made some *log*
>> statements fixed at Locale.ENGLISH for consistency (and removed use of
>> String.format() where it didn't do anything beyond what Logger does).
>> That I don't mind un-doing.
>>
>>     
> Log statements are kind of borderline to me.  They are often read by humans
> and by machines.  I would lean, barely, toward the machine, invariant style
> here.
>
> Sebastian, Olivier and Isabel,
>
> What do you think about programs that display data for you in the wrong
> locale, especially with regard to decimal point and thousands separators?
>
> Does it cause you to make errors in interpreting the data?  Or does it
> happen so often that you correct what you read without noticing?
>
>   


Re: Test failure

Posted by Ted Dunning <te...@gmail.com>.
Sebastian, Olivier, Isabel, I will need your input at the end of this email
as non-English users of Mahout.

Sean,

On Wed, Aug 18, 2010 at 3:38 PM, Sean Owen <sr...@gmail.com> wrote:

> I think we might be battling inadvertently in SVN -- I think you undid
> my last change and a bit more. Not a problem per se but let's discuss
> what the thing to do is.
>

Thanks for popping up.  The other issue was fat-fingering on my part due
to inexperience with git connected to svn.

We do have a bit of a disagreement, however.  I think we can resolve it
easily.


> I think the output of the jobs ought to not depend on the Locale.
> While it may be "internally consistent" to use the system default
> Locale in all instances, since input and output will match for one
> user, it means that it won't match when two users trade data and are
> using different Locales.


I agree half of the time.

The output of the system as intended to be consumed by machines should be,
as you say, invariant across locale changes.

On the other hand, output of the system as intended to be consumed by humans
should be locale specific.
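
That split between machine-facing and human-facing output might look roughly
like this. It is a sketch with hypothetical names (OutputStyles, forMachines,
forHumans), not Mahout code; the choice of Locale.ENGLISH as the pinned locale
is an assumption taken from the rest of this thread.

```java
import java.util.Locale;

// Hypothetical helper illustrating the two output styles discussed here.
final class OutputStyles {

  // Machine-readable output: locale is pinned, so results never vary by user.
  static String forMachines(double value) {
    return String.format(Locale.ENGLISH, "%.4f", value);
  }

  // Human-readable output: follows the locale the caller passes in,
  // which would typically be Locale.getDefault().
  static String forHumans(Locale userLocale, double value) {
    return String.format(userLocale, "%,.4f", value);
  }

  public static void main(String[] args) {
    System.out.println(forMachines(1234.5));                 // 1234.5000
    System.out.println(forHumans(Locale.GERMAN, 1234.5));    // 1.234,5000
    System.out.println(forHumans(Locale.getDefault(), 1234.5));
  }
}
```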

> I went a bit overboard in the name of consistency and made some *log*
> statements fixed at Locale.ENGLISH for consistency (and removed use of
> String.format() where it didn't do anything beyond what Logger does).
> That I don't mind un-doing.
>

Log statements are kind of borderline to me.  They are often read by humans
and by machines.  I would lean, barely, toward the machine, invariant style
here.

Sebastian, Olivier and Isabel,

What do you think about programs that display data for you in the wrong
locale, especially with regard to decimal point and thousands separators?

Does it cause you to make errors in interpreting the data?  Or does it
happen so often that you correct what you read without noticing?

Re: Test failure

Posted by Sean Owen <sr...@gmail.com>.
I think we might be battling inadvertently in SVN -- I think you undid
my last change and a bit more. Not a problem per se but let's discuss
what the thing to do is.

I think the output of the jobs ought to not depend on the Locale.
While it may be "internally consistent" to use the system default
Locale in all instances, since input and output will match for one
user, it means that it won't match when two users trade data and are
using different Locales. This seems surprising, and was the nature of
the original 'bug' we were trying to fix, where the tests failed on
golden data for this reason.

I went a bit overboard in the name of consistency and made some *log*
statements fixed at Locale.ENGLISH for consistency (and removed use of
String.format() where it didn't do anything beyond what Logger does).
That I don't mind un-doing.

But I think we really want the bits that serialize data as a String to
be deterministic?
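
To make the data-trading concern concrete, here is a small sketch of a
round trip through a String. The names are hypothetical and the "%.4f"
pattern is an assumption; the point is only that Double.parseDouble() accepts
a dot as the decimal separator and rejects a comma.

```java
import java.util.Locale;

// Hypothetical round-trip check; not Mahout code.
final class RoundTripDemo {

  // Serializes a double using the given locale's number conventions.
  static String serialize(Locale locale, double value) {
    return String.format(locale, "%.4f", value);
  }

  // Returns true if Double.parseDouble() can read the text back.
  static boolean parses(String text) {
    try {
      Double.parseDouble(text);
      return true;
    } catch (NumberFormatException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // "1.5000" parses back; prints true.
    System.out.println(parses(serialize(Locale.ENGLISH, 1.5)));
    // "1,5000" is rejected by Double.parseDouble(); prints false.
    System.out.println(parses(serialize(Locale.GERMAN, 1.5)));
  }
}
```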

On Wed, Aug 18, 2010 at 2:23 PM, Ted Dunning <te...@gmail.com> wrote:
> I was just grabbing for a nice big hammer that I could be sure would work.
>  Happy to back it off a bit.
>
> On Tue, Aug 17, 2010 at 11:51 PM, Sean Owen <sr...@gmail.com> wrote:
>
>> Where this had happened before we just used the "Locale.ENGLISH"
>> locale as a slightly more neutral alternative. Is that OK?
>>
>> On Wed, Aug 18, 2010 at 12:07 AM, Sebastian Schelter <ss...@apache.org> wrote:
>> > Yes, everything's fine now, thanks for the quick response.
>> >
>> > On 18.08.2010 00:54, Ted Dunning wrote:
>> >> Sebastian,
>> >>
>> >> I just committed a fix.  I tested it by setting the locale to DE.de and got
>> >> the expected failure.  That implies that
>> >> the fix should make things good for you.
>>
>

Re: Test failure

Posted by Ted Dunning <te...@gmail.com>.
I was just grabbing for a nice big hammer that I could be sure would work.
 Happy to back it off a bit.

On Tue, Aug 17, 2010 at 11:51 PM, Sean Owen <sr...@gmail.com> wrote:

> Where this had happened before we just used the "Locale.ENGLISH"
> locale as a slightly more neutral alternative. Is that OK?
>
> On Wed, Aug 18, 2010 at 12:07 AM, Sebastian Schelter <ss...@apache.org> wrote:
> > Yes, everything's fine now, thanks for the quick response.
> >
> > On 18.08.2010 00:54, Ted Dunning wrote:
> >> Sebastian,
> >>
> >> I just committed a fix.  I tested it by setting the locale to DE.de and got
> >> the expected failure.  That implies that
> >> the fix should make things good for you.
>

Re: Test failure

Posted by Sean Owen <sr...@gmail.com>.
Where this had happened before we just used the "Locale.ENGLISH"
locale as a slightly more neutral alternative. Is that OK?

On Wed, Aug 18, 2010 at 12:07 AM, Sebastian Schelter <ss...@apache.org> wrote:
> Yes, everything's fine now, thanks for the quick response.
>
> On 18.08.2010 00:54, Ted Dunning wrote:
>> Sebastian,
>>
>> I just committed a fix.  I tested it by setting the locale to DE.de and got
>> the expected failure.  That implies that
>> the fix should make things good for you.

Re: Test failure

Posted by Sebastian Schelter <ss...@apache.org>.
Yes, everything's fine now, thanks for the quick response.

On 18.08.2010 00:54, Ted Dunning wrote:
> Sebastian,
>
> I just committed a fix.  I tested it by setting the locale to DE.de and got
> the expected failure.  That implies that
> the fix should make things good for you.
>
> On Tue, Aug 17, 2010 at 3:24 PM, Ted Dunning <te...@gmail.com> wrote:
>
>   
>> Ahh... yes.
>>
>> Let me see if I can't get rid of that dependency.  Thanks so much for
>> catching it.
>>
>>
>> On Tue, Aug 17, 2010 at 3:18 PM, Sebastian Schelter <ss...@apache.org> wrote:
>>
>>     
>>> The tests
>>> org.apache.mahout.vectors.WordLikeValueEncoderTest.testAsString() and
>>> org.apache.mahout.vectors.TextValueEncoderTest.testAsString() break for
>>> me. They expect a dot when formatting but the code produces a comma, I
>>> think that's caused by a call to String.format() somewhere which might
>>> be locale-specific.
>>>
>>> org.junit.ComparisonFailure:
>>> Expected :word:w1:1.0000
>>> Actual   :word:w1:1,0000
>>>
>>> --sebastian
>>>
>>>       
>>
>>     
>   


Re: Test failure

Posted by Ted Dunning <te...@gmail.com>.
Sebastian,

I just committed a fix.  I tested it by setting the locale to DE.de and got
the expected failure.  That implies that
the fix should make things good for you.
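
For anyone who wants to repeat that kind of check without changing their
system settings, the default locale can be swapped temporarily inside the JVM.
This is a sketch under assumptions: the helper name is hypothetical, it uses
lambdas from newer Java versions, and it is not the committed fix.

```java
import java.util.Locale;
import java.util.function.Supplier;

// Hypothetical helper for exercising code under a different default locale.
final class DefaultLocaleCheck {

  // Runs the action with a temporary default locale, then restores the old one.
  static String withDefaultLocale(Locale temporary, Supplier<String> action) {
    Locale saved = Locale.getDefault();
    Locale.setDefault(temporary);
    try {
      return action.get();
    } finally {
      Locale.setDefault(saved);
    }
  }

  public static void main(String[] args) {
    // No Locale argument: follows the default locale, so this prints 1,0000.
    System.out.println(
        withDefaultLocale(Locale.GERMANY, () -> String.format("%.4f", 1.0)));
    // Explicit Locale.ENGLISH: unaffected by the default, so this prints 1.0000.
    System.out.println(
        withDefaultLocale(Locale.GERMANY, () -> String.format(Locale.ENGLISH, "%.4f", 1.0)));
  }
}
```

Note that Locale.setDefault() affects the whole JVM, so a helper like this is
only safe when tests do not run in parallel.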

On Tue, Aug 17, 2010 at 3:24 PM, Ted Dunning <te...@gmail.com> wrote:

>
> Ahh... yes.
>
> Let me see if I can't get rid of that dependency.  Thanks so much for
> catching it.
>
>
> On Tue, Aug 17, 2010 at 3:18 PM, Sebastian Schelter <ss...@apache.org> wrote:
>
>> The tests
>> org.apache.mahout.vectors.WordLikeValueEncoderTest.testAsString() and
>> org.apache.mahout.vectors.TextValueEncoderTest.testAsString() break for
>> me. They expect a dot when formatting but the code produces a comma, I
>> think that's caused by a call to String.format() somewhere which might
>> be locale-specific.
>>
>> org.junit.ComparisonFailure:
>> Expected :word:w1:1.0000
>> Actual   :word:w1:1,0000
>>
>> --sebastian
>>
>
>

Re: Test failure

Posted by Ted Dunning <te...@gmail.com>.
Ahh... yes.

Let me see if I can't get rid of that dependency.  Thanks so much for
catching it.

On Tue, Aug 17, 2010 at 3:18 PM, Sebastian Schelter <ss...@apache.org> wrote:

> The tests
> org.apache.mahout.vectors.WordLikeValueEncoderTest.testAsString() and
> org.apache.mahout.vectors.TextValueEncoderTest.testAsString() break for
> me. They expect a dot when formatting but the code produces a comma, I
> think that's caused by a call to String.format() somewhere which might
> be locale-specific.
>
> org.junit.ComparisonFailure:
> Expected :word:w1:1.0000
> Actual   :word:w1:1,0000
>
> --sebastian
>