Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/05/17 16:25:44 UTC

[jira] Created: (LUCENE-2466) fix some more locale problems in lucene/solr

fix some more locale problems in lucene/solr
--------------------------------------------

                 Key: LUCENE-2466
                 URL: https://issues.apache.org/jira/browse/LUCENE-2466
             Project: Lucene - Java
          Issue Type: Bug
            Reporter: Robert Muir


set ANT_ARGS="-Dargs=-Duser.language=tr -Duser.country=TR"
ant clean test

We should make sure this works across all of lucene/solr
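For context, a minimal sketch of the kind of failure a Turkish default locale tends to expose. The class and method names are hypothetical and nothing here is taken from the Lucene/Solr sources; the locale is passed explicitly so the effect shows up regardless of how the JVM was started.

```java
import java.util.Locale;

public class TurkishCasing {
    // The locale selected by -Duser.language=tr -Duser.country=TR
    static final Locale TURKISH = new Locale("tr", "TR");

    // Lowercases with an explicit locale; String.toLowerCase() with no
    // argument would use Locale.getDefault() instead.
    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        // Under Turkish rules, 'I' lowercases to dotless 'ı' (U+0131)
        System.out.println(lower("TITLE", TURKISH));
        // Under English rules, 'I' lowercases to plain 'i'
        System.out.println(lower("TITLE", Locale.ENGLISH));
    }
}
```

Any code that lowercases an ASCII keyword with the no-argument `toLowerCase()` and then compares it against a literal such as "title" can therefore stop matching when the tests run under tr_TR.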

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-2466) fix some more locale problems in lucene/solr

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-2466.
---------------------------------

    Resolution: Fixed

I ran a few more locales, no more failures... I think we found the worst problems.

> fix some more locale problems in lucene/solr
> --------------------------------------------
>
>                 Key: LUCENE-2466
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2466
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Robert Muir
>         Attachments: LUCENE-2466.patch, LUCENE-2466.patch, LUCENE-2466_coretests.patch, LUCENE-2466_lucene_thai.patch, LUCENE-2466_thai_solr.patch
>
>
> set ANT_ARGS="-Dargs=-Duser.language=tr -Duser.country=TR"
> ant clean test
> We should make sure this works across all of lucene/solr



[jira] Updated: (LUCENE-2466) fix some more locale problems in lucene/solr

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2466:
--------------------------------

    Attachment: LUCENE-2466_thai_solr.patch

Attached is a patch with some modifications to Solr, adding missing Locale.US params etc., following Hoss Man's rule.

I am still nervous about DIH (I didn't touch it), but this makes all the tests pass under th_TH_TH.



[jira] Updated: (LUCENE-2466) fix some more locale problems in lucene/solr

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2466:
--------------------------------

    Fix Version/s: 3.1
                   4.0

Setting fix versions correctly here.

Happy to backport this stuff to 1.4.1 if desired.



[jira] Commented: (LUCENE-2466) fix some more locale problems in lucene/solr

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868358#action_12868358 ] 

Robert Muir commented on LUCENE-2466:
-------------------------------------

I talked to Hoss Man about some of these date problems, and he was of the opinion that for Solr, the Locale should never be used for date parsing/formatting (only standard UTC/Locale.US). So these are easy to fix.
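A minimal sketch of what "only standard UTC/Locale.US" could look like in practice. The class name, method names and date pattern are assumptions chosen for illustration, not the code in the attached patches.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class UtcDates {
    // Builds a formatter that ignores Locale.getDefault() and
    // TimeZone.getDefault(). SimpleDateFormat is not thread-safe,
    // so a fresh instance is created per call.
    static SimpleDateFormat newFormat() {
        SimpleDateFormat f =
            new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    }

    static String format(Date date) {
        return newFormat().format(date);
    }

    public static void main(String[] args) {
        System.out.println(format(new Date(0L)));
    }
}
```

By contrast, a `new SimpleDateFormat(pattern)` without a locale picks up the default locale's calendar and digits, which is the likely source of the date failures under th_TH_TH.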

But there is another problem, in this case with the format of floats themselves. Should they follow the same rule in Solr, or should localized numeric formats be supported?

{noformat}
   [junit] Caused by: java.lang.NumberFormatException: For input string: "<some thai digits here>"
   [junit]     at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
   [junit]     at java.lang.Float.parseFloat(Float.java:422)
   [junit]     at org.apache.solr.util.NumberUtils.float2sortableStr(NumberUtils.java:79)
   [junit]     at org.apache.solr.schema.SortableFloatField.toInternal(SortableFloatField.java:49)
   [junit]     at org.apache.solr.schema.FieldType.createField(FieldType.java:236)
   [junit]     ... 38 more
   [junit] </result>)
{noformat}
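The trace above is consistent with a float being rendered with locale-specific digits and then handed to Float.parseFloat, which only accepts ASCII digits. A minimal sketch of that mismatch follows; the Thai-digit string is an illustrative input rather than the value from the log, and the class and method names are hypothetical.

```java
import java.util.Locale;

public class ThaiDigits {
    // "1.5" written with Thai digits U+0E51 and U+0E55 (illustrative input)
    static final String THAI_ONE_POINT_FIVE = "\u0E51.\u0E55";

    // Returns true if Float.parseFloat accepts the string
    static boolean parses(String s) {
        try {
            Float.parseFloat(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Renders a float with a fixed locale so the digits are always ASCII
    static String render(float value) {
        return String.format(Locale.ROOT, "%.1f", value);
    }

    public static void main(String[] args) {
        System.out.println(parses(THAI_ONE_POINT_FIVE)); // rejected by parseFloat
        System.out.println(parses(render(1.5f)));        // round-trips
    }
}
```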

Furthermore, what about DataImportHandler's use of some of the same DateMathParser stuff used in other places in Solr? It tends to use TimeZone.getDefault/Locale.getDefault... should this be changed?




[jira] Updated: (LUCENE-2466) fix some more locale problems in lucene/solr

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2466:
--------------------------------

    Attachment: LUCENE-2466_lucene_thai.patch

The attached patch fixes trunk for the Thai locale.

It doesn't need to be merged, as the tests don't exist in 3x; I created this problem :(



[jira] Commented: (LUCENE-2466) fix some more locale problems in lucene/solr

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868380#action_12868380 ] 

Yonik Seeley commented on LUCENE-2466:
--------------------------------------

IMO, there's nothing in Solr that should depend on the system locale unless explicitly referenced or configured to do so. The defaults should certainly never do so.

Hoss pointed out this in DIH:
http://wiki.apache.org/solr/DataImportHandler#NumberFormatTransformer
At a minimum I think this should be changed in trunk to not default to the system locale.

Anyway, my communication will be limited over the next week starting tomorrow (Apache Lucene EuroCon)...
so here's my standing +1 to commit all changes that remove system locale defaults.
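A minimal sketch of the "explicitly configured, never the system default" idea applied to number parsing. This is not the NumberFormatTransformer code; the class, the method and the choice of Locale.ROOT as the fallback are assumptions for illustration.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class ExplicitLocaleNumbers {
    // Parses text with the configured locale; when none is configured,
    // falls back to a fixed locale instead of Locale.getDefault().
    static Number parse(String text, Locale configured) {
        Locale locale = (configured != null) ? configured : Locale.ROOT;
        try {
            return NumberFormat.getNumberInstance(locale).parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        // No locale configured: '.' is the decimal separator
        System.out.println(parse("1,234.5", null));
        // Explicitly configured German locale: ',' is the decimal separator
        System.out.println(parse("1.234,5", Locale.GERMANY));
    }
}
```

With this shape, the same configuration gives the same result on every machine, and localized input is still supported for anyone who asks for it.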



[jira] Updated: (LUCENE-2466) fix some more locale problems in lucene/solr

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2466:
--------------------------------

    Attachment: LUCENE-2466.patch

Attached is a patch that fixes the tests for Solr, too.

* I added StrUtils.ROOT_LOCALE, but we could probably use Locale.ENGLISH just fine too; this is just me being nitpicky.
* commons-codec fixed this in their 1.4 release, so I upgraded to 1.4 (not in the patch, obviously) so that DoubleMetaphoneFilter etc. pass also.
* Besides lowercasing, Solr uses uppercasing in a lot of places... in my opinion we should review why it is doing this.
* I didn't change SolrQueryParser; similar problems exist in Lucene's QueryParser (strange casing), and that's for another day.

Someone should review the Solr stuff, as I don't think I necessarily present the best solution; I just indicate where the problems are.
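A minimal sketch of why a root-locale constant and Locale.ENGLISH behave the same for casing, and why uppercasing is affected as much as lowercasing. The ROOT_LOCALE constant below is an assumed stand-in for StrUtils.ROOT_LOCALE, defined here via Locale.ROOT; the actual definition in the patch may differ.

```java
import java.util.Locale;

public class RootLocaleCasing {
    // Assumed stand-in for the constant added in the patch
    static final Locale ROOT_LOCALE = Locale.ROOT;
    static final Locale TURKISH = new Locale("tr", "TR");

    static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        // Root and English rules agree: 'i' uppercases to 'I'
        System.out.println(upper("quit", ROOT_LOCALE));
        System.out.println(upper("quit", Locale.ENGLISH));
        // Turkish rules: 'i' uppercases to dotted 'İ' (U+0130)
        System.out.println(upper("quit", TURKISH));
    }
}
```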





[jira] Commented: (LUCENE-2466) fix some more locale problems in lucene/solr

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868264#action_12868264 ] 

Yonik Seeley commented on LUCENE-2466:
--------------------------------------

Awesome!  If we can get the tests to pass with these different locales, commit it!  When in doubt, we should not be sensitive to locale.

bq. I didnt change SolrQueryParser, similar problems exist in Lucene's QueryParser (strange casing)

The QP shouldn't currently be an issue for solr, we never set the flags to do lowercasing (I've always been against it as the right solution is field specific, not parser specific).




[jira] Updated: (LUCENE-2466) fix some more locale problems in lucene/solr

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2466:
--------------------------------

    Attachment: LUCENE-2466.patch

Here is a cleaned-up patch, using Locale.ENGLISH, that fixes the casing problems.

* Note the use of Locale.ENGLISH is not an affront to non-english users, it just forces consistent casing behavior and is already defined as a constant.

I plan to commit soon (trunk/stable), and then look at the unrelated separate failures for Thai:
set ANT_ARGS="-Dargs=-Duser.language=th -Duser.country=TH -Duser.variant=TH"

I suspect many of these failures are due to date handling.

We might want to devise a plan to help test this stuff: either let Hudson pick a different locale each night (maybe just from the "troublesome ones"), and/or do something similar to the LocalizedTestCase in Lucene (though this can make tests very slow).
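A minimal sketch of the second option: run a check once per "troublesome" locale by temporarily swapping the JVM default. This is not Lucene's LocalizedTestCase; the class, the method and the locale list are hypothetical, with tr_TR and th_TH_TH chosen because they are the two locales named in this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LocaleSweep {
    // Hypothetical list of locales worth sweeping
    static final Locale[] TROUBLESOME = {
        new Locale("tr", "TR"),
        new Locale("th", "TH", "TH"),
    };

    // Runs the check under each locale and returns those where it threw
    static List<Locale> failingLocales(Runnable check) {
        Locale saved = Locale.getDefault();
        List<Locale> failed = new ArrayList<Locale>();
        try {
            for (Locale locale : TROUBLESOME) {
                Locale.setDefault(locale);
                try {
                    check.run();
                } catch (RuntimeException | AssertionError e) {
                    failed.add(locale);
                }
            }
        } finally {
            // Always restore the default so later tests are unaffected
            Locale.setDefault(saved);
        }
        return failed;
    }
}
```

Running every test under every locale multiplies the runtime, which is the slowness concern above; picking one locale per nightly build trades coverage per run for speed.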




[jira] Commented: (LUCENE-2466) fix some more locale problems in lucene/solr

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868308#action_12868308 ] 

Robert Muir commented on LUCENE-2466:
-------------------------------------

Committed 945245 (trunk) / 945270 (3x) for the casing problems.



[jira] Commented: (LUCENE-2466) fix some more locale problems in lucene/solr

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868321#action_12868321 ] 

Robert Muir commented on LUCENE-2466:
-------------------------------------

Committed revision 945274 for the Lucene wildcard/regex tests.

I will look at the Solr problems under this locale now; they probably need to be merged to 3x also.



[jira] Commented: (LUCENE-2466) fix some more locale problems in lucene/solr

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868397#action_12868397 ] 

Robert Muir commented on LUCENE-2466:
-------------------------------------

Committed LUCENE-2466_thai_solr.patch 945343 (trunk) / 945353 (3x)



[jira] Updated: (LUCENE-2466) fix some more locale problems in lucene/solr

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2466:
--------------------------------

    Attachment: LUCENE-2466_coretests.patch

Attached is a patch; with it, Lucene core/contrib is OK.

But Solr has some failures that must be investigated.

If no one objects, I would like to commit this first and backport, then investigate those.
