Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2014/01/04 06:54:50 UTC

[jira] [Commented] (SOLR-5605) MapReduceIndexerTool fails in some locales -- seen in random failures of MapReduceIndexerToolArgumentParserTest

    [ https://issues.apache.org/jira/browse/SOLR-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862220#comment-13862220 ] 

Hoss Man commented on SOLR-5605:
--------------------------------

Filing this because I encountered it in randomized testing. It sounded familiar, but I was surprised not to be able to find an issue about it.

Easy to repro...

{code}
ant test -Dtestcase=MapReduceIndexerToolArgumentParserTest -Dtests.method=testArgsParserHelp -Dtests.slow=true -Dtests.locale=hi_IN -Dtests.file.encoding=UTF-8
...
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=MapReduceIndexerToolArgumentParserTest -Dtests.method=testArgsParserHelp -Dtests.seed=90EEAEBDB08626A8 -Dtests.slow=true -Dtests.locale=hi_IN -Dtests.timezone=Pacific/Apia -Dtests.file.encoding=UTF-8
   [junit4] ERROR   0.25s | MapReduceIndexerToolArgumentParserTest.testArgsParserHelp <<<
   [junit4]    > Throwable #1: java.util.UnknownFormatConversionException: Conversion = '१'
   [junit4]    > 	at __randomizedtesting.SeedInfo.seed([90EEAEBDB08626A8:C3C04CAF7E84AE5]:0)
   [junit4]    > 	at java.util.Formatter.checkText(Formatter.java:2547)
   [junit4]    > 	at java.util.Formatter.parse(Formatter.java:2523)
   [junit4]    > 	at java.util.Formatter.format(Formatter.java:2469)
   [junit4]    > 	at java.io.PrintWriter.format(PrintWriter.java:905)
   [junit4]    > 	at net.sourceforge.argparse4j.helper.TextHelper.printHelp(TextHelper.java:206)
   [junit4]    > 	at net.sourceforge.argparse4j.internal.ArgumentImpl.printHelp(ArgumentImpl.java:247)
   [junit4]    > 	at net.sourceforge.argparse4j.internal.ArgumentParserImpl.printArgumentHelp(ArgumentParserImpl.java:253)
   [junit4]    > 	at net.sourceforge.argparse4j.internal.ArgumentParserImpl.printHelp(ArgumentParserImpl.java:279)
   [junit4]    > 	at org.apache.solr.hadoop.MapReduceIndexerTool$MyArgumentParser$1.run(MapReduceIndexerTool.java:187)
{code}

Analysis from Uwe on the list when Jenkins hit this a while back...

{quote}
Locale problem with the argument parser.

The sperm-like symbol (१) is DEVANAGARI DIGIT ONE (U+0967). It looks like, while testing, some foreign (non-Lucene) code converts the digit "1" to this small creature, maybe through the use of the default locale. As the Lucene code is forbidden-api checked, this seems to be a bug somewhere else (the stack trace shows the bug: net.sourceforge.argparse4j.helper.TextHelper calls String.format without a Locale!).
{quote}

...and...

{quote}
The problem is in argparse4j:

http://grepcode.com/file/repo1.maven.org/maven2/net.sourceforge.argparse4j/argparse4j/0.3.2/net/sourceforge/argparse4j/helper/TextHelper.java#197

The code does the following:

String fmt = String.format("%%%ds%%s\n", indentWidth);
writer.format(fmt,....)

So it uses the first String.format (without a locale) to produce the format string for the second one. The %d is replaced by indentWidth, so the text is right-aligned. But the indent-width pattern is formatted using the default locale, so the first line produces something like the following:
"%१s%s" instead of "%1s%s"

This will fail format parsing in the second call. In my opinion the whole code is a bug by itself. Creating a format pattern with another format pattern is slow and, as shown, buggy!

{quote}
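
To make the failure mode in Uwe's analysis concrete, here is a minimal sketch of two locale-independent ways to build the same indent pattern. The class and method names (LocaleSafeIndent, buildIndentFormat, buildIndentFormatByConcat) are made up for illustration; this is not argparse4j's actual code or its actual fix, only the general technique of pinning the locale (or avoiding the nested format altogether).

```java
import java.util.Locale;

// Hypothetical helper, not part of argparse4j or Solr.
final class LocaleSafeIndent {

    // Same pattern as the quoted TextHelper code, but with an explicit
    // Locale.ROOT so that %d always renders ASCII digits.
    static String buildIndentFormat(int indentWidth) {
        return String.format(Locale.ROOT, "%%%ds%%s\n", indentWidth);
    }

    // Alternative that skips the nested format call entirely.
    // int-to-String concatenation is not locale-sensitive.
    static String buildIndentFormatByConcat(int indentWidth) {
        return "%" + indentWidth + "s%s\n";
    }

    public static void main(String[] args) {
        // Simulate the locale from the failing test run.
        Locale.setDefault(Locale.forLanguageTag("hi-IN"));

        String fmt = buildIndentFormat(4);
        // The second format call now parses cleanly regardless of default locale.
        System.out.print(String.format(Locale.ROOT, fmt, "", "help text"));
    }
}
```

Whether the unpatched code actually fails under hi_IN depends on the JVM's locale data (it did in the run above), so the sketch only shows the safe variants rather than trying to reproduce the exception.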

> MapReduceIndexerTool fails in some locales -- seen in random failures of MapReduceIndexerToolArgumentParserTest
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5605
>                 URL: https://issues.apache.org/jira/browse/SOLR-5605
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>
> I noticed a randomized failure in MapReduceIndexerToolArgumentParserTest which is reproducible with any seed -- all that matters is the locale.
> The problem sounded familiar, and a quick search verified that Jenkins has in fact hit this a couple of times in the past -- Uwe commented on the list that this is due to a real problem in one of the third-party dependencies (that does the argument parsing) that will affect usage on some systems.
> If working around the bug in the arg parsing lib isn't feasible, MapReduceIndexerTool should fail cleanly if the locale isn't one we know is "supported".
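
For the "fail cleanly" option, one possible shape of such a check is sketched below. It assumes that "supported" can be approximated as "the locale's zero digit is ASCII '0'", since java.util.Formatter localizes %d output based on the locale's zero digit. LocaleGuard and its methods are hypothetical names, not existing Solr or MapReduceIndexerTool code.

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical guard, not part of Solr.
final class LocaleGuard {

    // True when numbers formatted under this locale use ASCII digits.
    static boolean usesAsciiDigits(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getZeroDigit() == '0';
    }

    // Throws a descriptive error instead of letting the argument parser
    // die later with an UnknownFormatConversionException.
    static void requireAsciiDigits(Locale locale) {
        if (!usesAsciiDigits(locale)) {
            throw new IllegalStateException(
                "Unsupported locale " + locale
                + ": argument help formatting requires ASCII digits");
        }
    }
}
```

A caller would presumably run LocaleGuard.requireAsciiDigits(Locale.getDefault()) before handing control to the argument parser.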


