Posted to dev@lucenenet.apache.org by Juan Orellana <Ju...@nordicnetproducts.se> on 2016/03/23 11:03:39 UTC

Weird behaviour on char boxing

I recently downloaded the latest 4.5 from GitHub (https://github.com/apache/lucenenet/) and started playing around with Lucene.
When I ran some of the tests, I noticed weird behavior in the RandomlyRecaseCodePoints method of the TestUtil class (“TestUtil.cs”).
The test seems to generate random text, and sometimes I got weird behavior with some special strings that may be invalid.

The error seems to be on these lines:

case 0:
    builder.Append(char.ToUpper((char)codePoint));
    break;

case 1:
    builder.Append(char.ToLower((char)codePoint));
    break;

case 2: // leave intact
    builder.Append((char)codePoint);
    break;

The (char)codePoint cast seems to truncate the integer code point to 16 bits, so for code points above U+FFFF you get the wrong character back, and the test fails because the length of the text is not the same.
I don’t get this behavior when I run the same text with the Java version of Lucene (RandomlyRecaseCodePoints).
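
To illustrate the truncation, here is a small standalone sketch. It is not code from TestUtil.cs; the class name TruncationDemo and the code point U+1F600 are chosen only for illustration. In an unchecked context the cast keeps only the low 16 bits, while char.ConvertFromUtf32 returns the full UTF-16 surrogate pair:

```csharp
using System;

public static class TruncationDemo
{
    public static void Main()
    {
        // Illustrative supplementary code point (above U+FFFF).
        int codePoint = 0x1F600;

        // The unchecked cast keeps only the low 16 bits: 0x1F600 -> 0xF600.
        char truncated = unchecked((char)codePoint);

        // ConvertFromUtf32 returns the UTF-16 encoding, here a surrogate pair.
        string full = char.ConvertFromUtf32(codePoint);

        Console.WriteLine("truncated:  U+{0:X4}", (int)truncated);              // U+F600
        Console.WriteLine("length:     {0}", full.Length);                      // 2
        Console.WriteLine("round trip: U+{0:X}", char.ConvertToUtf32(full, 0)); // U+1F600
    }
}
```

One char is appended instead of two, which would explain the length mismatch the test reports.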

I made a quick fix, and this code seems to solve the problem, but I haven’t tested it completely.

var stringValue = char.ConvertFromUtf32(codePoint);

switch (NextInt(random, 0, 2))
{
    case 0:
        var value0 = stringValue.ToUpper();
        builder.Append(value0);
        break;

    case 1:
        var value1 = stringValue.ToUpper().ToLower();
        builder.Append(value1);
        break;

    case 2: // leave intact
        builder.Append(stringValue);
        break;
}
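
For anyone who wants to try the idea outside the test framework, below is a minimal self-contained sketch of a recasing loop that works on whole code points. It is only an assumption of how such a loop could look, not the TestUtil implementation: the names RecaseSketch and Recase are hypothetical, random.Next(3) stands in for NextInt(random, 0, 2), and the invariant-culture methods are used instead of ToUpper/ToLower to keep the result independent of the machine's culture.

```csharp
using System;
using System.Text;

public static class RecaseSketch
{
    // Hypothetical helper: recases the input one code point at a time.
    public static string Recase(Random random, string input)
    {
        var builder = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; )
        {
            // Take two chars for a valid surrogate pair, otherwise one.
            // A lone surrogate is passed through as a single char.
            int width = char.IsSurrogatePair(input, i) ? 2 : 1;
            string unit = input.Substring(i, width);
            i += width;

            switch (random.Next(3)) // yields 0, 1 or 2
            {
                case 0:
                    builder.Append(unit.ToUpperInvariant());
                    break;
                case 1:
                    builder.Append(unit.ToLowerInvariant());
                    break;
                default: // leave intact
                    builder.Append(unit);
                    break;
            }
        }
        return builder.ToString();
    }

    public static void Main()
    {
        // Supplementary code point followed by ASCII text.
        string input = char.ConvertFromUtf32(0xBA072) + " abc";
        string output = Recase(new Random(42), input);
        Console.WriteLine("same length: {0}", input.Length == output.Length);
    }
}
```

Working on substrings rather than on single chars means a surrogate pair is never split, so nothing is truncated.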

The text I got when running the test was, in hex, F2 BA 81 B2 20.
The only way I found to repeatably test the same “incorrect” string was to create a bin file and enter those hex bytes with a hex editor.
(I attached the file I used to this mail: “failedString.bin”.)
Then I read the text with File.ReadAllText in LINQPad and tested the RandomlyRecaseCodePoints method with the string.
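
The same string can probably also be built directly in code, without a hex editor. The sketch below assumes the bytes are UTF-8; the class name FailedStringDemo is made up for illustration. Under that assumption the first four bytes decode to the single code point U+BA072, and casting it to char leaves U+A072:

```csharp
using System;
using System.Text;

public static class FailedStringDemo
{
    public static void Main()
    {
        // The bytes reported above, assumed to be UTF-8.
        byte[] bytes = { 0xF2, 0xBA, 0x81, 0xB2, 0x20 };
        string text = Encoding.UTF8.GetString(bytes);

        // First code point of the decoded string.
        int codePoint = char.ConvertToUtf32(text, 0);

        // What the (char) cast leaves of it: only the low 16 bits.
        char truncated = unchecked((char)codePoint);

        Console.WriteLine(text.Length);                   // 3 (surrogate pair + space)
        Console.WriteLine("U+{0:X}", codePoint);          // U+BA072
        Console.WriteLine("U+{0:X4}", (int)truncated);    // U+A072
    }
}
```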

Has anyone else noticed this problem?

Juan Orellana