Posted to issues@orc.apache.org by "cxzl25 (via GitHub)" <gi...@apache.org> on 2024/01/26 14:29:35 UTC

[PR] ORC-634: Fix the json output for double NaN and infinite [orc]

cxzl25 opened a new pull request, #1770:
URL: https://github.com/apache/orc/pull/1770

   ### What changes were proposed in this pull request?
   The `meta` command of the tools module now supports outputting NaN and infinite values of the double type.
   
   ### Why are the changes needed?
   When a double column in an ORC file contains NaN or infinite values, dumping the data does not work properly, and outputting the metadata as JSON also fails.
   
   ```java
   java.lang.IllegalArgumentException: Numeric values must be finite, but was NaN
   	at com.google.gson.stream.JsonWriter.value(JsonWriter.java:505)
   	at org.apache.orc.tools.PrintData.printValue(PrintData.java:140)
   	at org.apache.orc.tools.PrintData.printRow(PrintData.java:192)
   	at org.apache.orc.tools.PrintData.printJsonData(PrintData.java:215)
   	at org.apache.orc.tools.PrintData.main(PrintData.java:288)
   	at org.apache.orc.tools.FileDump.main(FileDump.java:129)
   	at org.apache.orc.tools.FileDump.main(FileDump.java:144)
   ```
   
   ```java
   Exception in thread "main" java.lang.IllegalStateException: Nesting problem.
   	at com.google.gson.stream.JsonWriter.beforeName(JsonWriter.java:648)
   	at com.google.gson.stream.JsonWriter.writeDeferredName(JsonWriter.java:408)
   	at com.google.gson.stream.JsonWriter.value(JsonWriter.java:424)
   	at org.apache.orc.tools.JsonFileDump.printJsonMetaData(JsonFileDump.java:229)
   	at org.apache.orc.tools.FileDump.main(FileDump.java:135)
   	at org.apache.orc.tools.Driver.main(Driver.java:124)
   ```
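
The first trace shows Gson's `JsonWriter.value` rejecting a non-finite double. As a minimal sketch of one way a writer can handle such values (the class and method names below are hypothetical, and this is not necessarily how the patch does it), non-finite doubles can be emitted as the bare tokens `NaN`, `Infinity`, and `-Infinity`, matching the `"min":NaN` style seen in the expected test output later in this thread. Note that these tokens are not valid strict JSON, so consumers need a lenient parser.

```java
// Hypothetical helper for illustration; not part of ORC or Gson.
final class JsonDoubles {
  private JsonDoubles() {}

  /** Formats a double as a JSON-like token; non-finite values become bare tokens. */
  static String toToken(double value) {
    if (Double.isNaN(value)) {
      return "NaN";
    }
    if (Double.isInfinite(value)) {
      return value > 0 ? "Infinity" : "-Infinity";
    }
    return Double.toString(value);
  }

  /** Renders a single-field object such as {"x":12.34}. */
  static String field(String name, double value) {
    return "{\"" + name + "\":" + toToken(value) + "}";
  }
}
```

Calling `JsonDoubles.field("min", Double.NaN)` yields a `{"min":NaN}` object instead of throwing.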
   
   ### How was this patch tested?
   Added a unit test.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] ORC-634: Fix the json output for double NaN and infinite [orc]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #1770:
URL: https://github.com/apache/orc/pull/1770#discussion_r1477121689


##########
java/tools/src/test/org/apache/orc/tools/TestJsonFileDump.java:
##########
@@ -127,4 +132,38 @@ public void testJsonDump() throws Exception {
 
     TestFileDump.checkOutput(outputFilename, workDir + File.separator + outputFilename);
   }
+
+  @Test
+  public void testDoubleNaNAndInfinite() throws Exception {
+    TypeDescription schema = TypeDescription.fromString("struct<x:double>");
+    Writer writer = OrcFile.createWriter(testFilePath,
+        OrcFile.writerOptions(conf)
+            .fileSystem(fs)
+            .setSchema(schema));
+    VectorizedRowBatch batch = schema.createRowBatch();
+    DoubleColumnVector x = (DoubleColumnVector) batch.cols[0];
+    int row = batch.size++;
+    x.vector[row] = Double.NaN;
+    row = batch.size++;
+    x.vector[row] = Double.POSITIVE_INFINITY;
+    row = batch.size++;
+    x.vector[row] = 12.34D;
+    if (batch.size != 0) {
+      writer.addRowBatch(batch);
+    }
+    writer.close();
+
+    assertEquals(3, writer.getNumberOfRows());
+
+    PrintStream origOut = System.out;
+    ByteArrayOutputStream myOut = new ByteArrayOutputStream();
+
+    // replace stdout and run command
+    System.setOut(new PrintStream(myOut, false, StandardCharsets.UTF_8));
+    FileDump.main(new String[]{testFilePath.toString(), "-j"});
+    System.out.flush();
+    System.setOut(origOut);
+    String[] lines = myOut.toString(StandardCharsets.UTF_8).split("\n");
+    assertEquals("{\"fileName\":\"TestFileDump.testDump.orc\",\"fileVersion\":\"0.12\",\"writerVersion\":\"ORC_14\",\"softwareVersion\":\"ORC Java unknown\",\"numberOfRows\":3,\"compression\":\"ZSTD\",\"compressionBufferSize\":262144,\"schemaString\":\"struct<x:double>\",\"schema\":{\"columnId\":0,\"columnType\":\"STRUCT\",\"children\":{\"x\":{\"columnId\":1,\"columnType\":\"DOUBLE\"}}},\"calendar\":\"Julian/Gregorian\",\"stripeStatistics\":[{\"stripeNumber\":1,\"columnStatistics\":[{\"columnId\":0,\"count\":3,\"hasNull\":false},{\"columnId\":1,\"count\":3,\"hasNull\":false,\"bytesOnDisk\":27,\"min\":NaN,\"max\":NaN,\"sum\":NaN,\"type\":\"DOUBLE\"}]}],\"fileStatistics\":[{\"columnId\":0,\"count\":3,\"hasNull\":false},{\"columnId\":1,\"count\":3,\"hasNull\":false,\"bytesOnDisk\":27,\"min\":NaN,\"max\":NaN,\"sum\":NaN,\"type\":\"DOUBLE\"}],\"stripes\":[{\"stripeNumber\":1,\"stripeInformation\":{\"offset\":3,\"indexLength\":55,\"dataLength\":27,\"footerLength\":35,\"rowCount\":3},\"streams\":[{\"columnId\":0,\"section\":\"ROW_INDEX\",\"startOffset\":3,\"length\":11},{\"columnId\":1,\"section\":\"ROW_INDEX\",\"startOffset\":14,\"length\":44},{\"columnId\":1,\"section\":\"DATA\",\"startOffset\":58,\"length\":27}],\"encodings\":[{\"columnId\":0,\"kind\":\"DIRECT\"},{\"columnId\":1,\"kind\":\"DIRECT\"}]}],\"fileLength\":286,\"rawDataSize\":36,\"paddingLength\":0,\"paddingRatio\":0.0,\"status\":\"OK\"}", lines[0]);

Review Comment:
   Hmm, this seems to fail in some environments.
   
   ```
   [ERROR] Failures:
   [ERROR]   TestJsonFileDump.testDoubleNaNAndInfinite:167
   expected: <{"fileName":"TestFileDump.testDump.orc","fileVersion":"0.12","writerVersion":"ORC_14","softwareVersion":"ORC Java unknown","numberOfRows":3,"compression":"ZSTD","compressionBufferSize":262144,"schemaString":"struct<x:double>","schema":{"columnId":0,"columnType":"STRUCT","children":{"x":{"columnId":1,"columnType":"DOUBLE"}}},"calendar":"Julian/Gregorian","stripeStatistics":[{"stripeNumber":1,"columnStatistics":[{"columnId":0,"count":3,"hasNull":false},{"columnId":1,"count":3,"hasNull":false,"bytesOnDisk":27,"min":NaN,"max":NaN,"sum":NaN,"type":"DOUBLE"}]}],"fileStatistics":[{"columnId":0,"count":3,"hasNull":false},{"columnId":1,"count":3,"hasNull":false,"bytesOnDisk":27,"min":NaN,"max":NaN,"sum":NaN,"type":"DOUBLE"}],"stripes":[{"stripeNumber":1,"stripeInformation":{"offset":3,"indexLength":55,"dataLength":27,"footerLength":35,"rowCount":3},"streams":[{"columnId":0,"section":"ROW_INDEX","startOffset":3,"length":11},{"columnId":1,"section":"ROW_INDEX","startOffset":14,"length":44},{"columnId":1,"section":"DATA","startOffset":58,"length":27}],"encodings":[{"columnId":0,"kind":"DIRECT"},{"columnId":1,"kind":"DIRECT"}]}],"fileLength":286,"rawDataSize":36,"paddingLength":0,"paddingRatio":0.0,"status":"OK"}>
   but was: <{"fileName":"TestFileDump.testDump.orc","fileVersion":"0.12","writerVersion":"ORC_14","softwareVersion":"ORC Java 2.1.0-SNAPSHOT","numberOfRows":3,"compression":"ZSTD","compressionBufferSize":262144,"schemaString":"struct<x:double>","schema":{"columnId":0,"columnType":"STRUCT","children":{"x":{"columnId":1,"columnType":"DOUBLE"}}},"calendar":"Julian/Gregorian","stripeStatistics":[{"stripeNumber":1,"columnStatistics":[{"columnId":0,"count":3,"hasNull":false},{"columnId":1,"count":3,"hasNull":false,"bytesOnDisk":27,"min":NaN,"max":NaN,"sum":NaN,"type":"DOUBLE"}]}],"fileStatistics":[{"columnId":0,"count":3,"hasNull":false},{"columnId":1,"count":3,"hasNull":false,"bytesOnDisk":27,"min":NaN,"max":NaN,"sum":NaN,"type":"DOUBLE"}],"stripes":[{"stripeNumber":1,"stripeInformation":{"offset":3,"indexLength":55,"dataLength":27,"footerLength":35,"rowCount":3},"streams":[{"columnId":0,"section":"ROW_INDEX","startOffset":3,"length":11},{"columnId":1,"section":"ROW_INDEX","startOffset":14,"length":44},{"columnId":1,"section":"DATA","startOffset":58,"length":27}],"encodings":[{"columnId":0,"kind":"DIRECT"},{"columnId":1,"kind":"DIRECT"}]}],"fileLength":293,"rawDataSize":36,"paddingLength":0,"paddingRatio":0.0,"status":"OK"}>
   [INFO]
   [ERROR] Tests run: 51, Failures: 1, Errors: 0, Skipped: 0
   ```





Re: [PR] ORC-634: Fix the json output for double NaN and infinite [orc]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #1770:
URL: https://github.com/apache/orc/pull/1770#issuecomment-1925455194

   I made a PR.
   - https://github.com/apache/orc/pull/1781




Re: [PR] ORC-634: Fix the json output for double NaN and infinite [orc]

Posted by "cxzl25 (via GitHub)" <gi...@apache.org>.
cxzl25 commented on code in PR #1770:
URL: https://github.com/apache/orc/pull/1770#discussion_r1477167440


##########
java/tools/src/test/org/apache/orc/tools/TestJsonFileDump.java:
##########

Review Comment:
   Thanks @dongjoon-hyun !
   
   We can use `TestFileDump.checkOutput`, which can skip checking some lines.
   
   ```java
     private static final Pattern ignoreTailPattern =
         Pattern.compile("^(?<head>File Version|\"softwareVersion\"): .*");
     private static final Pattern fileSizePattern =
         Pattern.compile("^(\"fileLength\"|File length): (?<size>[0-9]+).*");
   ```
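
As a rough sketch of how line patterns like the two above could be applied (this is not the actual `TestFileDump.checkOutput` implementation; the class `LenientLineCompare` and method `sameLine` are hypothetical, and the sketch assumes pretty-printed output with one field per line), a comparison can ignore the text after a matched key and skip the exact byte count of the file length:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the real TestFileDump.checkOutput.
final class LenientLineCompare {
  private static final Pattern IGNORE_TAIL =
      Pattern.compile("^(?<head>File Version|\"softwareVersion\"): .*");
  private static final Pattern FILE_SIZE =
      Pattern.compile("^(\"fileLength\"|File length): (?<size>[0-9]+).*");

  private LenientLineCompare() {}

  /** Returns true when two output lines agree once volatile parts are ignored. */
  static boolean sameLine(String expected, String actual) {
    // Assumption: leading indentation is not significant, so trim first.
    String e = expected.trim();
    String a = actual.trim();

    Matcher me = IGNORE_TAIL.matcher(e);
    Matcher ma = IGNORE_TAIL.matcher(a);
    if (me.matches() && ma.matches()) {
      // Only the key has to agree; the version text after it is skipped.
      return me.group("head").equals(ma.group("head"));
    }

    me = FILE_SIZE.matcher(e);
    ma = FILE_SIZE.matcher(a);
    if (me.matches() && ma.matches()) {
      // Both lines carry a numeric size; this sketch does not compare the values.
      return me.group(1).equals(ma.group(1));
    }

    return e.equals(a);
  }
}
```

With this, a line holding `"softwareVersion": "ORC Java unknown",` compares equal to one holding `"softwareVersion": "ORC Java 2.1.0-SNAPSHOT",`, while any other differing line still fails the comparison.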
   





Re: [PR] ORC-634: Fix the json output for double NaN and infinite [orc]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #1770:
URL: https://github.com/apache/orc/pull/1770#issuecomment-1919550807

   Merged to main/2.0.




Re: [PR] ORC-634: Fix the json output for double NaN and infinite [orc]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #1770:
URL: https://github.com/apache/orc/pull/1770#issuecomment-1925215483

   You can. Feel free to make a backport PR, @cxzl25.




Re: [PR] ORC-634: Fix the json output for double NaN and infinite [orc]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #1770: ORC-634: Fix the json output for double NaN and infinite
URL: https://github.com/apache/orc/pull/1770




Re: [PR] ORC-634: Fix the json output for double NaN and infinite [orc]

Posted by "cxzl25 (via GitHub)" <gi...@apache.org>.
cxzl25 commented on PR #1770:
URL: https://github.com/apache/orc/pull/1770#issuecomment-1917302100

   @dongjoon-hyun Could you help review this? Thank you!




Re: [PR] ORC-634: Fix the json output for double NaN and infinite [orc]

Posted by "cxzl25 (via GitHub)" <gi...@apache.org>.
cxzl25 commented on PR #1770:
URL: https://github.com/apache/orc/pull/1770#issuecomment-1920511564

   Since this is a bug, and branch 1.9 is the last version that supports JDK 8, should I submit a PR to backport it to branch 1.9?




Re: [PR] ORC-634: Fix the json output for double NaN and infinite [orc]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #1770:
URL: https://github.com/apache/orc/pull/1770#discussion_r1477121881


##########
java/tools/src/test/org/apache/orc/tools/TestJsonFileDump.java:
##########

Review Comment:
   ```
   "softwareVersion":"ORC Java unknown"
   "softwareVersion":"ORC Java 2.1.0-SNAPSHOT"
   ```
   ```
   "fileLength":286,
   "fileLength":293
   ```
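
The two pairs above are the only fields that differ between the expected and actual output. The 7-byte gap in `fileLength` (286 vs. 293) is consistent with the version string growing from `unknown` to `2.1.0-SNAPSHOT`, although that link is an inference rather than something stated in the thread. A minimal sketch of one way to make a single-line assertion insensitive to both fields follows; the `VolatileFields` class and its placeholder values are hypothetical and not taken from the ORC code base.

```java
// Hypothetical helper for illustration; not part of the ORC test suite.
final class VolatileFields {
  private VolatileFields() {}

  /** Replaces environment-dependent values in compact JSON with fixed placeholders. */
  static String normalize(String json) {
    return json
        // The version string depends on how the build was produced.
        .replaceAll("\"softwareVersion\":\"[^\"]*\"", "\"softwareVersion\":\"<any>\"")
        // The file length changes whenever the stored version string changes.
        .replaceAll("\"fileLength\":[0-9]+", "\"fileLength\":0");
  }
}
```

A test could then compare `VolatileFields.normalize(expected)` with `VolatileFields.normalize(lines[0])`, so that every other field is still checked exactly.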


