Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/08/16 03:40:41 UTC

[GitHub] [orc] guiyanakuang opened a new pull request #869: ORC-946: Unified json library

guiyanakuang opened a new pull request #869:
URL: https://github.com/apache/orc/pull/869


   
   ### What changes were proposed in this pull request?
   The Java project depends on several JSON libraries. This PR replaces Jackson and Jettison with Gson.
   
   ```
   use jackson-core in orc-benchmarks-core
   org.apache.orc.bench.core.convert.json.JsonWriter
   
   use jettison in tools
   org.apache.orc.tools.KeyTool
   org.apache.orc.tools.JsonFileDump
   org.apache.orc.tools.PrintData
   ```
   
   Gson and Jettison behave differently in three ways:
   
   1. Character escaping:
       Jettison escapes '/'.
       Gson leaves it unescaped.
   2. Floating-point output:
       Jettison removes trailing zeros and the decimal point where possible.
       Gson writes the value as is.
   3. Pretty printing:
       Jettison's prettyPrint is faulty: '[' and '{' are often placed on the same line without a line break.
       Gson's prettyPrint is fine.
   
   This PR does not preserve the previous output for any of these three behaviours.
   
   I don't think prettyPrint needs to be made compatible. I would like to hear your opinions on the other two.
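   
   The escaping and floating-point differences listed above can be sketched with plain Java. This is only an illustration: `JsonStyleSketch` and its methods are hypothetical helpers that mimic the two output styles as described in this PR. They are not code from Gson, Jettison, or ORC, and they model nothing beyond the '/' and trailing-zero differences.
   
   ```java
   // Hypothetical helpers that only mimic the two output styles described above.
   class JsonStyleSketch {
   
       // Style attributed to Jettison in this PR: '/' is written as "\/".
       // Other escapes (quotes, backslashes, control characters) are not modelled.
       static String quoteEscapingSlash(String s) {
           return "\"" + s.replace("/", "\\/") + "\"";
       }
   
       // Style attributed to Gson in this PR: '/' is written unchanged.
       static String quotePlain(String s) {
           return "\"" + s + "\"";
       }
   
       // Jettison-like number style: drop trailing zeros, then a bare decimal point.
       // Values in exponent notation are left untouched in this sketch.
       static String numberTrimmed(double d) {
           String text = Double.toString(d);
           if (text.indexOf('.') > 0 && text.indexOf('E') < 0) {
               while (text.endsWith("0")) {
                   text = text.substring(0, text.length() - 1);
               }
               if (text.endsWith(".")) {
                   text = text.substring(0, text.length() - 1);
               }
           }
           return text;
       }
   
       // Gson-like number style (assumed here): keep the Double.toString form.
       static String numberAsIs(double d) {
           return Double.toString(d);
       }
   
       public static void main(String[] args) {
           System.out.println(quoteEscapingSlash("Julian/Gregorian")); // "Julian\/Gregorian"
           System.out.println(quotePlain("Julian/Gregorian"));         // "Julian/Gregorian"
           System.out.println(numberTrimmed(0.0));                     // 0
           System.out.println(numberAsIs(0.0));                        // 0.0
       }
   }
   ```
   
   Both string forms decode to the same value, because JSON allows but does not require '/' to be escaped. Likewise `0` and `0.0` denote the same JSON number, so a consumer that parses the output sees no difference. Only comparisons against the exact text, such as the expected-output files in the tests, are affected.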
   
   ### Why are the changes needed?
   Reduce dependencies and use a single JSON library.
   
   ### How was this patch tested?
   Passes the existing CI checks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] guiyanakuang commented on pull request #869: ORC-946: Unified json library

Posted by GitBox <gi...@apache.org>.
guiyanakuang commented on pull request #869:
URL: https://github.com/apache/orc/pull/869#issuecomment-899372340


   Thanks to dongjoon-hyun for the review and approval!





[GitHub] [orc] dongjoon-hyun merged pull request #869: ORC-946: Unified json library

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun merged pull request #869:
URL: https://github.com/apache/orc/pull/869


   





[GitHub] [orc] guiyanakuang edited a comment on pull request #869: ORC-946: Unified json library

Posted by GitBox <gi...@apache.org>.
guiyanakuang edited a comment on pull request #869:
URL: https://github.com/apache/orc/pull/869#issuecomment-899372340


   Thanks to @dongjoon-hyun for the review and approval!





[GitHub] [orc] dongjoon-hyun commented on a change in pull request #869: ORC-946: Unified json library

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #869:
URL: https://github.com/apache/orc/pull/869#discussion_r689303859



##########
File path: java/tools/src/test/resources/orc-file-dump.json
##########
@@ -28,7 +28,7 @@
       }
     }
   },
-  "calendar": "Julian\/Gregorian",
+  "calendar": "Julian/Gregorian",

Review comment:
       This one looks good too.







[GitHub] [orc] dongjoon-hyun commented on pull request #869: ORC-946: Unified json library

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #869:
URL: https://github.com/apache/orc/pull/869#issuecomment-899288368


   Thank you for making a PR, @guiyanakuang .





[GitHub] [orc] dongjoon-hyun commented on a change in pull request #869: ORC-946: Unified json library

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #869:
URL: https://github.com/apache/orc/pull/869#discussion_r689303611



##########
File path: java/tools/src/test/resources/orc-file-dump.json
##########
@@ -1323,46 +1331,52 @@
           "dictionarySize": 35
         }
       ],
-      "indexes": [{
-        "columnId": 3,
-        "rowGroupIndexes": [{
-          "entryId": 0,
-          "count": 990,
-          "hasNull": true,
-          "min": "Darkness,",
-          "max": "worst",
-          "totalLength": 3963,
-          "type": "STRING",
-          "positions": [
-            0,
-            0,
-            0,
-            0,
-            0,
-            0,
-            0
-          ]
-        }],
-        "bloomFilterIndexes": [{
-          "entryId": 0,
-          "numHashFunctions": 4,
-          "bitCount": 6272,
-          "popCount": 138,
-          "loadFactor": 0.022002551704645157,
-          "expectedFpp": 2.3436470542037569E-7
-        }],
-        "stripeLevelBloomFilter": {
-          "numHashFunctions": 4,
-          "bitCount": 6272,
-          "popCount": 138,
-          "loadFactor": 0.022002551704645157,
-          "expectedFpp": 2.3436470542037569E-7
+      "indexes": [
+        {
+          "columnId": 3,
+          "rowGroupIndexes": [
+            {
+              "entryId": 0,
+              "count": 990,
+              "hasNull": true,
+              "min": "Darkness,",
+              "max": "worst",
+              "totalLength": 3963,
+              "type": "STRING",
+              "positions": [
+                0,
+                0,
+                0,
+                0,
+                0,
+                0,
+                0
+              ]
+            }
+          ],
+          "bloomFilterIndexes": [
+            {
+              "entryId": 0,
+              "numHashFunctions": 4,
+              "bitCount": 6272,
+              "popCount": 138,
+              "loadFactor": 0.022002551704645157,
+              "expectedFpp": 2.3436470542037569E-7
+            }
+          ],
+          "stripeLevelBloomFilter": {
+            "numHashFunctions": 4,
+            "bitCount": 6272,
+            "popCount": 138,
+            "loadFactor": 0.022002551704645157,
+            "expectedFpp": 2.3436470542037569E-7
+          }
         }
-      }]
+      ]
     }
   ],
   "fileLength": 272513,
   "paddingLength": 0,
-  "paddingRatio": 0,
+  "paddingRatio": 0.0,

Review comment:
       I agree that new floating point string looks better.



