Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/24 23:48:42 UTC

[GitHub] [hudi] alexeykudinkin opened a new pull request, #7052: [MINOR] Make sure Dictionary Encoding in Parquet enabled by default

alexeykudinkin opened a new pull request, #7052:
URL: https://github.com/apache/hudi/pull/7052

   ### Change Logs
   
   Make sure Dictionary Encoding in Parquet is enabled by default. There's no reason to exclude Dictionary Encoding from consideration when writing Parquet, so this flips the flag to enable it by default.
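To illustrate what the flag buys, here is a toy sketch of dictionary encoding in Java. It is not parquet-mr's implementation (which, as far as I know, also caps dictionary size and falls back to plain encoding for high-cardinality columns). It only shows the core idea: store each distinct value once and rewrite the column as small integer ids, which pays off when values repeat.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy dictionary encoding, for illustration only (not Parquet's code). */
public class DictionaryEncodingSketch {

  /** Encoded column: the distinct values plus one id per input row. */
  public static final class Encoded {
    public final List<String> dictionary;
    public final int[] ids;

    Encoded(List<String> dictionary, int[] ids) {
      this.dictionary = dictionary;
      this.ids = ids;
    }
  }

  /** Assigns ids in first-seen order and replaces each value with its id. */
  public static Encoded encode(List<String> column) {
    Map<String, Integer> index = new HashMap<>();
    List<String> dictionary = new ArrayList<>();
    int[] ids = new int[column.size()];
    for (int i = 0; i < column.size(); i++) {
      String value = column.get(i);
      Integer id = index.get(value);
      if (id == null) {
        // First occurrence: append to the dictionary and remember its position.
        id = dictionary.size();
        index.put(value, id);
        dictionary.add(value);
      }
      ids[i] = id;
    }
    return new Encoded(dictionary, ids);
  }

  /** Reverses encode() by looking each id back up in the dictionary. */
  public static List<String> decode(Encoded encoded) {
    List<String> out = new ArrayList<>(encoded.ids.length);
    for (int id : encoded.ids) {
      out.add(encoded.dictionary.get(id));
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> column = Arrays.asList("US", "DE", "US", "US", "DE", "FR");
    Encoded e = encode(column);
    System.out.println(e.dictionary);             // [US, DE, FR]
    System.out.println(Arrays.toString(e.ids));   // [0, 1, 0, 0, 1, 2]
    System.out.println(decode(e).equals(column)); // true
  }
}
```

Six strings shrink to three dictionary entries plus six small ints; the more a column repeats, the larger the saving.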
   
   ### Impact
   
   Low
   
   ### Risk level (write none, low, medium or high below)
   
   None
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7052: [MINOR] Make sure Dictionary Encoding in Parquet enabled by default

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7052:
URL: https://github.com/apache/hudi/pull/7052#issuecomment-1293136355

   ## CI report:
   
   * d2baaf100be34462690451554c4687053510f0e0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12609) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7052: [MINOR] Make sure Dictionary Encoding in Parquet enabled by default

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7052:
URL: https://github.com/apache/hudi/pull/7052#issuecomment-1314574904

   ## CI report:
   
   * c082143b4c9dcc9a0df9dcb5dc54db33e42112ce Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13010) 
   * b8dec43e53b0714c3459ba887ffbee272694d0a1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13013) 
   


[GitHub] [hudi] hudi-bot commented on pull request #7052: [MINOR] Make sure Dictionary Encoding in Parquet enabled by default

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7052:
URL: https://github.com/apache/hudi/pull/7052#issuecomment-1292853249

   ## CI report:
   
   * 2e5cc087542cd1ae853b3fb477cc9358e7bde36a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12545) 
   * d2baaf100be34462690451554c4687053510f0e0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12609) 
   


[GitHub] [hudi] hudi-bot commented on pull request #7052: [MINOR] Make sure Dictionary Encoding in Parquet enabled by default

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7052:
URL: https://github.com/apache/hudi/pull/7052#issuecomment-1292850313

   ## CI report:
   
   * 2e5cc087542cd1ae853b3fb477cc9358e7bde36a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12545) 
   * d2baaf100be34462690451554c4687053510f0e0 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #7052: [MINOR] Make sure Dictionary Encoding in Parquet enabled by default

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7052:
URL: https://github.com/apache/hudi/pull/7052#issuecomment-1314301888

   ## CI report:
   
   * 357b799343ac222159a248327fe637a0a3bd024a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12966) 
   * c082143b4c9dcc9a0df9dcb5dc54db33e42112ce UNKNOWN
   


Re: [PR] [MINOR] Make sure Dictionary Encoding in Parquet enabled by default [hudi]

Posted by "ThinkerLei (via GitHub)" <gi...@apache.org>.
ThinkerLei commented on code in PR #7052:
URL: https://github.com/apache/hudi/pull/7052#discussion_r1507091359


##########
hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java:
##########
@@ -2347,7 +2347,7 @@ public void testDataBlockFormatAppendAndReadWithProjectedSchema(
           new HashMap<HoodieLogBlockType, Integer>() {{
             put(HoodieLogBlockType.AVRO_DATA_BLOCK, 0); // not supported
             put(HoodieLogBlockType.HFILE_DATA_BLOCK, 0); // not supported
-            put(HoodieLogBlockType.PARQUET_DATA_BLOCK, HoodieAvroUtils.gteqAvro1_9() ? 2593 : 2605);
+            put(HoodieLogBlockType.PARQUET_DATA_BLOCK, HoodieAvroUtils.gteqAvro1_9() ? 1802 : 1809);

Review Comment:
   Hi @alexeykudinkin, I found that testDataBlockFormatAppendAndReadWithProjectedSchema fails when ${avro.version} is set to 1.8.2. The expected read byte count (expectedReadBytes) is 1809, but the actual number of bytes read is 1802. Must ${avro.version} be 1.9 or higher? Thanks in advance for any reply.
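For context, the expectation in the diff is chosen by a version gate, `HoodieAvroUtils.gteqAvro1_9() ? 1802 : 1809`. The hypothetical sketch below reproduces that selection from an explicit version string (how the real helper detects the Avro version is not shown in this thread), so with ${avro.version} = 1.8.2 the gate is false and the test expects 1809. `VersionGatedExpectation` and its methods are illustrative names, not Hudi code.

```java
/** Hypothetical sketch of the version-gated expectation used in the test diff. */
public class VersionGatedExpectation {

  /** True if `version` (e.g. "1.8.2", "1.10.2") is >= major.minor. */
  public static boolean isAtLeast(String version, int major, int minor) {
    String[] parts = version.split("\\.");
    int vMajor = Integer.parseInt(parts[0]);
    int vMinor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
    // Compare numerically: as strings, "1.10" would sort before "1.9".
    return vMajor > major || (vMajor == major && vMinor >= minor);
  }

  /** Expected PARQUET_DATA_BLOCK read bytes, using the values from the diff. */
  public static int expectedParquetReadBytes(String avroVersion) {
    return isAtLeast(avroVersion, 1, 9) ? 1802 : 1809;
  }

  public static void main(String[] args) {
    System.out.println(expectedParquetReadBytes("1.8.2"));  // 1809
    System.out.println(expectedParquetReadBytes("1.10.2")); // 1802
  }
}
```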





[GitHub] [hudi] hudi-bot commented on pull request #7052: [MINOR] Make sure Dictionary Encoding in Parquet enabled by default

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7052:
URL: https://github.com/apache/hudi/pull/7052#issuecomment-1314310340

   ## CI report:
   
   * 357b799343ac222159a248327fe637a0a3bd024a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12966) 
   * c082143b4c9dcc9a0df9dcb5dc54db33e42112ce Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13010) 
   


[GitHub] [hudi] alexeykudinkin merged pull request #7052: [MINOR] Make sure Dictionary Encoding in Parquet enabled by default

Posted by GitBox <gi...@apache.org>.
alexeykudinkin merged PR #7052:
URL: https://github.com/apache/hudi/pull/7052




[GitHub] [hudi] hudi-bot commented on pull request #7052: [MINOR] Make sure Dictionary Encoding in Parquet enabled by default

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7052:
URL: https://github.com/apache/hudi/pull/7052#issuecomment-1314570376

   ## CI report:
   
   * c082143b4c9dcc9a0df9dcb5dc54db33e42112ce Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13010) 
   * b8dec43e53b0714c3459ba887ffbee272694d0a1 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #7052: [MINOR] Make sure Dictionary Encoding in Parquet enabled by default

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7052:
URL: https://github.com/apache/hudi/pull/7052#issuecomment-1312368851

   ## CI report:
   
   * 357b799343ac222159a248327fe637a0a3bd024a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12966) 
   


Re: [PR] [MINOR] Make sure Dictionary Encoding in Parquet enabled by default [hudi]

Posted by "ThinkerLei (via GitHub)" <gi...@apache.org>.
ThinkerLei commented on code in PR #7052:
URL: https://github.com/apache/hudi/pull/7052#discussion_r1507091359


##########
hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java:
##########
@@ -2347,7 +2347,7 @@ public void testDataBlockFormatAppendAndReadWithProjectedSchema(
           new HashMap<HoodieLogBlockType, Integer>() {{
             put(HoodieLogBlockType.AVRO_DATA_BLOCK, 0); // not supported
             put(HoodieLogBlockType.HFILE_DATA_BLOCK, 0); // not supported
-            put(HoodieLogBlockType.PARQUET_DATA_BLOCK, HoodieAvroUtils.gteqAvro1_9() ? 2593 : 2605);
+            put(HoodieLogBlockType.PARQUET_DATA_BLOCK, HoodieAvroUtils.gteqAvro1_9() ? 1802 : 1809);

Review Comment:
   Hi @alexeykudinkin, I found that testDataBlockFormatAppendAndReadWithProjectedSchema fails when ${avro.version} is set to 1.8.2 and the JDK version is 8. The expected read byte count (expectedReadBytes) is 1809, but the actual number of bytes read is 1802. Must ${avro.version} be 1.9 or higher? Thanks in advance for any reply.





[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7052: [MINOR] Make sure Dictionary Encoding in Parquet enabled by default

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on code in PR #7052:
URL: https://github.com/apache/hudi/pull/7052#discussion_r1020584493


##########
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieParquetConfig.java:
##########
@@ -37,7 +37,7 @@
 
   public HoodieParquetConfig(T writeSupport, CompressionCodecName compressionCodecName, int blockSize,
                              int pageSize, long maxFileSize, Configuration hadoopConf, double compressionRatio) {
-    this(writeSupport, compressionCodecName, blockSize, pageSize, maxFileSize, hadoopConf, compressionRatio, false);
+    this(writeSupport, compressionCodecName, blockSize, pageSize, maxFileSize, hadoopConf, compressionRatio, true);

Review Comment:
   Discussed offline: there's no reason for this value not to come from the configuration. I'll follow up and migrate all users of this ctor to rely on the configured value instead.
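A minimal sketch of what "rely on the configured value" could look like, assuming the flag is read from a plain `java.util.Properties` object and falls back to a single default when unset. The key `hoodie.parquet.dictionary.enabled` is assumed to correspond to Hudi's `HoodieStorageConfig.PARQUET_DICTIONARY_ENABLED` option, and `ConfiguredDictionaryFlag` is a hypothetical class, not part of Hudi.

```java
import java.util.Properties;

/** Hypothetical sketch: resolve the dictionary flag from config, with a default. */
public class ConfiguredDictionaryFlag {

  // Assumed key name; mirrors the Hudi storage option discussed in this PR.
  static final String DICTIONARY_ENABLED_KEY = "hoodie.parquet.dictionary.enabled";
  // Default of true matches the behavior this PR makes the default.
  static final boolean DICTIONARY_ENABLED_DEFAULT = true;

  /** Returns the configured flag, or the default when the key is absent. */
  public static boolean dictionaryEnabled(Properties props) {
    String raw = props.getProperty(DICTIONARY_ENABLED_KEY);
    return raw == null ? DICTIONARY_ENABLED_DEFAULT : Boolean.parseBoolean(raw.trim());
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(dictionaryEnabled(props)); // true (default)
    props.setProperty(DICTIONARY_ENABLED_KEY, "false");
    System.out.println(dictionaryEnabled(props)); // false (explicit override)
  }
}
```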





[GitHub] [hudi] hudi-bot commented on pull request #7052: [MINOR] Make sure Dictionary Encoding in Parquet enabled by default

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7052:
URL: https://github.com/apache/hudi/pull/7052#issuecomment-1314436732

   ## CI report:
   
   * c082143b4c9dcc9a0df9dcb5dc54db33e42112ce Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13010) 
   


Re: [PR] [MINOR] Make sure Dictionary Encoding in Parquet enabled by default [hudi]

Posted by "ThinkerLei (via GitHub)" <gi...@apache.org>.
ThinkerLei commented on code in PR #7052:
URL: https://github.com/apache/hudi/pull/7052#discussion_r1508476897


##########
hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java:
##########
@@ -2347,7 +2347,7 @@ public void testDataBlockFormatAppendAndReadWithProjectedSchema(
           new HashMap<HoodieLogBlockType, Integer>() {{
             put(HoodieLogBlockType.AVRO_DATA_BLOCK, 0); // not supported
             put(HoodieLogBlockType.HFILE_DATA_BLOCK, 0); // not supported
-            put(HoodieLogBlockType.PARQUET_DATA_BLOCK, HoodieAvroUtils.gteqAvro1_9() ? 2593 : 2605);
+            put(HoodieLogBlockType.PARQUET_DATA_BLOCK, HoodieAvroUtils.gteqAvro1_9() ? 1802 : 1809);

Review Comment:
   Why do we use the Avro version to determine the number of bytes read? If we change the Parquet version, can we still determine it this way?





[GitHub] [hudi] hudi-bot commented on pull request #7052: [MINOR] Make sure Dictionary Encoding in Parquet enabled by default

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7052:
URL: https://github.com/apache/hudi/pull/7052#issuecomment-1289806277

   ## CI report:
   
   * 2e5cc087542cd1ae853b3fb477cc9358e7bde36a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12545) 
   


[GitHub] [hudi] nsivabalan commented on a diff in pull request #7052: [MINOR] Make sure Dictionary Encoding in Parquet enabled by default

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #7052:
URL: https://github.com/apache/hudi/pull/7052#discussion_r1007209772


##########
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieParquetConfig.java:
##########
@@ -37,7 +37,7 @@
 
   public HoodieParquetConfig(T writeSupport, CompressionCodecName compressionCodecName, int blockSize,
                              int pageSize, long maxFileSize, Configuration hadoopConf, double compressionRatio) {
-    this(writeSupport, compressionCodecName, blockSize, pageSize, maxFileSize, hadoopConf, compressionRatio, false);
+    this(writeSupport, compressionCodecName, blockSize, pageSize, maxFileSize, hadoopConf, compressionRatio, true);

Review Comment:
   Can we use HoodieStorageConfig.PARQUET_DICTIONARY_ENABLED.defaultValue here?
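The suggestion amounts to having the convenience constructor delegate to the option's declared default instead of a hard-coded literal, so the two cannot drift apart. A simplified sketch: `ConfigOption` and `ParquetConfigSketch` are hypothetical stand-ins, not Hudi's `ConfigProperty` or `HoodieParquetConfig`, and the default of `true` mirrors this PR's change.

```java
/** Hypothetical sketch: a convenience ctor that delegates to a declared default. */
public class ParquetConfigSketch {

  /** Minimal stand-in for a config option carrying a key and a default. */
  static final class ConfigOption<T> {
    private final String key;
    private final T defaultValue;

    ConfigOption(String key, T defaultValue) {
      this.key = key;
      this.defaultValue = defaultValue;
    }

    String key() { return key; }
    T defaultValue() { return defaultValue; }
  }

  // Assumed to mirror HoodieStorageConfig.PARQUET_DICTIONARY_ENABLED.
  static final ConfigOption<Boolean> PARQUET_DICTIONARY_ENABLED =
      new ConfigOption<>("hoodie.parquet.dictionary.enabled", true);

  private final int pageSize;
  private final boolean dictionaryEnabled;

  /** Convenience ctor: takes the default from the option, not a literal. */
  public ParquetConfigSketch(int pageSize) {
    this(pageSize, PARQUET_DICTIONARY_ENABLED.defaultValue());
  }

  public ParquetConfigSketch(int pageSize, boolean dictionaryEnabled) {
    this.pageSize = pageSize;
    this.dictionaryEnabled = dictionaryEnabled;
  }

  public int pageSize() { return pageSize; }
  public boolean dictionaryEnabled() { return dictionaryEnabled; }

  public static void main(String[] args) {
    System.out.println(new ParquetConfigSketch(1024 * 1024).dictionaryEnabled());        // true
    System.out.println(new ParquetConfigSketch(1024 * 1024, false).dictionaryEnabled()); // false
  }
}
```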





[GitHub] [hudi] alexeykudinkin commented on pull request #7052: [MINOR] Make sure Dictionary Encoding in Parquet enabled by default

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on PR #7052:
URL: https://github.com/apache/hudi/pull/7052#issuecomment-1314663017

   CI is green:
   <img width="1611" alt="Screenshot 2022-11-14 at 6 11 42 PM" src="https://user-images.githubusercontent.com/428277/201809488-731aa0b4-3b0f-4222-aedb-47f65cb387fa.png">
   
   https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=13013&view=results




[GitHub] [hudi] hudi-bot commented on pull request #7052: [MINOR] Make sure Dictionary Encoding in Parquet enabled by default

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7052:
URL: https://github.com/apache/hudi/pull/7052#issuecomment-1289802334

   ## CI report:
   
   * 2e5cc087542cd1ae853b3fb477cc9358e7bde36a UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #7052: [MINOR] Make sure Dictionary Encoding in Parquet enabled by default

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7052:
URL: https://github.com/apache/hudi/pull/7052#issuecomment-1312240156

   ## CI report:
   
   * d2baaf100be34462690451554c4687053510f0e0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12609) 
   * 357b799343ac222159a248327fe637a0a3bd024a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12966) 
   




[GitHub] [hudi] hudi-bot commented on pull request #7052: [MINOR] Make sure Dictionary Encoding in Parquet enabled by default

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7052:
URL: https://github.com/apache/hudi/pull/7052#issuecomment-1312237517

   ## CI report:
   
   * d2baaf100be34462690451554c4687053510f0e0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12609) 
   * 357b799343ac222159a248327fe637a0a3bd024a UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #7052: [MINOR] Make sure Dictionary Encoding in Parquet enabled by default

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7052:
URL: https://github.com/apache/hudi/pull/7052#issuecomment-1290231086

   ## CI report:
   
   * 2e5cc087542cd1ae853b3fb477cc9358e7bde36a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12545) 
   

