Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/08/04 02:51:12 UTC

[GitHub] [arrow-rs] liukun4515 opened a new issue, #2313: optimize decimal: reduce validation when construct the decimal array or cast to the decimal array

liukun4515 opened a new issue, #2313:
URL: https://github.com/apache/arrow-rs/issues/2313

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   When we use Ballista/DataFusion to run queries against tables that have many decimal columns and profile the CPU, decimal validation turns out to be expensive.
   
   Validation of decimal values costs about 5% of CPU time, as shown in the picture below.
   
   ![image](https://user-images.githubusercontent.com/7450163/182747706-071123a8-8dac-4f52-a99f-e7ef8c081bd2.png)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] liukun4515 commented on issue #2313: optimize decimal: reduce validation when construct the decimal array or cast to the decimal array

Posted by GitBox <gi...@apache.org>.
liukun4515 commented on issue #2313:
URL: https://github.com/apache/arrow-rs/issues/2313#issuecomment-1205144179

   Decimal values can also be deserialized from Parquet INT32 or INT64 columns.




[GitHub] [arrow-rs] liukun4515 commented on issue #2313: optimize decimal: reduce validation when construct the decimal array or cast to the decimal array

Posted by GitBox <gi...@apache.org>.
liukun4515 commented on issue #2313:
URL: https://github.com/apache/arrow-rs/issues/2313#issuecomment-1204700281

   arrow-rs generates decimal arrays in many places.
   If the precision/range of the target decimal is larger than that of the source decimal or data value, we don't need to validate the generated decimal array.
   
   For example, when reading decimal(n,0) from a Parquet int64 column where n is greater than or equal to 18, we don't need to verify the resulting decimal array, because the int64 values will not overflow the target precision.
   
   With this approach, we use less CPU for the decimal data type.
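
The idea above can be sketched without any arrow-rs types. The block below is a minimal illustration, not the arrow-rs implementation: the helper names `max_for_precision`, `can_skip_validation`, and `i64_to_decimal` are hypothetical, and decimals are modeled as plain `i128` unscaled values. It compares the widest possible source value against the target precision's bound once, and only falls back to per-value checks when that single comparison fails. Note that an arbitrary `i64` can have 19 digits (`i64::MAX` is 9223372036854775807), so in this sketch the unchecked path is taken from precision 19 upward; a threshold of 18 would rely on the source values already being limited to 18 digits.

```rust
/// Largest absolute unscaled value representable with `precision` decimal
/// digits. Scale does not change this bound. (Hypothetical helper.)
fn max_for_precision(precision: u32) -> u128 {
    assert!((1..=38).contains(&precision), "precision must be in 1..=38");
    10u128.pow(precision) - 1
}

/// True when every value of the source type is guaranteed to fit in the
/// target precision, so per-value validation can be skipped entirely.
fn can_skip_validation(source_max_abs: u128, target_precision: u32) -> bool {
    source_max_abs <= max_for_precision(target_precision)
}

/// Convert i64 values to unscaled decimal values (scale 0), validating each
/// value only when the type-level check cannot rule out overflow.
fn i64_to_decimal(values: &[i64], target_precision: u32) -> Result<Vec<i128>, String> {
    let limit = max_for_precision(target_precision);
    // |i64::MIN| is the largest magnitude any i64 can take.
    let widest_i64 = i64::MIN.unsigned_abs() as u128;

    if can_skip_validation(widest_i64, target_precision) {
        // Fast path: no per-value check needed.
        return Ok(values.iter().map(|&v| i128::from(v)).collect());
    }

    // Slow path: check every value against the precision bound.
    values
        .iter()
        .map(|&v| {
            if u128::from(v.unsigned_abs()) > limit {
                Err(format!("{v} does not fit in precision {target_precision}"))
            } else {
                Ok(i128::from(v))
            }
        })
        .collect()
}
```

For instance, `i64_to_decimal(&[1, -2, i64::MAX], 19)` takes the unchecked path, while a target precision of 18 falls back to per-value checks and rejects `i64::MAX`.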




[GitHub] [arrow-rs] liukun4515 commented on issue #2313: optimize decimal: reduce validation when construct the decimal array or cast to the decimal array

Posted by GitBox <gi...@apache.org>.
liukun4515 commented on issue #2313:
URL: https://github.com/apache/arrow-rs/issues/2313#issuecomment-1205141622

   > Also potentially related to this, the way decimal data is currently read from parquet is hopelessly inefficient #2318. I keep meaning to fix it, but I haven't got to it yet. Perhaps I can find some time this weekend... 🤔
   
   Your optimization will improve the performance of reading decimal data.
   This issue is about reducing unnecessary validation of generated decimal arrays.




[GitHub] [arrow-rs] liukun4515 commented on issue #2313: optimize decimal: reduce validation when construct the decimal array or cast to the decimal array

Posted by GitBox <gi...@apache.org>.
liukun4515 commented on issue #2313:
URL: https://github.com/apache/arrow-rs/issues/2313#issuecomment-1222046219

   - [ ] add a no-validation path for decimal arrays to optimize the performance of cast and other operations
   - [ ] optimize the performance of reading data from Parquet files




[GitHub] [arrow-rs] tustvold commented on issue #2313: optimize decimal: reduce validation when construct the decimal array or cast to the decimal array

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #2313:
URL: https://github.com/apache/arrow-rs/issues/2313#issuecomment-1339110219

   We no longer perform validation except where it is explicitly opted in to.




[GitHub] [arrow-rs] alamb commented on issue #2313: optimize decimal: reduce validation when construct the decimal array or cast to the decimal array

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2313:
URL: https://github.com/apache/arrow-rs/issues/2313#issuecomment-1209274440

   This PR https://github.com/apache/arrow-rs/pull/2383 may have improved performance as well




[GitHub] [arrow-rs] tustvold closed issue #2313: optimize decimal: reduce validation when construct the decimal array or cast to the decimal array

Posted by GitBox <gi...@apache.org>.
tustvold closed issue #2313: optimize decimal: reduce validation when construct the decimal array or cast to the decimal array
URL: https://github.com/apache/arrow-rs/issues/2313




[GitHub] [arrow-rs] tustvold commented on issue #2313: optimize decimal: reduce validation when construct the decimal array or cast to the decimal array

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #2313:
URL: https://github.com/apache/arrow-rs/issues/2313#issuecomment-1204957176

   Also potentially related to this, the way decimal data is currently read from parquet is hopelessly inefficient #2318. I keep meaning to fix it, but I haven't got to it yet. Perhaps I can find some time this weekend... :thinking: 

