Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/20 13:26:36 UTC

[GitHub] [arrow] pitrou opened a new pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library

pitrou opened a new pull request #8494:
URL: https://github.com/apache/arrow/pull/8494


   This library is 2x to 3x faster for parsing strings to binary floating-point numbers.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8494:
URL: https://github.com/apache/arrow/pull/8494#issuecomment-712849550


   Before:
   * Parsing raw values:
   ```
   FloatParsing<FloatType>          39763 ns        39758 ns        52350 items_per_second=25.152M/s
   FloatParsing<DoubleType>         29812 ns        29808 ns        68765 items_per_second=33.5483M/s
   ```
   * CSV converter performance:
   ```
   FloatConversion                 199539 ns       199508 ns         3498 items_per_second=40.0986M/s
   ```
   * Reading a CSV file of floating-point numbers (single-threaded):
   ```
   276 MB (10000000 rows) in 1.543 s. => 179 MB/s.
   276 MB (10000000 rows) in 1.481 s. => 186 MB/s.
   276 MB (10000000 rows) in 1.492 s. => 185 MB/s.
   276 MB (10000000 rows) in 1.469 s. => 188 MB/s.
   276 MB (10000000 rows) in 1.471 s. => 188 MB/s.
   ```
   





[GitHub] [arrow] pitrou commented on a change in pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8494:
URL: https://github.com/apache/arrow/pull/8494#discussion_r509391243



##########
File path: LICENSE.txt
##########
@@ -2223,3 +2223,11 @@ exception of some code pulled in from other repositories (such as
 public domain, released using the CC0 1.0 Universal dedication (*).
 
 (*) https://creativecommons.org/publicdomain/zero/1.0/legalcode
+
+--------------------------------------------------------------------------------
+
+The files in cpp/src/arrow/vendored/fast_float/ contain code from
+
+https://github.com/lemire/fast_float
+
+which is made available under the Apache License 2.0.

Review comment:
       Thank you!







[GitHub] [arrow] pitrou edited a comment on pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library

Posted by GitBox <gi...@apache.org>.
pitrou edited a comment on pull request #8494:
URL: https://github.com/apache/arrow/pull/8494#issuecomment-712850295


   After:
   * Parsing raw values:
   ```
   FloatParsing<FloatType>          11936 ns        11934 ns       184367 items_per_second=83.7942M/s
   FloatParsing<DoubleType>         10989 ns        10988 ns       194016 items_per_second=91.0101M/s
   ```
   * CSV converter performance:
   ```
   FloatConversion                 108180 ns       108166 ns         6442 items_per_second=73.9605M/s
   ```
   * Reading a CSV file of floating-point numbers (single-threaded):
   ```
   276 MB (10000000 rows) in 0.903 s. => 306 MB/s.
   276 MB (10000000 rows) in 0.844 s. => 328 MB/s.
   276 MB (10000000 rows) in 0.818 s. => 338 MB/s.
   276 MB (10000000 rows) in 0.812 s. => 341 MB/s.
   276 MB (10000000 rows) in 0.800 s. => 346 MB/s.
   ```
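
The before/after figures are easier to compare as ratios. The following is a minimal sketch, not part of the pull request: it copies the throughput numbers from the benchmark output in this thread and, for the CSV file read, uses the wall times from the most recent revision of each comment (the edited "Before" and the edited "After"). The variable names and the choice of the median as the summary statistic are assumptions made for illustration.

```python
from statistics import median

# Throughput in millions of items per second, copied from the
# "Before" and "After" benchmark output quoted in this thread.
before = {
    "FloatParsing<FloatType>": 25.152,
    "FloatParsing<DoubleType>": 33.5483,
    "FloatConversion": 40.0986,
}
after = {
    "FloatParsing<FloatType>": 83.7942,
    "FloatParsing<DoubleType>": 91.0101,
    "FloatConversion": 73.9605,
}

# Higher throughput is better, so the speedup is after / before.
throughput_speedups = {name: after[name] / before[name] for name in before}

# Wall times in seconds for reading the 276 MB CSV file (five runs each),
# taken from the edited revisions of the two comments.
csv_before_s = [1.481, 1.441, 1.417, 1.418, 1.430]
csv_after_s = [0.903, 0.844, 0.818, 0.812, 0.800]

# Lower time is better, so the speedup is before / after.
# The median is an illustrative choice to damp run-to-run noise.
csv_read_speedup = median(csv_before_s) / median(csv_after_s)

for name, ratio in throughput_speedups.items():
    print(f"{name}: {ratio:.2f}x")
print(f"CSV file read (median wall time): {csv_read_speedup:.2f}x")
```

With these inputs, raw parsing comes out at roughly 3.3x for floats and 2.7x for doubles, the CSV converter at roughly 1.8x, and the end-to-end single-threaded file read at roughly 1.7x. The gain shrinks as more non-parsing work is included in the measurement.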
   





[GitHub] [arrow] github-actions[bot] commented on pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8494:
URL: https://github.com/apache/arrow/pull/8494#issuecomment-712858115


   https://issues.apache.org/jira/browse/ARROW-10328





[GitHub] [arrow] wesm commented on pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #8494:
URL: https://github.com/apache/arrow/pull/8494#issuecomment-712899016


   Do we need to unvendor double-conversion?





[GitHub] [arrow] pitrou closed pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #8494:
URL: https://github.com/apache/arrow/pull/8494


   





[GitHub] [arrow] lemire commented on a change in pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library

Posted by GitBox <gi...@apache.org>.
lemire commented on a change in pull request #8494:
URL: https://github.com/apache/arrow/pull/8494#discussion_r509392967



##########
File path: LICENSE.txt
##########
@@ -2223,3 +2223,11 @@ exception of some code pulled in from other repositories (such as
 public domain, released using the CC0 1.0 Universal dedication (*).
 
 (*) https://creativecommons.org/publicdomain/zero/1.0/legalcode
+
+--------------------------------------------------------------------------------
+
+The files in cpp/src/arrow/vendored/fast_float/ contain code from
+
+https://github.com/lemire/fast_float
+
+which is made available under the Apache License 2.0.

Review comment:
       @pitrou For future reference, if I post code publicly on GitHub, you can grab it. I am never going to make a fuss.







[GitHub] [arrow] pitrou commented on a change in pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8494:
URL: https://github.com/apache/arrow/pull/8494#discussion_r509234886



##########
File path: LICENSE.txt
##########
@@ -2223,3 +2223,11 @@ exception of some code pulled in from other repositories (such as
 public domain, released using the CC0 1.0 Universal dedication (*).
 
 (*) https://creativecommons.org/publicdomain/zero/1.0/legalcode
+
+--------------------------------------------------------------------------------
+
+The files in cpp/src/arrow/vendored/fast_float/ contain code from
+
+https://github.com/lemire/fast_float
+
+which is made available under the Apache License 2.0.

Review comment:
       @lemire Is the above blurb acceptable to you, or should we include a more detailed attribution? Don't hesitate to suggest wording.







[GitHub] [arrow] pitrou commented on pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8494:
URL: https://github.com/apache/arrow/pull/8494#issuecomment-712899431


   No, we still use it for the other way round (float to string).





[GitHub] [arrow] pitrou edited a comment on pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library

Posted by GitBox <gi...@apache.org>.
pitrou edited a comment on pull request #8494:
URL: https://github.com/apache/arrow/pull/8494#issuecomment-712849550


   Before:
   * Parsing raw values:
   ```
   FloatParsing<FloatType>          39763 ns        39758 ns        52350 items_per_second=25.152M/s
   FloatParsing<DoubleType>         29812 ns        29808 ns        68765 items_per_second=33.5483M/s
   ```
   * CSV converter performance:
   ```
   FloatConversion                 199539 ns       199508 ns         3498 items_per_second=40.0986M/s
   ```
   * Reading a CSV file of floating-point numbers (single-threaded):
   ```
   276 MB (10000000 rows) in 1.481 s. => 187 MB/s.
   276 MB (10000000 rows) in 1.441 s. => 192 MB/s.
   276 MB (10000000 rows) in 1.417 s. => 195 MB/s.
   276 MB (10000000 rows) in 1.418 s. => 195 MB/s.
   276 MB (10000000 rows) in 1.430 s. => 193 MB/s.
   ```
   





[GitHub] [arrow] pitrou commented on pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8494:
URL: https://github.com/apache/arrow/pull/8494#issuecomment-712850295


   After:
   * Parsing raw values:
   ```
   FloatParsing<FloatType>          11936 ns        11934 ns       184367 items_per_second=83.7942M/s
   FloatParsing<DoubleType>         10989 ns        10988 ns       194016 items_per_second=91.0101M/s
   ```
   * CSV converter performance:
   ```
   FloatConversion                 108180 ns       108166 ns         6442 items_per_second=73.9605M/s
   ```
   * Reading a CSV file of floating-point numbers (single-threaded):
   ```
   276 MB (10000000 rows) in 0.928 s. => 298 MB/s.
   276 MB (10000000 rows) in 0.940 s. => 294 MB/s.
   276 MB (10000000 rows) in 0.918 s. => 301 MB/s.
   276 MB (10000000 rows) in 0.896 s. => 309 MB/s.
   276 MB (10000000 rows) in 0.900 s. => 307 MB/s.
   ```
   





[GitHub] [arrow] lemire commented on a change in pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library

Posted by GitBox <gi...@apache.org>.
lemire commented on a change in pull request #8494:
URL: https://github.com/apache/arrow/pull/8494#discussion_r509388406



##########
File path: LICENSE.txt
##########
@@ -2223,3 +2223,11 @@ exception of some code pulled in from other repositories (such as
 public domain, released using the CC0 1.0 Universal dedication (*).
 
 (*) https://creativecommons.org/publicdomain/zero/1.0/legalcode
+
+--------------------------------------------------------------------------------
+
+The files in cpp/src/arrow/vendored/fast_float/ contain code from
+
+https://github.com/lemire/fast_float
+
+which is made available under the Apache License 2.0.

Review comment:
       @pitrou Yes, yes. This is 100% fine.



