Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/20 13:26:36 UTC
[GitHub] [arrow] pitrou opened a new pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library
pitrou opened a new pull request #8494:
URL: https://github.com/apache/arrow/pull/8494
This library is 2x to 3x faster for parsing strings to binary floating-point numbers.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8494:
URL: https://github.com/apache/arrow/pull/8494#issuecomment-712849550
Before:
* Parsing raw values:
```
FloatParsing<FloatType> 39763 ns 39758 ns 52350 items_per_second=25.152M/s
FloatParsing<DoubleType> 29812 ns 29808 ns 68765 items_per_second=33.5483M/s
```
* CSV converter performance:
```
FloatConversion 199539 ns 199508 ns 3498 items_per_second=40.0986M/s
```
* Reading a CSV file of floating-point numbers (single-threaded):
```
276 MB (10000000 rows) in 1.543 s. => 179 MB/s.
276 MB (10000000 rows) in 1.481 s. => 186 MB/s.
276 MB (10000000 rows) in 1.492 s. => 185 MB/s.
276 MB (10000000 rows) in 1.469 s. => 188 MB/s.
276 MB (10000000 rows) in 1.471 s. => 188 MB/s.
```
----------------------------------------------------------------
[GitHub] [arrow] pitrou commented on a change in pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8494:
URL: https://github.com/apache/arrow/pull/8494#discussion_r509391243
##########
File path: LICENSE.txt
##########
@@ -2223,3 +2223,11 @@ exception of some code pulled in from other repositories (such as
public domain, released using the CC0 1.0 Universal dedication (*).
(*) https://creativecommons.org/publicdomain/zero/1.0/legalcode
+
+--------------------------------------------------------------------------------
+
+The files in cpp/src/arrow/vendored/fast_float/ contain code from
+
+https://github.com/lemire/fast_float
+
+which is made available under the Apache License 2.0.
Review comment:
Thank you!
----------------------------------------------------------------
[GitHub] [arrow] pitrou edited a comment on pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library
Posted by GitBox <gi...@apache.org>.
pitrou edited a comment on pull request #8494:
URL: https://github.com/apache/arrow/pull/8494#issuecomment-712850295
After:
* Parsing raw values:
```
FloatParsing<FloatType> 11936 ns 11934 ns 184367 items_per_second=83.7942M/s
FloatParsing<DoubleType> 10989 ns 10988 ns 194016 items_per_second=91.0101M/s
```
* CSV converter performance:
```
FloatConversion 108180 ns 108166 ns 6442 items_per_second=73.9605M/s
```
* Reading a CSV file of floating-point numbers (single-threaded):
```
276 MB (10000000 rows) in 0.903 s. => 306 MB/s.
276 MB (10000000 rows) in 0.844 s. => 328 MB/s.
276 MB (10000000 rows) in 0.818 s. => 338 MB/s.
276 MB (10000000 rows) in 0.812 s. => 341 MB/s.
276 MB (10000000 rows) in 0.800 s. => 346 MB/s.
```
----------------------------------------------------------------
[GitHub] [arrow] github-actions[bot] commented on pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8494:
URL: https://github.com/apache/arrow/pull/8494#issuecomment-712858115
https://issues.apache.org/jira/browse/ARROW-10328
----------------------------------------------------------------
[GitHub] [arrow] wesm commented on pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library
Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #8494:
URL: https://github.com/apache/arrow/pull/8494#issuecomment-712899016
Do we need to unvendor double-conversion?
----------------------------------------------------------------
[GitHub] [arrow] pitrou closed pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library
Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #8494:
URL: https://github.com/apache/arrow/pull/8494
----------------------------------------------------------------
[GitHub] [arrow] lemire commented on a change in pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library
Posted by GitBox <gi...@apache.org>.
lemire commented on a change in pull request #8494:
URL: https://github.com/apache/arrow/pull/8494#discussion_r509392967
##########
File path: LICENSE.txt
##########
@@ -2223,3 +2223,11 @@ exception of some code pulled in from other repositories (such as
public domain, released using the CC0 1.0 Universal dedication (*).
(*) https://creativecommons.org/publicdomain/zero/1.0/legalcode
+
+--------------------------------------------------------------------------------
+
+The files in cpp/src/arrow/vendored/fast_float/ contain code from
+
+https://github.com/lemire/fast_float
+
+which is made available under the Apache License 2.0.
Review comment:
@pitrou For future reference, if I post code publicly on GitHub, you can grab it. I am never going to make a fuss.
----------------------------------------------------------------
[GitHub] [arrow] pitrou commented on a change in pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8494:
URL: https://github.com/apache/arrow/pull/8494#discussion_r509234886
##########
File path: LICENSE.txt
##########
@@ -2223,3 +2223,11 @@ exception of some code pulled in from other repositories (such as
public domain, released using the CC0 1.0 Universal dedication (*).
(*) https://creativecommons.org/publicdomain/zero/1.0/legalcode
+
+--------------------------------------------------------------------------------
+
+The files in cpp/src/arrow/vendored/fast_float/ contain code from
+
+https://github.com/lemire/fast_float
+
+which is made available under the Apache License 2.0.
Review comment:
@lemire Is the above blurb acceptable to you, or should we mention a more detailed attribution? Don't hesitate to suggest a wording.
----------------------------------------------------------------
[GitHub] [arrow] pitrou commented on pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8494:
URL: https://github.com/apache/arrow/pull/8494#issuecomment-712899431
No, we still use it for the other way round (float to string).
----------------------------------------------------------------
[GitHub] [arrow] pitrou edited a comment on pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library
Posted by GitBox <gi...@apache.org>.
pitrou edited a comment on pull request #8494:
URL: https://github.com/apache/arrow/pull/8494#issuecomment-712849550
Before:
* Parsing raw values:
```
FloatParsing<FloatType> 39763 ns 39758 ns 52350 items_per_second=25.152M/s
FloatParsing<DoubleType> 29812 ns 29808 ns 68765 items_per_second=33.5483M/s
```
* CSV converter performance:
```
FloatConversion 199539 ns 199508 ns 3498 items_per_second=40.0986M/s
```
* Reading a CSV file of floating-point numbers (single-threaded):
```
276 MB (10000000 rows) in 1.481 s. => 187 MB/s.
276 MB (10000000 rows) in 1.441 s. => 192 MB/s.
276 MB (10000000 rows) in 1.417 s. => 195 MB/s.
276 MB (10000000 rows) in 1.418 s. => 195 MB/s.
276 MB (10000000 rows) in 1.430 s. => 193 MB/s.
```
----------------------------------------------------------------
[GitHub] [arrow] pitrou commented on pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8494:
URL: https://github.com/apache/arrow/pull/8494#issuecomment-712850295
After:
* Parsing raw values:
```
FloatParsing<FloatType> 11936 ns 11934 ns 184367 items_per_second=83.7942M/s
FloatParsing<DoubleType> 10989 ns 10988 ns 194016 items_per_second=91.0101M/s
```
* CSV converter performance:
```
FloatConversion 108180 ns 108166 ns 6442 items_per_second=73.9605M/s
```
* Reading a CSV file of floating-point numbers (single-threaded):
```
276 MB (10000000 rows) in 0.928 s. => 298 MB/s.
276 MB (10000000 rows) in 0.940 s. => 294 MB/s.
276 MB (10000000 rows) in 0.918 s. => 301 MB/s.
276 MB (10000000 rows) in 0.896 s. => 309 MB/s.
276 MB (10000000 rows) in 0.900 s. => 307 MB/s.
```
----------------------------------------------------------------
[GitHub] [arrow] lemire commented on a change in pull request #8494: ARROW-10328: [C++] Vendor fast_float number parsing library
Posted by GitBox <gi...@apache.org>.
lemire commented on a change in pull request #8494:
URL: https://github.com/apache/arrow/pull/8494#discussion_r509388406
##########
File path: LICENSE.txt
##########
@@ -2223,3 +2223,11 @@ exception of some code pulled in from other repositories (such as
public domain, released using the CC0 1.0 Universal dedication (*).
(*) https://creativecommons.org/publicdomain/zero/1.0/legalcode
+
+--------------------------------------------------------------------------------
+
+The files in cpp/src/arrow/vendored/fast_float/ contain code from
+
+https://github.com/lemire/fast_float
+
+which is made available under the Apache License 2.0.
Review comment:
@pitrou Yes, yes. This is 100% fine.