Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/11/20 16:40:55 UTC
[GitHub] [arrow] Dandandan opened a new pull request #8723: Like utf8 fast paths
Dandandan opened a new pull request #8723:
URL: https://github.com/apache/arrow/pull/8723
The commonly used patterns 'xxx', '%xxx', and 'xxx%' can use faster methods from the Rust standard library instead of a regex.
```
like_utf8 scalar equals
time: [828.13 us 830.08 us 832.39 us]
change: [-43.306% -42.962% -42.610%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
like_utf8 scalar ends with
time: [927.93 us 929.31 us 930.88 us]
change: [-59.220% -59.149% -59.082%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
like_utf8 scalar starts with
time: [930.96 us 931.70 us 932.63 us]
change: [-43.537% -43.432% -43.325%] (p = 0.00 < 0.05)
Performance has improved.
```
I also tried a fast path for contains (`%xxx%`), but it was slower than using the regex.
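As a rough illustration of the fast-path idea described above, the following is a minimal sketch and not the code from this PR. The names `LikeFastPath`, `classify`, and `like_fast` are hypothetical, and the sketch assumes patterns have no escape sequences. It only uses `==`, `str::starts_with`, and `str::ends_with` from the standard library, and leaves every other pattern (including `%xxx%`) to the regex path.

```rust
/// Hypothetical classification of a scalar LIKE pattern (not the PR's types).
#[derive(Debug, PartialEq)]
enum LikeFastPath<'a> {
    /// No wildcards at all: plain string equality.
    Equals(&'a str),
    /// `xxx%`: the value must start with `xxx`.
    StartsWith(&'a str),
    /// `%xxx`: the value must end with `xxx`.
    EndsWith(&'a str),
    /// Anything else is left to the regex-based implementation.
    Fallback,
}

/// Decide which fast path, if any, applies to `pattern`.
/// Assumption: `%` and `_` are always wildcards (no escaping).
fn classify(pattern: &str) -> LikeFastPath<'_> {
    let has_wildcard = |s: &str| s.contains('%') || s.contains('_');
    if !has_wildcard(pattern) {
        return LikeFastPath::Equals(pattern);
    }
    if let Some(prefix) = pattern.strip_suffix('%') {
        if !has_wildcard(prefix) {
            return LikeFastPath::StartsWith(prefix);
        }
    }
    if let Some(suffix) = pattern.strip_prefix('%') {
        if !has_wildcard(suffix) {
            return LikeFastPath::EndsWith(suffix);
        }
    }
    LikeFastPath::Fallback
}

/// Returns `Some(matched)` when a fast path applies, or `None` when the
/// caller has to fall back to the regex.
fn like_fast(value: &str, pattern: &str) -> Option<bool> {
    match classify(pattern) {
        LikeFastPath::Equals(p) => Some(value == p),
        LikeFastPath::StartsWith(p) => Some(value.starts_with(p)),
        LikeFastPath::EndsWith(p) => Some(value.ends_with(p)),
        LikeFastPath::Fallback => None,
    }
}
```

In a real kernel the pattern would be classified once per scalar and the chosen comparison applied to every value in the array, so the classification cost is not paid per row.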
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on pull request #8723: ARROW-10665: [Rust] like utf8 scalar fast paths
Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #8723:
URL: https://github.com/apache/arrow/pull/8723#issuecomment-731417468
I think this could use some additional tests, as there seems to be a bug in the current implementation. I will probably have some time tomorrow.
----------------------------------------------------------------
[GitHub] [arrow] Dandandan edited a comment on pull request #8723: ARROW-10665: [Rust] like utf8 scalar fast paths
Posted by GitBox <gi...@apache.org>.
Dandandan edited a comment on pull request #8723:
URL: https://github.com/apache/arrow/pull/8723#issuecomment-731417468
I think this could use some additional tests, as there seems to be a bug in the current implementation. I will probably have some time tomorrow to finalize the PR.
----------------------------------------------------------------
[GitHub] [arrow] jorgecarleitao closed pull request #8723: ARROW-10665: [Rust] like/nlike utf8 scalar fast paths, bug fixes in like/nlike
Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #8723:
URL: https://github.com/apache/arrow/pull/8723
----------------------------------------------------------------
[GitHub] [arrow] Dandandan commented on pull request #8723: ARROW-10665: [Rust] like/nlike utf8 scalar fast paths, bug fixes in like/nlike
Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #8723:
URL: https://github.com/apache/arrow/pull/8723#issuecomment-731567229
A possible further optimization for scalar contains ("%string%") would be to try a different substring-search implementation, such as the twoway crate (https://docs.rs/twoway/0.2.1/twoway/), but I am not sure it is worth bringing in: regex is already quite fast for contains.
----------------------------------------------------------------
[GitHub] [arrow] Dandandan commented on pull request #8723: ARROW-10665: [Rust] like utf8 scalar fast paths
Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #8723:
URL: https://github.com/apache/arrow/pull/8723#issuecomment-731290337
It also looks like there is a bug in the existing implementation (missing anchors in the regex); fixing that might make the regex faster again. I will follow up later.
----------------------------------------------------------------
[GitHub] [arrow] Dandandan commented on pull request #8723: ARROW-10665: [Rust] like utf8 scalar fast paths
Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #8723:
URL: https://github.com/apache/arrow/pull/8723#issuecomment-731566968
@jorgecarleitao
I added test suites and fixed existing bugs in like / nlike, including the non-scalar versions. Previously, patterns like "arr", "%arr", "arr%", "arr_", etc. matched too much because of the missing anchors in the regex.
I also added benches and the fast path to nlike.
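To illustrate the anchoring issue mentioned above, here is a minimal sketch of translating a LIKE pattern into an anchored regex string. It is an assumption about the general approach, not the code from this PR, and the helper name `like_to_anchored_regex` is hypothetical. Without the leading `^` and trailing `$`, a regex such as `arr` would also match "arrow" or "barr", which is the over-matching described in the comment.

```rust
/// Hypothetical helper: translate a SQL LIKE pattern into an anchored
/// regex string. Assumption: `%` and `_` are always wildcards (no escaping).
fn like_to_anchored_regex(pattern: &str) -> String {
    let mut re = String::with_capacity(pattern.len() + 2);
    // Anchor at the start so the match cannot begin mid-string.
    re.push('^');
    for c in pattern.chars() {
        match c {
            // `%` matches any sequence of characters, `_` exactly one.
            '%' => re.push_str(".*"),
            '_' => re.push('.'),
            // Escape regex metacharacters so they match literally.
            '.' | '+' | '*' | '?' | '(' | ')' | '[' | ']' | '{' | '}' | '|'
            | '^' | '$' | '\\' => {
                re.push('\\');
                re.push(c);
            }
            _ => re.push(c),
        }
    }
    // Anchor at the end so trailing characters are not ignored.
    re.push('$');
    re
}
```

The resulting string would then be compiled by whatever regex engine the kernel uses. Note that `.` does not match newlines by default in many engines, so a real implementation may need an extra flag for multi-line values.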
----------------------------------------------------------------
[GitHub] [arrow] github-actions[bot] commented on pull request #8723: ARROW-10665: [Rust] like utf8 scalar fast paths
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8723:
URL: https://github.com/apache/arrow/pull/8723#issuecomment-731289205
https://issues.apache.org/jira/browse/ARROW-10665
----------------------------------------------------------------
[GitHub] [arrow] github-actions[bot] commented on pull request #8723: Like utf8 scalar fast paths
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8723:
URL: https://github.com/apache/arrow/pull/8723#issuecomment-731282668
Thanks for opening a pull request!
Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW
Then could you also rename pull request title in the following format?
ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
See also:
* [Other pull requests](https://github.com/apache/arrow/pulls/)
* [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
----------------------------------------------------------------