Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/11/20 16:40:55 UTC

[GitHub] [arrow] Dandandan opened a new pull request #8723: Like utf8 fast paths

Dandandan opened a new pull request #8723:
URL: https://github.com/apache/arrow/pull/8723


   The commonly used patterns 'xxx', '%xxx' and 'xxx%' can use faster methods from the Rust standard library instead of a regex.
   
   ```
   like_utf8 scalar equals time:   [828.13 us 830.08 us 832.39 us]                                    
                           change: [-43.306% -42.962% -42.610%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 12 outliers among 100 measurements (12.00%)
   
   like_utf8 scalar ends with                                                                            
                           time:   [927.93 us 929.31 us 930.88 us]
                           change: [-59.220% -59.149% -59.082%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   like_utf8 scalar starts with                                                                            
                           time:   [930.96 us 931.70 us 932.63 us]
                           change: [-43.537% -43.432% -43.325%] (p = 0.00 < 0.05)
                           Performance has improved.
   ```
   
   I also tried a fast path for contains (`%xxx%`), but that was slower than using the regex.
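
As a rough illustration of the fast paths described above (not the code from the PR), a pattern with no wildcards can be compared with `==`, a pattern whose only wildcard is a trailing `%` can use `str::starts_with`, and a pattern whose only wildcard is a leading `%` can use `str::ends_with`. The names `LikeKind`, `classify` and `like_fast` are hypothetical, and the sketch assumes no escape characters in the pattern.

```rust
/// Hypothetical classification of a LIKE pattern (illustrative only, not from the PR).
#[derive(Debug, PartialEq)]
enum LikeKind<'a> {
    Equals(&'a str),
    StartsWith(&'a str),
    EndsWith(&'a str),
    /// Anything else (e.g. `%xxx%` or patterns with `_`) falls back to the regex.
    Other,
}

/// True if `s` contains a LIKE wildcard. Assumption: escapes such as `\%` are not supported.
fn has_wildcard(s: &str) -> bool {
    s.contains('%') || s.contains('_')
}

fn classify(pattern: &str) -> LikeKind<'_> {
    if !has_wildcard(pattern) {
        return LikeKind::Equals(pattern);
    }
    // 'xxx%': a single trailing '%' and no other wildcard.
    if let Some(prefix) = pattern.strip_suffix('%') {
        if !has_wildcard(prefix) {
            return LikeKind::StartsWith(prefix);
        }
    }
    // '%xxx': a single leading '%' and no other wildcard.
    if let Some(suffix) = pattern.strip_prefix('%') {
        if !has_wildcard(suffix) {
            return LikeKind::EndsWith(suffix);
        }
    }
    LikeKind::Other
}

/// Returns `Some(result)` when a fast path applies, `None` when the caller
/// should fall back to the regex-based implementation.
fn like_fast(value: &str, pattern: &str) -> Option<bool> {
    match classify(pattern) {
        LikeKind::Equals(s) => Some(value == s),
        LikeKind::StartsWith(p) => Some(value.starts_with(p)),
        LikeKind::EndsWith(s) => Some(value.ends_with(s)),
        LikeKind::Other => None,
    }
}
```

In a scalar kernel the classification would be done once per pattern and the chosen comparison applied to every value in the array, so the per-row cost is a plain string comparison.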
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #8723: ARROW-10665: [Rust] like utf8 scalar fast paths

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #8723:
URL: https://github.com/apache/arrow/pull/8723#issuecomment-731417468


   I think it could use some additional tests, as I think there is a bug in the current implementation. I will probably have some time tomorrow.





[GitHub] [arrow] Dandandan edited a comment on pull request #8723: ARROW-10665: [Rust] like utf8 scalar fast paths

Posted by GitBox <gi...@apache.org>.
Dandandan edited a comment on pull request #8723:
URL: https://github.com/apache/arrow/pull/8723#issuecomment-731417468


   I think it could use some additional tests, as I think there is a bug in the current implementation. I will probably have some time tomorrow to finalize the PR.





[GitHub] [arrow] jorgecarleitao closed pull request #8723: ARROW-10665: [Rust] like/nlike utf8 scalar fast paths, bug fixes in like/nlike

Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #8723:
URL: https://github.com/apache/arrow/pull/8723


   





[GitHub] [arrow] Dandandan commented on pull request #8723: ARROW-10665: [Rust] like/nlike utf8 scalar fast paths, bug fixes in like/nlike

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #8723:
URL: https://github.com/apache/arrow/pull/8723#issuecomment-731567229


   A possible further optimization for scalar contains ("%string%") would be to use a different substring search implementation, such as this crate: https://docs.rs/twoway/0.2.1/twoway/. I am not sure it is worth bringing in, though, as regex is already quite fast for contains.





[GitHub] [arrow] Dandandan commented on pull request #8723: ARROW-10665: [Rust] like utf8 scalar fast paths

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #8723:
URL: https://github.com/apache/arrow/pull/8723#issuecomment-731290337


   It also looks like there is a bug in the existing implementation (missing anchors in the regex). Fixing that may make the regex faster again. I will follow up later.





[GitHub] [arrow] Dandandan commented on pull request #8723: ARROW-10665: [Rust] like utf8 scalar fast paths

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #8723:
URL: https://github.com/apache/arrow/pull/8723#issuecomment-731566968


   @jorgecarleitao
   
   I added test suites and also fixed existing bugs in like / nlike, including the non-scalar versions. Previously, patterns like "arr", "%arr", "arr%", "arr_", etc. were matching too much because of the missing anchors in the regex.
   
   I also added benches and the fast paths to nlike.
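
To illustrate the anchoring issue mentioned above, here is a minimal sketch (not the code from the PR) of translating a LIKE pattern into an anchored regex string. Without the leading `^` and trailing `$`, a regex built from "arr" would also match "arrow" or "barrel", which is the "matching too much" behaviour described. The function name `like_to_regex` is hypothetical, and the sketch assumes that the backslash is a literal character rather than a LIKE escape.

```rust
/// Hypothetical helper: translate a SQL LIKE pattern into an anchored regex string.
/// Assumptions: `%` matches any sequence, `_` matches one character, and LIKE
/// escape sequences are not supported (a backslash is treated as a literal).
fn like_to_regex(pattern: &str) -> String {
    let mut re = String::with_capacity(pattern.len() + 2);
    // Anchor at the start so the whole value must match, not just a substring.
    re.push('^');
    for c in pattern.chars() {
        match c {
            '%' => re.push_str(".*"),
            '_' => re.push('.'),
            // Escape regex metacharacters so they match literally.
            '\\' | '.' | '+' | '*' | '?' | '(' | ')' | '|' | '[' | ']' | '{' | '}' | '^' | '$' => {
                re.push('\\');
                re.push(c);
            }
            _ => re.push(c),
        }
    }
    // Anchor at the end as well.
    re.push('$');
    re
}
```

With this translation, "arr" becomes `^arr$`, "%arr" becomes `^.*arr$`, "arr%" becomes `^arr.*$` and "arr_" becomes `^arr.$`. Note that in most regex engines `.` does not match a newline by default, so a real implementation may need an additional flag for values that contain line breaks.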





[GitHub] [arrow] github-actions[bot] commented on pull request #8723: ARROW-10665: [Rust] like utf8 scalar fast paths

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8723:
URL: https://github.com/apache/arrow/pull/8723#issuecomment-731289205


   https://issues.apache.org/jira/browse/ARROW-10665





[GitHub] [arrow] github-actions[bot] commented on pull request #8723: Like utf8 scalar fast paths

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8723:
URL: https://github.com/apache/arrow/pull/8723#issuecomment-731282668


   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename the pull request title in the following format?
   
       ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   

