Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/18 13:52:40 UTC

[GitHub] [arrow-rs] alamb opened a new issue #696: Add documentation examples for `regexp_match` kernels

alamb opened a new issue #696:
URL: https://github.com/apache/arrow-rs/issues/696


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   The current documentation for the `regexp_match` kernel is quite sparse:
   https://docs.rs/arrow/5.2.0/arrow/compute/kernels/regexp/fn.regexp_match.html
   
   ![Screen Shot 2021-08-18 at 9 50 12 AM](https://user-images.githubusercontent.com/490673/129910047-fbcb7f02-b798-4cfb-ae11-02770f160b39.png)
   
   Also, because this function returns `ArrayRef` (an untyped, generic array reference), it was not obvious to me that `regexp_match` actually returns a `ListArray` containing the matches. Documenting this would help a lot.
   
   **Describe the solution you'd like**
   1. A description of the type of the array that is returned
   2. A doc example (can probably adapt a test in the code) that shows how to invoke `regexp_match` and use the returned `ListArray`
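
For item 1, it may help to picture what a list-typed result looks like. The following is a minimal, std-only Rust sketch of the general Arrow list layout: an offsets vector, one flat vector of child values, and a validity flag per row. `StringListColumn` and its methods are hypothetical names invented for illustration; they are not the arrow crate's `ListArray` API.

```rust
/// Toy stand-in for a list-of-strings column.
/// Hypothetical type for illustration; NOT the arrow crate's `ListArray`.
struct StringListColumn {
    /// `offsets[i]..offsets[i + 1]` is the range of `values` owned by row `i`.
    offsets: Vec<usize>,
    /// Child strings of every row, stored back to back.
    values: Vec<String>,
    /// `validity[i] == false` marks row `i` as null.
    validity: Vec<bool>,
}

impl StringListColumn {
    fn new() -> Self {
        StringListColumn {
            offsets: vec![0],
            values: Vec::new(),
            validity: Vec::new(),
        }
    }

    /// Append one row: `Some(items)` for a (possibly empty) list, `None` for null.
    fn push(&mut self, row: Option<&[&str]>) {
        if let Some(items) = row {
            self.values.extend(items.iter().map(|s| s.to_string()));
        }
        // A null row still gets an offset entry; it simply spans zero values.
        self.offsets.push(self.values.len());
        self.validity.push(row.is_some());
    }

    /// Number of rows in the column.
    fn len(&self) -> usize {
        self.validity.len()
    }

    /// Read row `i`: `None` if the row is null, otherwise its slice of `values`.
    fn row(&self, i: usize) -> Option<&[String]> {
        if !self.validity[i] {
            return None;
        }
        Some(&self.values[self.offsets[i]..self.offsets[i + 1]])
    }
}

fn main() {
    let one_match: &[&str] = &["A"];
    let no_items: &[&str] = &[];

    let mut col = StringListColumn::new();
    col.push(Some(one_match)); // a row holding one string
    col.push(None);            // a null row
    col.push(Some(no_items));  // a valid row holding an empty list

    for i in 0..col.len() {
        println!("row {}: {:?}", i, col.row(i));
    }
}
```

In this model a null row and a valid but empty list are distinct states, which is the kind of detail a description of the returned array type would need to spell out.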
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] matthewmturner commented on issue #696: Add documentation examples for `regexp_match` kernels

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on issue #696:
URL: https://github.com/apache/arrow-rs/issues/696#issuecomment-912904126


   I can give this a shot





[GitHub] [arrow-rs] matthewmturner commented on issue #696: Add documentation examples for `regexp_match` kernels

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on issue #696:
URL: https://github.com/apache/arrow-rs/issues/696#issuecomment-916964428


   @alamb unfortunately not, same result.





[GitHub] [arrow-rs] seddonm1 commented on issue #696: Add documentation examples for `regexp_match` kernels

Posted by GitBox <gi...@apache.org>.
seddonm1 commented on issue #696:
URL: https://github.com/apache/arrow-rs/issues/696#issuecomment-917497090


   Thanks @matthewmturner, I will have a look.





[GitHub] [arrow-rs] matthewmturner edited a comment on issue #696: Add documentation examples for `regexp_match` kernels

Posted by GitBox <gi...@apache.org>.
matthewmturner edited a comment on issue #696:
URL: https://github.com/apache/arrow-rs/issues/696#issuecomment-913013809


   @alamb I'm playing around with the `regexp_match` function and am struggling to produce the expected results. I tried to do something similar to the test `match_single_group`.
   
   I have the following:
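   ```
   use arrow::array::{Array, GenericStringBuilder, ListArray, ListBuilder, StringArray};
   use arrow::compute::kernels::regexp::regexp_match;

   // Input strings and one pattern per row (the same pattern repeated three times).
   let array = StringArray::from(vec![Some("Animal"), None, Some("House")]);
   let pat = StringArray::from(vec![r"^[A]"; 3]);
   // `regexp_match` returns an `ArrayRef`; downcast it to the concrete `ListArray`.
   let m = regexp_match(&array, &pat, None).unwrap();
   let result = m.as_any().downcast_ref::<ListArray>().unwrap();
   
   // Expected: ["A"] for "Animal", null for the null input, null for "House" (no match).
   let elem_builder: GenericStringBuilder<i32> = GenericStringBuilder::new(0);
   let mut expected_builder = ListBuilder::new(elem_builder);
   expected_builder.values().append_value("A").unwrap();
   expected_builder.append(true).unwrap();
   expected_builder.append(false).unwrap();
   expected_builder.append(false).unwrap();
   let expected = expected_builder.finish();
   assert_eq!(&expected, result);
   ```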
   
   But get the following:
   ```
   thread 'main' panicked at 'assertion failed: `(left == right)`
     left: `ListArray
   [
     StringArray
   [
     "A",
   ],
     null,
     null,
   ]`,
    right: `ListArray
   [
     StringArray
   [
   ],
     null,
     null,
   ]`', src/main.rs:54:5
   ```
   It's not clear to me why the result `StringArray` at the first index of the `ListArray` doesn't contain "A"; it's just an empty `StringArray`.
   
   Any thoughts?





[GitHub] [arrow-rs] alamb commented on issue #696: Add documentation examples for `regexp_match` kernels

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #696:
URL: https://github.com/apache/arrow-rs/issues/696#issuecomment-916991654


   🤔  I don't have a good answer then. It would require some more debugging





[GitHub] [arrow-rs] alamb commented on issue #696: Add documentation examples for `regexp_match` kernels

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #696:
URL: https://github.com/apache/arrow-rs/issues/696#issuecomment-916953170


   @matthewmturner I also don't understand that. If you remove the caret and use `[A]` as the regex, does it do the right thing? Perhaps @seddonm1 has some hints as well.





[GitHub] [arrow-rs] matthewmturner commented on issue #696: Add documentation examples for `regexp_match` kernels

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on issue #696:
URL: https://github.com/apache/arrow-rs/issues/696#issuecomment-917427288


   @seddonm1 I believe Postgres works as expected. I ran the following:
   
   ```
   postgres=# WITH vals(a) AS (VALUES ('Animal'), (null),  ('House')) 
   SELECT a, regexp_matches(a, '^[A]') m from vals;
   ```
   
   Resulting in
   ```
      a    |  m  
   --------+-----
    Animal | {A}
   (1 row)
   ```
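
The PostgreSQL documentation states that when a pattern contains no parenthesized subexpressions, the result is a single-element array holding the substring that matched the whole pattern, which is why `{A}` appears above. A minimal sketch of that rule in std-only Rust follows. `reported_matches` is a hypothetical helper, not an arrow or PostgreSQL function, and it takes already-computed match text instead of running a regex. Whether the arrow kernel applies the same rule is not established in this thread.

```rust
/// Hypothetical helper (not part of arrow or PostgreSQL): decide what a
/// `regexp_matches`-style function reports for one successful match.
///
/// `whole` is the text matched by the full pattern. `groups` holds the text of
/// each parenthesized subexpression, with `None` for a group that did not
/// take part in the match.
fn reported_matches(whole: &str, groups: &[Option<&str>]) -> Vec<Option<String>> {
    if groups.is_empty() {
        // No parenthesized subexpressions: report the whole match as the only element.
        vec![Some(whole.to_string())]
    } else {
        // Otherwise report one element per group, and leave the whole match out.
        groups.iter().map(|g| g.map(|s| s.to_string())).collect()
    }
}

fn main() {
    // Pattern '^[A]' against "Animal": the whole match is "A" and there are no groups.
    println!("{:?}", reported_matches("A", &[]));

    // Illustrative pattern with two groups, e.g. '^(A)(n)' against "Animal".
    println!("{:?}", reported_matches("An", &[Some("A"), Some("n")]));
}
```

An implementation that always leaves out the whole match, even when the pattern has no groups, would report an empty list for `'^[A]'`. That is one hypothesis worth checking when debugging the discrepancy.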





[GitHub] [arrow-rs] seddonm1 commented on issue #696: Add documentation examples for `regexp_match` kernels

Posted by GitBox <gi...@apache.org>.
seddonm1 commented on issue #696:
URL: https://github.com/apache/arrow-rs/issues/696#issuecomment-917309048


   @alamb Sorry I didn't do this one. It does appear there may be a defect.
   
   @matthewmturner are you able to reproduce the behaviour to verify that it behaves differently to Postgres? (https://www.postgresql.org/docs/current/functions-matching.html)





[GitHub] [arrow-rs] matthewmturner commented on issue #696: Add documentation examples for `regexp_match` kernels

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on issue #696:
URL: https://github.com/apache/arrow-rs/issues/696#issuecomment-917324963


   @seddonm1 sure, will give it a shot.

