Posted to jira@arrow.apache.org by "Adam Hooper (Jira)" <ji...@apache.org> on 2021/05/06 17:40:00 UTC

[jira] [Updated] (ARROW-12670) extract_regex gives bizarre behavior after nulls or non-matches

     [ https://issues.apache.org/jira/browse/ARROW-12670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Hooper updated ARROW-12670:
--------------------------------
    Description: 
After a non-match, the *subsequent* string is reported as a match but with no data, and the values that follow are shifted by one position.

{code}
>>> import pyarrow as pa
>>> import pyarrow.compute
>>> pa.compute.extract_regex(pa.array(["a", "b", "c", "d"]), pattern="(?P<x>[^b])")
<pyarrow.lib.StructArray object at 0x7f80de918ee0>
-- is_valid:
  [
    true,
    false,
    true,
    true
  ]
-- child 0 type: string
  [
    "a",
    "",
    "",
    "c"
  ]
{code}

  was:
After a non-match, the *subsequent* string never matches.

{code}
>>> pa.compute.extract_regex(pa.array(["a", "b", "c", "d"]), pattern="(?P<x>[^b])")
<pyarrow.lib.StructArray object at 0x7f80de956640>
-- is_valid:
  [
    true,
    false,
    true,
    true
  ]
-- child 0 type: string
  [
    "a",
    "",
    "",
    "a"
  ]
{code}


> extract_regex gives bizarre behavior after nulls or non-matches
> ---------------------------------------------------------------
>
>                 Key: ARROW-12670
>                 URL: https://issues.apache.org/jira/browse/ARROW-12670
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 4.0.0
>            Reporter: Adam Hooper
>            Priority: Major
>
> After a non-match, the *subsequent* string is reported as a match but with no data, and the values that follow are shifted by one position.
> {code}
> >>> import pyarrow as pa
> >>> import pyarrow.compute
> >>> pa.compute.extract_regex(pa.array(["a", "b", "c", "d"]), pattern="(?P<x>[^b])")
> <pyarrow.lib.StructArray object at 0x7f80de918ee0>
> -- is_valid:
>   [
>     true,
>     false,
>     true,
>     true
>   ]
> -- child 0 type: string
>   [
>     "a",
>     "",
>     "",
>     "c"
>   ]
> {code}
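
For comparison with the output above, here is a minimal sketch of the per-element result one would presumably expect for this input. It uses Python's standard `re` module as a stand-in (Arrow itself uses a different regex engine), and the helper name `expected_extract` is hypothetical, not part of pyarrow. It assumes search semantics and that a non-match or a null input yields a null entry, which is consistent with the `is_valid` values in the report.

```python
import re


def expected_extract(values, pattern):
    """Reference behavior (an assumption, not Arrow's code): one result per input.

    A match yields a dict of the named capture groups; a non-match or a
    None input yields None, mirroring is_valid == false.
    """
    compiled = re.compile(pattern)
    results = []
    for value in values:
        match = None if value is None else compiled.search(value)
        results.append(match.groupdict() if match else None)
    return results


expected = expected_extract(["a", "b", "c", "d"], r"(?P<x>[^b])")
print(expected)
# [{'x': 'a'}, None, {'x': 'c'}, {'x': 'd'}]
```

Under these assumptions, the entries after the non-match would be "c" and "d", rather than the "" and "c" shown in the report.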



--
This message was sent by Atlassian Jira
(v8.3.4#803005)