Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/14 18:06:50 UTC

[GitHub] [arrow] pitrou opened a new pull request #11955: ARROW-10209: [Python] Support positional options in compute functions

pitrou opened a new pull request #11955:
URL: https://github.com/apache/arrow/pull/11955


   
   
   This makes compute functions easier to use. For example, here the required "pattern" option doesn't need to be passed by name:
   ```
   >>> pc.split_pattern("abacab", "a")
   <pyarrow.ListScalar: ['', 'b', 'c', 'b']>
   ```
   
   ... and it produces the following doc at the prompt:
   ```
   split_pattern(strings, /, pattern, *, max_splits=-1, reverse=False, options=None, memory_pool=None)
       Split string according to separator.
   
       Split each string according to the exact `pattern` defined in
       SplitPatternOptions.  The output for each string input is a list
       of strings.
   
       The maximum number of splits and direction of splitting
       (forward, reverse) can optionally be defined in SplitPatternOptions.
   
       Parameters
       ----------
       strings : Array-like or scalar-like
           Argument to compute function
       pattern : optional
           Parameter for SplitPatternOptions constructor. Either `options`
           or `pattern` can be passed, but not both at the same time.
       max_splits : optional
           Parameter for SplitPatternOptions constructor. Either `options`
           or `max_splits` can be passed, but not both at the same time.
       reverse : optional
           Parameter for SplitPatternOptions constructor. Either `options`
           or `reverse` can be passed, but not both at the same time.
       options : pyarrow.compute.SplitPatternOptions, optional
           Parameters altering compute function semantics.
       memory_pool : pyarrow.MemoryPool, optional
           If not passed, will allocate memory from the default memory pool.
   ```
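
The rule stated in the generated docstring (option values may be given positionally or by keyword, or a prebuilt `options` object may be passed, but not both) can be illustrated with a small pure-Python sketch. This is not the pyarrow implementation: `SplitPatternOptions` below is a hypothetical stand-in dataclass, `resolve_options` is an invented helper, and the toy `split_pattern` works on a plain `str` only so that the example runs without pyarrow.

```python
from dataclasses import dataclass


@dataclass
class SplitPatternOptions:
    # Hypothetical stand-in for pyarrow.compute.SplitPatternOptions.
    pattern: str
    max_splits: int = -1
    reverse: bool = False


def resolve_options(option_class, args, kwargs, options=None):
    """Build an options object from positional/keyword option values.

    Follows the documented rule: either `options` or individual option
    parameters can be passed, but not both at the same time.
    """
    if args or kwargs:
        if options is not None:
            raise TypeError(
                "pass either `options` or individual option parameters, "
                "not both")
        return option_class(*args, **kwargs)
    return options


def split_pattern(strings, *args, options=None, **kwargs):
    # Toy kernel operating on a plain Python str (assumption for the sketch).
    opts = resolve_options(SplitPatternOptions, args, kwargs, options)
    if opts is None:
        raise TypeError("`pattern` is required")
    if opts.reverse:
        return strings.rsplit(opts.pattern, opts.max_splits)
    return strings.split(opts.pattern, opts.max_splits)
```

With this sketch, `split_pattern("abacab", "a")`, `split_pattern("abacab", pattern="a")` and `split_pattern("abacab", options=SplitPatternOptions("a"))` all take the same code path, while combining a positional `"a"` with `options=...` raises `TypeError`.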


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #11955: ARROW-10209: [Python] Support positional options in compute functions

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #11955:
URL: https://github.com/apache/arrow/pull/11955#issuecomment-996086944


   Benchmark runs are scheduled for baseline = 8a4d8127aae80b27afb35755d44a8b61d770a706 and contender = 81c8a0e06bf71f82d0ef8350776f3440672b90e9. 81c8a0e06bf71f82d0ef8350776f3440672b90e9 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/db504aa871de4ed9bdc9751ec5439404...64f00da8d1664fc9bb64f98385f2d209/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/973318c443e448bbafd32813dfd5e35c...5ed02eb2a7b1424286a93378b12b5315/)
   [Finished :arrow_down:0.8% :arrow_up:0.13%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/6421379fa9164cecaaa30b54663929ce...dd7d9bc3782946cf955c14a8eb0361c2/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   





[GitHub] [arrow] ursabot edited a comment on pull request #11955: ARROW-10209: [Python] Support positional options in compute functions

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #11955:
URL: https://github.com/apache/arrow/pull/11955#issuecomment-996086944


   Benchmark runs are scheduled for baseline = 8a4d8127aae80b27afb35755d44a8b61d770a706 and contender = 81c8a0e06bf71f82d0ef8350776f3440672b90e9. 81c8a0e06bf71f82d0ef8350776f3440672b90e9 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/db504aa871de4ed9bdc9751ec5439404...64f00da8d1664fc9bb64f98385f2d209/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/973318c443e448bbafd32813dfd5e35c...5ed02eb2a7b1424286a93378b12b5315/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/6421379fa9164cecaaa30b54663929ce...dd7d9bc3782946cf955c14a8eb0361c2/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   





[GitHub] [arrow] pitrou commented on a change in pull request #11955: ARROW-10209: [Python] Support positional options in compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #11955:
URL: https://github.com/apache/arrow/pull/11955#discussion_r770466860



##########
File path: python/pyarrow/tests/test_compute.py
##########
@@ -1854,6 +1896,11 @@ def test_count():
     assert pc.count(arr, mode='only_valid').as_py() == 3
     assert pc.count(arr, mode='only_null').as_py() == 2
     assert pc.count(arr, mode='all').as_py() == 5
+    assert pc.count(arr, 'all').as_py() == 5

Review comment:
       Yes, allowing positional arguments is always a balancing act...







[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #11955: ARROW-10209: [Python] Support positional options in compute functions

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #11955:
URL: https://github.com/apache/arrow/pull/11955#discussion_r770779663



##########
File path: python/pyarrow/compute.py
##########
@@ -326,243 +335,6 @@ def cast(arr, target_type, safe=True):
     return call_function("cast", [arr], options)
 
 
-def count_substring(array, pattern, *, ignore_case=False):
-    """
-    Count the occurrences of substring *pattern* in each value of a
-    string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("count_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def count_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Count the non-overlapping matches of regex *pattern* in each value
-    of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("count_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def find_substring(array, pattern, *, ignore_case=False):
-    """
-    Find the index of the first occurrence of substring *pattern* in each
-    value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("find_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def find_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Find the index of the first match of regex *pattern* in each
-    value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        regex pattern to search for
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("find_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_like(array, pattern, *, ignore_case=False):
-    """
-    Test if the SQL-style LIKE pattern *pattern* matches a value of a
-    string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        SQL-style LIKE pattern. '%' will match any number of
-        characters, '_' will match exactly one character, and all
-        other characters match themselves. To match a literal percent
-        sign or underscore, precede the character with a backslash.
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-
-    """
-    return call_function("match_like", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_substring(array, pattern, *, ignore_case=False):
-    """
-    Test if substring *pattern* is contained within a value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("match_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Test if regex *pattern* matches at any position a value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        regex pattern to search
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("match_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def mode(array, n=1, *, skip_nulls=True, min_count=0):
-    """
-    Return top-n most common values and number of times they occur in a passed
-    numerical (chunked) array, in descending order of occurrence. If there are
-    multiple values with same count, the smaller one is returned first.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    n : int, default 1
-        Specify the top-n values.
-    skip_nulls : bool, default True
-        If True, ignore nulls in the input. Else return an empty array
-        if any input is null.
-    min_count : int, default 0
-        If there are fewer than this many values in the input, return
-        an empty array.
-
-    Returns
-    -------
-    An array of <input type "Mode", int64_t "Count"> structs
-
-    Examples
-    --------
-    >>> import pyarrow as pa
-    >>> import pyarrow.compute as pc
-    >>> arr = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
-    >>> modes = pc.mode(arr, 2)
-    >>> modes[0]
-    <pyarrow.StructScalar: {'mode': 2, 'count': 5}>
-    >>> modes[1]
-    <pyarrow.StructScalar: {'mode': 1, 'count': 2}>
-    """
-    options = ModeOptions(n, skip_nulls=skip_nulls, min_count=min_count)
-    return call_function("mode", [array], options)
-
-
-def filter(data, mask, null_selection_behavior='drop'):

Review comment:
       Let's maybe just change this.







[GitHub] [arrow] pitrou closed pull request #11955: ARROW-10209: [Python] Support positional options in compute functions

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #11955:
URL: https://github.com/apache/arrow/pull/11955


   





[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #11955: ARROW-10209: [Python] Support positional options in compute functions

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #11955:
URL: https://github.com/apache/arrow/pull/11955#discussion_r770632330



##########
File path: python/pyarrow/compute.py
##########
@@ -326,243 +335,6 @@ def cast(arr, target_type, safe=True):
     return call_function("cast", [arr], options)
 
 
-def count_substring(array, pattern, *, ignore_case=False):
-    """
-    Count the occurrences of substring *pattern* in each value of a
-    string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("count_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def count_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Count the non-overlapping matches of regex *pattern* in each value
-    of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("count_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def find_substring(array, pattern, *, ignore_case=False):
-    """
-    Find the index of the first occurrence of substring *pattern* in each
-    value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("find_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def find_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Find the index of the first match of regex *pattern* in each
-    value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        regex pattern to search for
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("find_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_like(array, pattern, *, ignore_case=False):
-    """
-    Test if the SQL-style LIKE pattern *pattern* matches a value of a
-    string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        SQL-style LIKE pattern. '%' will match any number of
-        characters, '_' will match exactly one character, and all
-        other characters match themselves. To match a literal percent
-        sign or underscore, precede the character with a backslash.
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-
-    """
-    return call_function("match_like", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_substring(array, pattern, *, ignore_case=False):
-    """
-    Test if substring *pattern* is contained within a value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("match_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Test if regex *pattern* matches at any position a value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        regex pattern to search
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("match_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def mode(array, n=1, *, skip_nulls=True, min_count=0):
-    """
-    Return top-n most common values and number of times they occur in a passed
-    numerical (chunked) array, in descending order of occurrence. If there are
-    multiple values with same count, the smaller one is returned first.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    n : int, default 1
-        Specify the top-n values.
-    skip_nulls : bool, default True
-        If True, ignore nulls in the input. Else return an empty array
-        if any input is null.
-    min_count : int, default 0
-        If there are fewer than this many values in the input, return
-        an empty array.
-
-    Returns
-    -------
-    An array of <input type "Mode", int64_t "Count"> structs
-
-    Examples
-    --------
-    >>> import pyarrow as pa
-    >>> import pyarrow.compute as pc
-    >>> arr = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
-    >>> modes = pc.mode(arr, 2)
-    >>> modes[0]
-    <pyarrow.StructScalar: {'mode': 2, 'count': 5}>
-    >>> modes[1]
-    <pyarrow.StructScalar: {'mode': 1, 'count': 2}>

Review comment:
       Yes, something like that is what I had in mind. And then we can add those automatically to the generated docstrings. 
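
As a rough illustration of adding examples automatically to generated docstrings, here is a minimal sketch. It assumes that "those" refers to hand-written examples such as the `mode` example in the hunk above; the `EXAMPLES` registry and the `with_examples` helper are hypothetical names, not part of pyarrow.

```python
# Hypothetical registry: compute function name -> doctest-style example lines.
EXAMPLES = {
    "mode": [
        ">>> import pyarrow as pa",
        ">>> import pyarrow.compute as pc",
        ">>> arr = pa.array([1, 1, 2, 2, 3, 2, 2, 2])",
        ">>> modes = pc.mode(arr, 2)",
        ">>> modes[0]",
        "<pyarrow.StructScalar: {'mode': 2, 'count': 5}>",
    ],
}


def with_examples(name, doc, indent="    "):
    """Append an "Examples" section to a generated docstring, if one exists."""
    lines = EXAMPLES.get(name)
    if not lines:
        # No registered example: leave the generated docstring untouched.
        return doc
    section = ["", "Examples", "--------", *lines]
    rendered = "\n".join((indent + line) if line else "" for line in section)
    return doc.rstrip() + "\n" + rendered + "\n"
```

The docstring generator would then call something like `with_examples(name, generated_doc)` as its last step, so functions without a registered example keep their generated text unchanged.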







[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #11955: ARROW-10209: [Python] Support positional options in compute functions

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #11955:
URL: https://github.com/apache/arrow/pull/11955#discussion_r770304217



##########
File path: python/pyarrow/tests/test_compute.py
##########
@@ -1854,6 +1896,11 @@ def test_count():
     assert pc.count(arr, mode='only_valid').as_py() == 3
     assert pc.count(arr, mode='only_null').as_py() == 2
     assert pc.count(arr, mode='all').as_py() == 5
+    assert pc.count(arr, 'all').as_py() == 5

Review comment:
       Actually, "mode" isn't that clarifying a keyword name either, so ignore my comment.







[GitHub] [arrow] ursabot edited a comment on pull request #11955: ARROW-10209: [Python] Support positional options in compute functions

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #11955:
URL: https://github.com/apache/arrow/pull/11955#issuecomment-996086944


   Benchmark runs are scheduled for baseline = 8a4d8127aae80b27afb35755d44a8b61d770a706 and contender = 81c8a0e06bf71f82d0ef8350776f3440672b90e9. 81c8a0e06bf71f82d0ef8350776f3440672b90e9 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/db504aa871de4ed9bdc9751ec5439404...64f00da8d1664fc9bb64f98385f2d209/)
   [Finished :arrow_down:0.9% :arrow_up:0.9%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/973318c443e448bbafd32813dfd5e35c...5ed02eb2a7b1424286a93378b12b5315/)
   [Finished :arrow_down:0.8% :arrow_up:0.13%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/6421379fa9164cecaaa30b54663929ce...dd7d9bc3782946cf955c14a8eb0361c2/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   





[GitHub] [arrow] github-actions[bot] commented on pull request #11955: ARROW-10209: [Python] Support positional options in compute functions

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11955:
URL: https://github.com/apache/arrow/pull/11955#issuecomment-993843708


   https://issues.apache.org/jira/browse/ARROW-10209





[GitHub] [arrow] pitrou commented on pull request #11955: ARROW-10209: [Python] Support positional options in compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #11955:
URL: https://github.com/apache/arrow/pull/11955#issuecomment-993842510


   Based on #11951.





[GitHub] [arrow] pitrou commented on a change in pull request #11955: ARROW-10209: [Python] Support positional options in compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #11955:
URL: https://github.com/apache/arrow/pull/11955#discussion_r770466612



##########
File path: python/pyarrow/compute.py
##########
@@ -326,243 +335,6 @@ def cast(arr, target_type, safe=True):
     return call_function("cast", [arr], options)
 
 
-def count_substring(array, pattern, *, ignore_case=False):
-    """
-    Count the occurrences of substring *pattern* in each value of a
-    string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("count_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def count_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Count the non-overlapping matches of regex *pattern* in each value
-    of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("count_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def find_substring(array, pattern, *, ignore_case=False):
-    """
-    Find the index of the first occurrence of substring *pattern* in each
-    value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("find_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def find_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Find the index of the first match of regex *pattern* in each
-    value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        regex pattern to search for
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("find_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_like(array, pattern, *, ignore_case=False):
-    """
-    Test if the SQL-style LIKE pattern *pattern* matches a value of a
-    string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        SQL-style LIKE pattern. '%' will match any number of
-        characters, '_' will match exactly one character, and all
-        other characters match themselves. To match a literal percent
-        sign or underscore, precede the character with a backslash.
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-
-    """
-    return call_function("match_like", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_substring(array, pattern, *, ignore_case=False):
-    """
-    Test if substring *pattern* is contained within a value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("match_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Test if regex *pattern* matches at any position a value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        regex pattern to search
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("match_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def mode(array, n=1, *, skip_nulls=True, min_count=0):
-    """
-    Return top-n most common values and number of times they occur in a passed
-    numerical (chunked) array, in descending order of occurrence. If there are
-    multiple values with same count, the smaller one is returned first.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    n : int, default 1
-        Specify the top-n values.
-    skip_nulls : bool, default True
-        If True, ignore nulls in the input. Else return an empty array
-        if any input is null.
-    min_count : int, default 0
-        If there are fewer than this many values in the input, return
-        an empty array.
-
-    Returns
-    -------
-    An array of <input type "Mode", int64_t "Count"> structs
-
-    Examples
-    --------
-    >>> import pyarrow as pa
-    >>> import pyarrow.compute as pc
-    >>> arr = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
-    >>> modes = pc.mode(arr, 2)
-    >>> modes[0]
-    <pyarrow.StructScalar: {'mode': 2, 'count': 5}>
-    >>> modes[1]
-    <pyarrow.StructScalar: {'mode': 1, 'count': 2}>
-    """
-    options = ModeOptions(n, skip_nulls=skip_nulls, min_count=min_count)
-    return call_function("mode", [array], options)
-
-
-def filter(data, mask, null_selection_behavior='drop'):

Review comment:
       Hmm. I don't really like the idea of maintaining these special-case wrappers, though. Alternatively, we could raise a deprecation warning before removing them.
   
   @amol- Thoughts?
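
To make the deprecation idea concrete, here is a minimal sketch of a hand-written wrapper that keeps working while warning callers. It is an illustration only: `deprecated_wrapper`, `legacy_match_substring`, and the pure-Python `match_substring` stand-in are hypothetical names, not part of pyarrow or of this PR.

```python
import functools
import warnings


def deprecated_wrapper(new_func, old_name):
    """Return a wrapper around new_func that emits a DeprecationWarning."""
    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; call {new_func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return new_func(*args, **kwargs)
    return wrapper


# Hypothetical stand-in for a generated compute function (plain lists, no pyarrow).
def match_substring(strings, pattern, *, ignore_case=False):
    if ignore_case:
        return [pattern.lower() in s.lower() for s in strings]
    return [pattern in s for s in strings]


# The old entry point still works, but warns on every call.
legacy_match_substring = deprecated_wrapper(match_substring,
                                            "legacy_match_substring")
```

With this shape the wrappers could be kept for a release or two and then dropped, assuming that is the deprecation policy the project wants.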




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #11955: ARROW-10209: [Python] Support positional options in compute functions

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #11955:
URL: https://github.com/apache/arrow/pull/11955#discussion_r770779473



##########
File path: python/pyarrow/compute.py
##########
@@ -326,243 +335,6 @@ def cast(arr, target_type, safe=True):
     return call_function("cast", [arr], options)
 
 
-def count_substring(array, pattern, *, ignore_case=False):
-    """
-    Count the occurrences of substring *pattern* in each value of a
-    string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("count_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def count_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Count the non-overlapping matches of regex *pattern* in each value
-    of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("count_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def find_substring(array, pattern, *, ignore_case=False):
-    """
-    Find the index of the first occurrence of substring *pattern* in each
-    value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("find_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def find_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Find the index of the first match of regex *pattern* in each
-    value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        regex pattern to search for
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("find_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_like(array, pattern, *, ignore_case=False):
-    """
-    Test if the SQL-style LIKE pattern *pattern* matches a value of a
-    string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        SQL-style LIKE pattern. '%' will match any number of
-        characters, '_' will match exactly one character, and all
-        other characters match themselves. To match a literal percent
-        sign or underscore, precede the character with a backslash.
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-
-    """
-    return call_function("match_like", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_substring(array, pattern, *, ignore_case=False):
-    """
-    Test if substring *pattern* is contained within a value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("match_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Test if regex *pattern* matches at any position a value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        regex pattern to search
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("match_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def mode(array, n=1, *, skip_nulls=True, min_count=0):
-    """
-    Return top-n most common values and number of times they occur in a passed
-    numerical (chunked) array, in descending order of occurrence. If there are
-    multiple values with same count, the smaller one is returned first.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    n : int, default 1
-        Specify the top-n values.
-    skip_nulls : bool, default True
-        If True, ignore nulls in the input. Else return an empty array
-        if any input is null.
-    min_count : int, default 0
-        If there are fewer than this many values in the input, return
-        an empty array.
-
-    Returns
-    -------
-    An array of <input type "Mode", int64_t "Count"> structs
-
-    Examples
-    --------
-    >>> import pyarrow as pa
-    >>> import pyarrow.compute as pc
-    >>> arr = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
-    >>> modes = pc.mode(arr, 2)
-    >>> modes[0]
-    <pyarrow.StructScalar: {'mode': 2, 'count': 5}>
-    >>> modes[1]
-    <pyarrow.StructScalar: {'mode': 1, 'count': 2}>

Review comment:
       Looks good!







[GitHub] [arrow] pitrou commented on a change in pull request #11955: ARROW-10209: [Python] Support positional options in compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #11955:
URL: https://github.com/apache/arrow/pull/11955#discussion_r770471639



##########
File path: python/pyarrow/compute.py
##########
@@ -326,243 +335,6 @@ def cast(arr, target_type, safe=True):
     return call_function("cast", [arr], options)
 
 
-def count_substring(array, pattern, *, ignore_case=False):
-    """
-    Count the occurrences of substring *pattern* in each value of a
-    string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("count_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def count_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Count the non-overlapping matches of regex *pattern* in each value
-    of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("count_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def find_substring(array, pattern, *, ignore_case=False):
-    """
-    Find the index of the first occurrence of substring *pattern* in each
-    value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("find_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def find_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Find the index of the first match of regex *pattern* in each
-    value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        regex pattern to search for
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("find_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_like(array, pattern, *, ignore_case=False):
-    """
-    Test if the SQL-style LIKE pattern *pattern* matches a value of a
-    string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        SQL-style LIKE pattern. '%' will match any number of
-        characters, '_' will match exactly one character, and all
-        other characters match themselves. To match a literal percent
-        sign or underscore, precede the character with a backslash.
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-
-    """
-    return call_function("match_like", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_substring(array, pattern, *, ignore_case=False):
-    """
-    Test if substring *pattern* is contained within a value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("match_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Test if regex *pattern* matches at any position a value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        regex pattern to search
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("match_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def mode(array, n=1, *, skip_nulls=True, min_count=0):
-    """
-    Return top-n most common values and number of times they occur in a passed
-    numerical (chunked) array, in descending order of occurrence. If there are
-    multiple values with same count, the smaller one is returned first.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    n : int, default 1
-        Specify the top-n values.
-    skip_nulls : bool, default True
-        If True, ignore nulls in the input. Else return an empty array
-        if any input is null.
-    min_count : int, default 0
-        If there are fewer than this many values in the input, return
-        an empty array.
-
-    Returns
-    -------
-    An array of <input type "Mode", int64_t "Count"> structs
-
-    Examples
-    --------
-    >>> import pyarrow as pa
-    >>> import pyarrow.compute as pc
-    >>> arr = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
-    >>> modes = pc.mode(arr, 2)
-    >>> modes[0]
-    <pyarrow.StructScalar: {'mode': 2, 'count': 5}>
-    >>> modes[1]
-    <pyarrow.StructScalar: {'mode': 1, 'count': 2}>

Review comment:
       For example, a separate file `_compute_docstrings.py` with contents such as:
   ```python
   
   function_examples = {
       "mode": """
           >>> import pyarrow as pa
           >>> import pyarrow.compute as pc
           >>> arr = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
           >>> modes = pc.mode(arr, 2)
           >>> modes[0]
           <pyarrow.StructScalar: {'mode': 2, 'count': 5}>
           >>> modes[1]
           <pyarrow.StructScalar: {'mode': 1, 'count': 2}>
           """,
   }
   ```
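
One way such a mapping could be consumed is sketched below, assuming the docstring generator has the function name and the generated text at hand. `append_examples` is a hypothetical helper, not code from this PR, and the registry here is a shortened copy of the dictionary above.

```python
import textwrap

# Hypothetical registry, mirroring the `function_examples` idea above.
function_examples = {
    "mode": """
        >>> import pyarrow as pa
        >>> import pyarrow.compute as pc
        >>> arr = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
        >>> modes = pc.mode(arr, 2)
        """,
}


def append_examples(name, generated_doc):
    """Append an "Examples" section to generated_doc if one is registered."""
    example = function_examples.get(name)
    if example is None:
        # No hand-written example: leave the generated docstring untouched.
        return generated_doc
    body = textwrap.dedent(example).strip()
    section = "Examples\n--------\n" + body
    return generated_doc.rstrip("\n") + "\n\n" + section + "\n"
```

Keeping the examples keyed by function name means the generated signature and parameter sections stay automatic, while the examples remain hand-maintained in one place.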
   







[GitHub] [arrow] ursabot commented on pull request #11955: ARROW-10209: [Python] Support positional options in compute functions

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #11955:
URL: https://github.com/apache/arrow/pull/11955#issuecomment-996086944


   Benchmark runs are scheduled for baseline = 8a4d8127aae80b27afb35755d44a8b61d770a706 and contender = 81c8a0e06bf71f82d0ef8350776f3440672b90e9. 81c8a0e06bf71f82d0ef8350776f3440672b90e9 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/db504aa871de4ed9bdc9751ec5439404...64f00da8d1664fc9bb64f98385f2d209/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/973318c443e448bbafd32813dfd5e35c...5ed02eb2a7b1424286a93378b12b5315/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/6421379fa9164cecaaa30b54663929ce...dd7d9bc3782946cf955c14a8eb0361c2/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   





[GitHub] [arrow] pitrou commented on a change in pull request #11955: ARROW-10209: [Python] Support positional options in compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #11955:
URL: https://github.com/apache/arrow/pull/11955#discussion_r770464367



##########
File path: python/pyarrow/compute.py
##########
@@ -326,243 +335,6 @@ def cast(arr, target_type, safe=True):
     return call_function("cast", [arr], options)
 
 
-def count_substring(array, pattern, *, ignore_case=False):
-    """
-    Count the occurrences of substring *pattern* in each value of a
-    string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("count_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def count_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Count the non-overlapping matches of regex *pattern* in each value
-    of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("count_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def find_substring(array, pattern, *, ignore_case=False):
-    """
-    Find the index of the first occurrence of substring *pattern* in each
-    value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("find_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def find_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Find the index of the first match of regex *pattern* in each
-    value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        regex pattern to search for
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("find_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_like(array, pattern, *, ignore_case=False):
-    """
-    Test if the SQL-style LIKE pattern *pattern* matches a value of a
-    string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        SQL-style LIKE pattern. '%' will match any number of
-        characters, '_' will match exactly one character, and all
-        other characters match themselves. To match a literal percent
-        sign or underscore, precede the character with a backslash.
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-
-    """
-    return call_function("match_like", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_substring(array, pattern, *, ignore_case=False):
-    """
-    Test if substring *pattern* is contained within a value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("match_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Test if regex *pattern* matches at any position a value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        regex pattern to search
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("match_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def mode(array, n=1, *, skip_nulls=True, min_count=0):
-    """
-    Return top-n most common values and number of times they occur in a passed
-    numerical (chunked) array, in descending order of occurrence. If there are
-    multiple values with same count, the smaller one is returned first.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    n : int, default 1
-        Specify the top-n values.
-    skip_nulls : bool, default True
-        If True, ignore nulls in the input. Else return an empty array
-        if any input is null.
-    min_count : int, default 0
-        If there are fewer than this many values in the input, return
-        an empty array.
-
-    Returns
-    -------
-    An array of <input type "Mode", int64_t "Count"> structs
-
-    Examples
-    --------
-    >>> import pyarrow as pa
-    >>> import pyarrow.compute as pc
-    >>> arr = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
-    >>> modes = pc.mode(arr, 2)
-    >>> modes[0]
-    <pyarrow.StructScalar: {'mode': 2, 'count': 5}>
-    >>> modes[1]
-    <pyarrow.StructScalar: {'mode': 1, 'count': 2}>

Review comment:
       Indeed there is. I'm not sure how to do that in a maintainable way, though. Should we have a separate file with function stubs and docstrings?







[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #11955: ARROW-10209: [Python] Support positional options in compute functions

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #11955:
URL: https://github.com/apache/arrow/pull/11955#discussion_r770302745



##########
File path: python/pyarrow/tests/test_compute.py
##########
@@ -1854,6 +1896,11 @@ def test_count():
     assert pc.count(arr, mode='only_valid').as_py() == 3
     assert pc.count(arr, mode='only_null').as_py() == 2
     assert pc.count(arr, mode='all').as_py() == 5
+    assert pc.count(arr, 'all').as_py() == 5

Review comment:
       I wonder whether we should make `mode` keyword-only, now that we are changing this.
   It is not very clear what the "all" in `pc.count(arr, 'all')` means.
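
For reference, a keyword-only `mode` would behave roughly as in the sketch below. This `count` is a pure-Python stand-in written for illustration (lists with `None` as null); it is not pyarrow's implementation or the signature generated by this PR.

```python
def count(values, *, mode="only_valid"):
    """Count entries of a list, treating None as null.

    Pure-Python stand-in for illustration only; not pyarrow's implementation.
    """
    n_null = sum(1 for v in values if v is None)
    if mode == "only_valid":
        return len(values) - n_null
    if mode == "only_null":
        return n_null
    if mode == "all":
        return len(values)
    raise ValueError(f"unknown mode: {mode!r}")


values = [1, None, 2, None, 3]
total = count(values, mode="all")      # keyword form is accepted

try:
    count(values, "all")               # positional form is rejected
except TypeError as exc:
    positional_error = exc
else:
    positional_error = None
```

The bare `*` in the signature is what forces callers to spell out `mode=`, so a call site always says what the string refers to.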

##########
File path: python/pyarrow/compute.py
##########
@@ -326,243 +335,6 @@ def cast(arr, target_type, safe=True):
     return call_function("cast", [arr], options)
 
 
-def count_substring(array, pattern, *, ignore_case=False):
-    """
-    Count the occurrences of substring *pattern* in each value of a
-    string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("count_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def count_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Count the non-overlapping matches of regex *pattern* in each value
-    of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("count_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def find_substring(array, pattern, *, ignore_case=False):
-    """
-    Find the index of the first occurrence of substring *pattern* in each
-    value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("find_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def find_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Find the index of the first match of regex *pattern* in each
-    value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        regex pattern to search for
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("find_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_like(array, pattern, *, ignore_case=False):
-    """
-    Test if the SQL-style LIKE pattern *pattern* matches a value of a
-    string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        SQL-style LIKE pattern. '%' will match any number of
-        characters, '_' will match exactly one character, and all
-        other characters match themselves. To match a literal percent
-        sign or underscore, precede the character with a backslash.
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-
-    """
-    return call_function("match_like", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_substring(array, pattern, *, ignore_case=False):
-    """
-    Test if substring *pattern* is contained within a value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("match_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Test if regex *pattern* matches at any position a value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        regex pattern to search
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("match_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def mode(array, n=1, *, skip_nulls=True, min_count=0):
-    """
-    Return top-n most common values and number of times they occur in a passed
-    numerical (chunked) array, in descending order of occurrence. If there are
-    multiple values with same count, the smaller one is returned first.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    n : int, default 1
-        Specify the top-n values.
-    skip_nulls : bool, default True
-        If True, ignore nulls in the input. Else return an empty array
-        if any input is null.
-    min_count : int, default 0
-        If there are fewer than this many values in the input, return
-        an empty array.
-
-    Returns
-    -------
-    An array of <input type "Mode", int64_t "Count"> structs
-
-    Examples
-    --------
-    >>> import pyarrow as pa
-    >>> import pyarrow.compute as pc
-    >>> arr = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
-    >>> modes = pc.mode(arr, 2)
-    >>> modes[0]
-    <pyarrow.StructScalar: {'mode': 2, 'count': 5}>
-    >>> modes[1]
-    <pyarrow.StructScalar: {'mode': 1, 'count': 2}>

Review comment:
       I think there is value in having examples in the docstrings (IMO we should add more of them in the long term). Could we append them to the generated docstring?

##########
File path: python/pyarrow/compute.py
##########
@@ -326,243 +335,6 @@ def cast(arr, target_type, safe=True):
     return call_function("cast", [arr], options)
 
 
-def count_substring(array, pattern, *, ignore_case=False):
-    """
-    Count the occurrences of substring *pattern* in each value of a
-    string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("count_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def count_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Count the non-overlapping matches of regex *pattern* in each value
-    of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("count_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def find_substring(array, pattern, *, ignore_case=False):
-    """
-    Find the index of the first occurrence of substring *pattern* in each
-    value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("find_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def find_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Find the index of the first match of regex *pattern* in each
-    value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        regex pattern to search for
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("find_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_like(array, pattern, *, ignore_case=False):
-    """
-    Test if the SQL-style LIKE pattern *pattern* matches a value of a
-    string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        SQL-style LIKE pattern. '%' will match any number of
-        characters, '_' will match exactly one character, and all
-        other characters match themselves. To match a literal percent
-        sign or underscore, precede the character with a backslash.
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-
-    """
-    return call_function("match_like", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_substring(array, pattern, *, ignore_case=False):
-    """
-    Test if substring *pattern* is contained within a value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("match_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Test if regex *pattern* matches at any position a value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        regex pattern to search
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("match_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def mode(array, n=1, *, skip_nulls=True, min_count=0):
-    """
-    Return top-n most common values and number of times they occur in a passed
-    numerical (chunked) array, in descending order of occurrence. If there are
-    multiple values with same count, the smaller one is returned first.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    n : int, default 1
-        Specify the top-n values.
-    skip_nulls : bool, default True
-        If True, ignore nulls in the input. Else return an empty array
-        if any input is null.
-    min_count : int, default 0
-        If there are fewer than this many values in the input, return
-        an empty array.
-
-    Returns
-    -------
-    An array of <input type "Mode", int64_t "Count"> structs
-
-    Examples
-    --------
-    >>> import pyarrow as pa
-    >>> import pyarrow.compute as pc
-    >>> arr = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
-    >>> modes = pc.mode(arr, 2)
-    >>> modes[0]
-    <pyarrow.StructScalar: {'mode': 2, 'count': 5}>
-    >>> modes[1]
-    <pyarrow.StructScalar: {'mode': 1, 'count': 2}>
-    """
-    options = ModeOptions(n, skip_nulls=skip_nulls, min_count=min_count)
-    return call_function("mode", [array], options)
-
-
-def filter(data, mask, null_selection_behavior='drop'):

Review comment:
       Not fully sure if we should care, but users could have called `pc.filter(arr1, mask=arr2)`, passing `mask` as a keyword. That call will break now.
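       To illustrate the concern, here is a small sketch using plain Python stand-ins (not pyarrow code). It assumes the generated wrapper makes the array arguments positional-only, as the `/` in the `split_pattern` signature from the PR description suggests; `filter_old` and `filter_new` are hypothetical names. Requires Python 3.8+ for the `/` syntax.

```python
# Hypothetical stand-in for the hand-written signature being removed.
def filter_old(data, mask, null_selection_behavior="drop"):
    return [x for x, keep in zip(data, mask) if keep]


# Hypothetical stand-in for a generated signature with positional-only arrays.
def filter_new(data, mask, /, *, null_selection_behavior="drop"):
    return [x for x, keep in zip(data, mask) if keep]


values, keep = [1, 2, 3], [True, False, True]

# Old style: `mask` may be passed by name.
old_result = filter_old(values, mask=keep)

# New style: the same call is rejected because `mask` is positional-only.
try:
    filter_new(values, mask=keep)
    error = None
except TypeError as exc:
    error = str(exc)

# Passing the mask positionally works with both signatures.
new_result = filter_new(values, keep)
```

       If this matters, keeping the old parameter name as an accepted keyword would avoid the break.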




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #11955: ARROW-10209: [Python] Support positional options in compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #11955:
URL: https://github.com/apache/arrow/pull/11955#discussion_r770745366



##########
File path: python/pyarrow/compute.py
##########
@@ -326,243 +335,6 @@ def cast(arr, target_type, safe=True):
     return call_function("cast", [arr], options)
 
 
-def count_substring(array, pattern, *, ignore_case=False):
-    """
-    Count the occurrences of substring *pattern* in each value of a
-    string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("count_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def count_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Count the non-overlapping matches of regex *pattern* in each value
-    of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("count_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def find_substring(array, pattern, *, ignore_case=False):
-    """
-    Find the index of the first occurrence of substring *pattern* in each
-    value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("find_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def find_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Find the index of the first match of regex *pattern* in each
-    value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        regex pattern to search for
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("find_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_like(array, pattern, *, ignore_case=False):
-    """
-    Test if the SQL-style LIKE pattern *pattern* matches a value of a
-    string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        SQL-style LIKE pattern. '%' will match any number of
-        characters, '_' will match exactly one character, and all
-        other characters match themselves. To match a literal percent
-        sign or underscore, precede the character with a backslash.
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-
-    """
-    return call_function("match_like", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_substring(array, pattern, *, ignore_case=False):
-    """
-    Test if substring *pattern* is contained within a value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        pattern to search for exact matches
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("match_substring", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def match_substring_regex(array, pattern, *, ignore_case=False):
-    """
-    Test if regex *pattern* matches at any position a value of a string array.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    pattern : str
-        regex pattern to search
-    ignore_case : bool, default False
-        Ignore case while searching.
-
-    Returns
-    -------
-    result : pyarrow.Array or pyarrow.ChunkedArray
-    """
-    return call_function("match_substring_regex", [array],
-                         MatchSubstringOptions(pattern,
-                                               ignore_case=ignore_case))
-
-
-def mode(array, n=1, *, skip_nulls=True, min_count=0):
-    """
-    Return top-n most common values and number of times they occur in a passed
-    numerical (chunked) array, in descending order of occurrence. If there are
-    multiple values with same count, the smaller one is returned first.
-
-    Parameters
-    ----------
-    array : pyarrow.Array or pyarrow.ChunkedArray
-    n : int, default 1
-        Specify the top-n values.
-    skip_nulls : bool, default True
-        If True, ignore nulls in the input. Else return an empty array
-        if any input is null.
-    min_count : int, default 0
-        If there are fewer than this many values in the input, return
-        an empty array.
-
-    Returns
-    -------
-    An array of <input type "Mode", int64_t "Count"> structs
-
-    Examples
-    --------
-    >>> import pyarrow as pa
-    >>> import pyarrow.compute as pc
-    >>> arr = pa.array([1, 1, 2, 2, 3, 2, 2, 2])
-    >>> modes = pc.mode(arr, 2)
-    >>> modes[0]
-    <pyarrow.StructScalar: {'mode': 2, 'count': 5}>
-    >>> modes[1]
-    <pyarrow.StructScalar: {'mode': 1, 'count': 2}>

Review comment:
       Ok, done. Can you take a look again?



