You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by nchammas <gi...@git.apache.org> on 2017/08/11 18:44:42 UTC

[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...

GitHub user nchammas opened a pull request:

    https://github.com/apache/spark/pull/18926

    [SPARK-21712] [PySpark] Clarify type error for Column.substr()

    Proposed changes:
    * Clarify the type error that `Column.substr()` gives.
    
    Test plan:
    * Tested this manually.
    * Test code:
        ```python
        from pyspark.sql.functions import col, lit
        spark.createDataFrame([['nick']], schema=['name']).select(col('name').substr(0, lit(1)))
        ```
    * Before:
        ```
        TypeError: Can not mix the type
        ```
    * After:
        ```
        TypeError: startPos and length must be the same type. Got <class 'int'> and
        <class 'pyspark.sql.column.Column'>, respectively.
        ```


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nchammas/spark SPARK-21712-substr-type-error

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18926.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18926
    
----
commit 753dbe1743f552fe7b4867d3e4d24cdcc2ca1669
Author: Nicholas Chammas <ni...@gmail.com>
Date:   2017-08-11T18:39:59Z

    clarify type error for Column.substr()

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    To be honest, the current codes do not look good to me. Since this does not make the code worse, I will not revert it back.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    To summarize the feedback from @HyukjinKwon and @gatorsmile, I think what I need to do is:
    * Add a test for the mixed type case.
    * Explicitly check for `long` in Python 2 and throw a `TypeError` from PySpark.
    * Add a test for the `long` `TypeError` in Python 2.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18926#discussion_r132837359
  
    --- Diff: python/pyspark/sql/column.py ---
    @@ -406,7 +406,13 @@ def substr(self, startPos, length):
             [Row(col=u'Ali'), Row(col=u'Bob')]
             """
             if type(startPos) != type(length):
    -            raise TypeError("Can not mix the type")
    +            raise TypeError(
    +                "startPos and length must be the same type. "
    +                "Got {startPos_t} and {length_t}, respectively."
    +                .format(
    +                    startPos_t=type(startPos),
    +                    length_t=type(length),
    +                ))
             if isinstance(startPos, (int, long)):
    --- End diff --
    
    @nchammas, supporting `long` with Python 2 is not documented in the docstring and looks we throw unexpected exception by `long` with Python 2 as below:
    
    ```python
    from pyspark.sql import Row
    df = spark.createDataFrame([Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)])
    df.select(df.name.substr(long(1), long(3)).alias("col")).collect()
    ```
    
    ```
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File ".../spark/python/pyspark/sql/column.py", line 411, in substr
        jc = self._jc.substr(startPos, length)
      File ".../spark/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
      File ".../spark/python/pyspark/sql/utils.py", line 63, in deco
        return f(*a, **kw)
      File ".../spark/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 324, in get_return_value
    py4j.protocol.Py4JError: An error occurred while calling o47.substr. Trace:
    py4j.Py4JException: Method substr([class java.lang.Long, class java.lang.Long]) does not exist
    	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    	at py4j.Gateway.invoke(Gateway.java:274)
    	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    	at py4j.commands.CallCommand.execute(CallCommand.java:79)
    	at py4j.GatewayConnection.run(GatewayConnection.java:214)
    	at java.lang.Thread.run(Thread.java:745)
    ```
    
    Would you mind double checking this and taking `long` out with a simple test with Python 2 as well? I think this will also address @gatorsmile's concern above as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    Oh, like a docstring test for the type error?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    Even if we plan to drop `long` in this PR, [the checking](https://github.com/nchammas/spark/blob/fc1d84f002f5bd66bcad038a5581a05ade8dbc35/python/pyspark/sql/column.py#L408) looks weird to me. Basically, the change just wants to ensure the type of `length ` is `int`.
    
    Since this PR is pretty small, we should fix the issue instead of opening another one. 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    I am merging this as it looks there is an explicit objection for the current change itself and it looks the issue is fixed by this. 
    
    To summarize the discussion here:
    
    - Cleaning up type checking logics, if possible.
    
    - Supporting "mixed" types. For example, `long` in Python 2 by casting. Another idea might be just wrapping it with `Column` for different types.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    Pinging freshly minted committer @HyukjinKwon for a review on this tiny PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18926#discussion_r132871517
  
    --- Diff: python/pyspark/sql/column.py ---
    @@ -406,7 +406,13 @@ def substr(self, startPos, length):
             [Row(col=u'Ali'), Row(col=u'Bob')]
             """
             if type(startPos) != type(length):
    -            raise TypeError("Can not mix the type")
    +            raise TypeError(
    +                "startPos and length must be the same type. "
    +                "Got {startPos_t} and {length_t}, respectively."
    --- End diff --
    
    cc @ueshin 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    Thank for cc'ing me. Yea looks fine. Could we add the small test in the description just in case?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    For ^, I want to make this separate if possible. Do you guys strongly feel about supporting `long` (and namely "mixed" types) here - @gatorsmile and @ueshin? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18926#discussion_r133116060
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -1220,6 +1220,13 @@ def test_rand_functions(self):
             rndn2 = df.select('key', functions.randn(0)).collect()
             self.assertEqual(sorted(rndn1), sorted(rndn2))
     
    +    def test_string_functions(self):
    +        from pyspark.sql.functions import col, lit
    +        df = self.spark.createDataFrame([['nick']], schema=['name'])
    +        self.assertRaises(TypeError, lambda: df.select(col('name').substr(0, lit(1))))
    --- End diff --
    
    How about something like this below as this PR targets the exception message?
    
    ```python
    startPos = 0
    length = lit(1)
    self.assertRaisesRegexp(
        TypeError,
        "must be the same type.*%s.*%s.*" % (type(startPos), type(length)),
        lambda: df.select(col('name').substr(startPos, length)))
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18926#discussion_r132856266
  
    --- Diff: python/pyspark/sql/column.py ---
    @@ -406,7 +406,13 @@ def substr(self, startPos, length):
             [Row(col=u'Ali'), Row(col=u'Bob')]
             """
             if type(startPos) != type(length):
    -            raise TypeError("Can not mix the type")
    +            raise TypeError(
    +                "startPos and length must be the same type. "
    +                "Got {startPos_t} and {length_t}, respectively."
    --- End diff --
    
    For the latter, It looks we should call either `substr` with column,column or with int,int. I would like to avoid changing these If either way does not reduce the code diff and is virtually same, if I understood correctly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80544/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    **[Test build #80683 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80683/testReport)** for PR 18926 at commit [`a7fea20`](https://github.com/apache/spark/commit/a7fea2086b0b61a24d50a740ce2f2dcdc846337b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    Agreed with @HyukjinKwon. This PR has a very narrow goal -- improving the error messages -- which I think it accomplished. I think @gatorsmile was expecting a more significant set of improvements, but that's not what this PR (or the associated JIRA) are about.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    The current codes around what this PR changes look not quite clean to me too and we should clean around this.
    
    But I think this PR itself is quite well-formed with the fix that is valid, simple and targeted with tests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    ```Python
            if isinstance(startPos, int) and isinstance(length, int):
                jc = self._jc.substr(startPos, length)
            elif isinstance(startPos, Column) and isinstance(length, Column)::
                jc = self._jc.substr(startPos._jc, length._jc)
            else:
                raise TypeError(...)
    ```
    
    This looks much cleaner to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    I don't think this suggestion / discussion blocks this PR for few days. Let's go as is and make a followup as another improvement if anyone feels so. I will review that at my best.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18926#discussion_r133186642
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -1220,6 +1220,18 @@ def test_rand_functions(self):
             rndn2 = df.select('key', functions.randn(0)).collect()
             self.assertEqual(sorted(rndn1), sorted(rndn2))
     
    +    def test_string_functions(self):
    +        from pyspark.sql.functions import col, lit
    +        df = self.spark.createDataFrame([['nick']], schema=['name'])
    +        self.assertRaisesRegexp(
    +            TypeError,
    +            "must be the same type",
    +            lambda: df.select(col('name').substr(0, lit(1))))
    --- End diff --
    
    @HyukjinKwon - I opted to just search for a key phrase since that sufficiently captures the intent of the updated error message.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    @gatorsmile 
    
    > Even if we plan to drop `long` in this PR
    
    We are not dropping `long` in this PR. It was [never supported](https://github.com/apache/spark/pull/18926#discussion_r132837359). Both the docstring and actual behavior of `.substr()` make it clear that `long` is not supported. Only `int` and `Column` are supported.
    
    >  the checking looks weird to me. Basically, the change just wants to ensure the type of length is int.
    
    Can you elaborate please? As @HyukjinKwon pointed out, `.substr()` accepts either `int` or `Column`, but both arguments must be of the same type. The goal of this PR is to make that clearer.
    
    I am not changing any semantics or behavior other than to throw a Python `TypeError` on `long`, as opposed to letting the underlying Scala implementation throw a [messy exception](https://github.com/apache/spark/pull/18926#discussion_r132837359).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80683/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    LGTM except for the comment above.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    **[Test build #80544 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80544/testReport)** for PR 18926 at commit [`753dbe1`](https://github.com/apache/spark/commit/753dbe1743f552fe7b4867d3e4d24cdcc2ca1669).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    I was thinking of adding it in `python/pyspark/sql/tests.py`. Just in case.. maybe we could add it around https://github.com/apache/spark/commit/224e0e785b4b449ea638c2629263c798116a3011.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18926#discussion_r132847782
  
    --- Diff: python/pyspark/sql/column.py ---
    @@ -406,7 +406,13 @@ def substr(self, startPos, length):
             [Row(col=u'Ali'), Row(col=u'Bob')]
             """
             if type(startPos) != type(length):
    -            raise TypeError("Can not mix the type")
    +            raise TypeError(
    +                "startPos and length must be the same type. "
    +                "Got {startPos_t} and {length_t}, respectively."
    --- End diff --
    
    If PySpark always needs to check the types, are we doing the same things in all the other function calls?
    
    In addition, why not directly checking 
    ```Python
    if isinstance(length, (int, long)):
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    > Basically, the change just wants to ensure the type of length is int.
    
    Yes, but to be more correct, I think this makes sure if both are same types, `Column` or `int`. I think this throws an exception for a better error message, (rather than, for example, "Unexpected types: startPos :<type 'int'> length: <class 'pyspark.sql.column.Column'>"). I know this sounds rather excessive checking but this is still a valid checking.
    
    Possible way I think is with keeping this checking (which I believe is not shorter):
    
    ```python
            if isinstance(startPos, int) and isinstance(length, int):
                jc = self._jc.substr(startPos, length)
            elif isinstance(startPos, Column) and isinstance(length, Column)::
                jc = self._jc.substr(startPos._jc, length._jc)
            else:
                if type(startPos) != type(length):
                    raise TypeError(...)
                else:
                    raise TypeError("Unexpected type: %s" % type(startPos))
    ```
    
    or, with removing this excessive checking:
    
    ```python
            if isinstance(startPos, int) and isinstance(length, int):
                jc = self._jc.substr(startPos, length)
            elif isinstance(startPos, Column) and isinstance(length, Column)::
                jc = self._jc.substr(startPos._jc, length._jc)
            else:
                raise TypeError(...)
    ```
    
    I agree this is excessive (the former) but I wonder if we should remove already existing one. Please correct me if I am mistaken here.
    
    `long` does not work already but with unexpected exception message, which this PR fixes. 
    
    The current state fixes the JIRA specified here.
    
    I will stop staying against if you guys here feel strongly with fixing together but if you are not, could we go without that issue?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18926


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18926#discussion_r132835129
  
    --- Diff: python/pyspark/sql/column.py ---
    @@ -406,7 +406,13 @@ def substr(self, startPos, length):
             [Row(col=u'Ali'), Row(col=u'Bob')]
             """
             if type(startPos) != type(length):
    -            raise TypeError("Can not mix the type")
    +            raise TypeError(
    +                "startPos and length must be the same type. "
    +                "Got {startPos_t} and {length_t}, respectively."
    --- End diff --
    
    -> `startPos: {startPos_t}; length: {length_t}.`
    
    BTW, why we capture the type checking here, instead of doing it in the actual Scala impl of `substr`? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    Merged to master.
    
    Please open JIRAs / PRs related with the discussion above if anyone is willing to proceed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    **[Test build #80683 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80683/testReport)** for PR 18926 at commit [`a7fea20`](https://github.com/apache/spark/commit/a7fea2086b0b61a24d50a740ce2f2dcdc846337b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    It sounds like the comment hides. Could you address the comment https://github.com/apache/spark/pull/18926#discussion_r133121408?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18926#discussion_r133180053
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -1220,6 +1220,13 @@ def test_rand_functions(self):
             rndn2 = df.select('key', functions.randn(0)).collect()
             self.assertEqual(sorted(rndn1), sorted(rndn2))
     
    +    def test_string_functions(self):
    +        from pyspark.sql.functions import col, lit
    +        df = self.spark.createDataFrame([['nick']], schema=['name'])
    +        self.assertRaises(TypeError, lambda: df.select(col('name').substr(0, lit(1))))
    --- End diff --
    
    I was considering doing that at first, but it felt like just duplicating logic. Looking through the other uses of `assertRaisesRegexp()`, it looks like most of the time we just search for a keyword, but there are also some instances where a large part of the exception message is checked. I can do that here as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    **[Test build #80643 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80643/testReport)** for PR 18926 at commit [`fc1d84f`](https://github.com/apache/spark/commit/fc1d84f002f5bd66bcad038a5581a05ade8dbc35).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18926#discussion_r133029498
  
    --- Diff: python/pyspark/sql/column.py ---
    @@ -406,8 +406,14 @@ def substr(self, startPos, length):
             [Row(col=u'Ali'), Row(col=u'Bob')]
             """
             if type(startPos) != type(length):
    -            raise TypeError("Can not mix the type")
    -        if isinstance(startPos, (int, long)):
    +            raise TypeError(
    +                "startPos and length must be the same type. "
    +                "Got {startPos_t} and {length_t}, respectively."
    +                .format(
    +                    startPos_t=type(startPos),
    +                    length_t=type(length),
    +                ))
    +        if isinstance(startPos, int):
    --- End diff --
    
    Since `long` is [not supported](https://github.com/apache/spark/pull/18926#discussion_r132837359), I just removed it from here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18926#discussion_r133121408
  
    --- Diff: python/pyspark/sql/column.py ---
    @@ -406,7 +406,13 @@ def substr(self, startPos, length):
             [Row(col=u'Ali'), Row(col=u'Bob')]
             """
             if type(startPos) != type(length):
    -            raise TypeError("Can not mix the type")
    +            raise TypeError(
    +                "startPos and length must be the same type. "
    +                "Got {startPos_t} and {length_t}, respectively."
    --- End diff --
    
    I'm sorry for the delay.
    I guess we can support `long` by casting to `int` and also the "mixed" cases @gatorsmile metioned.
    What do you think @HyukjinKwon ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    It's cleaner but less specific. Unless we branch on whether `startPos` and `length` are the same type, we will give the same error message for mixed types and for unsupported types. That seems like a step back to me as these are two different problems which should get different error messages.
    
    If we want to group all the type checking in one place, we should do it as in the first example from [Hyukjin's comment](https://github.com/apache/spark/pull/18926#issuecomment-322393819).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    **[Test build #80544 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80544/testReport)** for PR 18926 at commit [`753dbe1`](https://github.com/apache/spark/commit/753dbe1743f552fe7b4867d3e4d24cdcc2ca1669).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    **[Test build #80643 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80643/testReport)** for PR 18926 at commit [`fc1d84f`](https://github.com/apache/spark/commit/fc1d84f002f5bd66bcad038a5581a05ade8dbc35).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18926#discussion_r132856119
  
    --- Diff: python/pyspark/sql/column.py ---
    @@ -406,7 +406,13 @@ def substr(self, startPos, length):
             [Row(col=u'Ali'), Row(col=u'Bob')]
             """
             if type(startPos) != type(length):
    -            raise TypeError("Can not mix the type")
    +            raise TypeError(
    +                "startPos and length must be the same type. "
    +                "Got {startPos_t} and {length_t}, respectively."
    --- End diff --
    
    It needs to check the types in general and we need to hide the error message related with Java types. It is also true that we also need to make such logics in to Scala one to deduplicate this logic if they are duplicated. R has also a similar problem in some places. I don't think we should change this case anyway.
    
    It looks we should ...
    
    ```
    py4j.Py4JException: Method substr([class java.lang.Long ...
    ```
    
    or we should introduce bridge methods in Scala side and implement this checking logic IIRC.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18926#discussion_r132836871
  
    --- Diff: python/pyspark/sql/column.py ---
    @@ -406,7 +406,13 @@ def substr(self, startPos, length):
             [Row(col=u'Ali'), Row(col=u'Bob')]
             """
             if type(startPos) != type(length):
    -            raise TypeError("Can not mix the type")
    +            raise TypeError(
    +                "startPos and length must be the same type. "
    +                "Got {startPos_t} and {length_t}, respectively."
    --- End diff --
    
    > -> startPos: {startPos_t}; length: {length_t}.
    
    I think this suggestion basically includes the same information with the current change? I think you should explain why.
    
    > BTW, why we do the type checking here, instead of doing it in the actual Scala impl of substr?
    
    Are you saying exposing Java types in the error message is better or suggesting method signature in Scala impl of `substr` with the check logic?
    
    > In addition, we do not support the mixed cases? For example, startPos is int, length is long.
    
    In Python, it makes sense calling `int` in general I think. `long` and `int` are unified in Python 3 and this PR looks targeting only the exception message fix.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18926#discussion_r133122900
  
    --- Diff: python/pyspark/sql/column.py ---
    @@ -406,7 +406,13 @@ def substr(self, startPos, length):
             [Row(col=u'Ali'), Row(col=u'Bob')]
             """
             if type(startPos) != type(length):
    -            raise TypeError("Can not mix the type")
    +            raise TypeError(
    +                "startPos and length must be the same type. "
    +                "Got {startPos_t} and {length_t}, respectively."
    --- End diff --
    
    Yea, I think we could support `long`. I think this PR basically targets exception message fix. Could we make this separate?
    
    I guess supporting the case above requires a set of regression tests with min/max of int, fix for documentation and etc, which I think is rather loosely related with the JIRA.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80643/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on the issue:

    https://github.com/apache/spark/pull/18926
  
    I think my latest commits address the concerns raised here. Let me know if I missed or misunderstood anything.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org