Posted to reviews@spark.apache.org by dgingrich <gi...@git.apache.org> on 2017/03/09 19:21:54 UTC

[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

GitHub user dgingrich opened a pull request:

    https://github.com/apache/spark/pull/17227

    [SPARK-19507][PySpark][SQL] Show field name in _verify_type error

    ## What changes were proposed in this pull request?
    
    Improve the error messages raised by _verify_type so they include the field name, making it easier to track down which columns do not comply with the schema.
    
    ## How was this patch tested?
    
    Unit tests (still incomplete), doctests, and hand inspection in the REPL.
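
    The idea behind the patch can be illustrated with a minimal, hypothetical sketch (not the actual PySpark code): thread a `name` argument through a recursive verifier so error messages point at the offending field. The `verify` function and its schema encoding (a plain dict of field name to Python type) are illustrative stand-ins.

    ```python
    # Hypothetical sketch: recursive type verification that builds a dotted
    # field path in `name` and reports it in the error message.
    def verify(obj, dtype, name="obj"):
        if isinstance(dtype, dict):  # treat a dict as a simplified struct schema
            for field, ftype in dtype.items():
                verify(obj.get(field), ftype, name="%s.%s" % (name, field))
        elif obj is not None and not isinstance(obj, dtype):
            raise TypeError("%s: %r is not an instance of %s"
                            % (name, obj, dtype.__name__))

    try:
        verify({"s": "a", "i": "1"}, {"s": str, "i": int})
    except TypeError as e:
        print(e)  # names the offending field: "obj.i: '1' is not an instance of int"
    ```

    Without the `name` threading, the same failure would only say that a string is not an integer, with no hint of which column it came from.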

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dgingrich/spark topic-spark-19507-verify-types

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17227.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17227
    
----
commit ad5e5e5e5ed8396efca4c61eb0219fcd5a5e2caf
Author: David Gingrich <da...@textio.com>
Date:   2017-02-28T08:05:00Z

    Remove "# noqa" comment from docstring

commit 5f72a547a948b5c5a787aace52df04bc8503888b
Author: David Gingrich <da...@textio.com>
Date:   2017-02-28T08:09:59Z

    WIP: Add name parameter and better debugging to _verify_types
    
    * Add name parameter to _verify_types
    * Include name parameter in debug messages
    * Build name message for nested structs, arrays, and maps
    * Add detailed tests to flesh out spec for _verify_types (WIP)

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    **[Test build #74284 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74284/testReport)** for PR 17227 at commit [`5f72a54`](https://github.com/apache/spark/commit/5f72a547a948b5c5a787aace52df04bc8503888b).




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Jenkins, retest this please.




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r123133739
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2367,6 +2380,157 @@ def range_frame_match():
     
             importlib.reload(window)
     
    +
    +class TypesTest(unittest.TestCase):
    +
    +    def test_verify_type_ok_nullable(self):
    +        for obj, data_type in [
    +                (None, IntegerType()),
    +                (None, FloatType()),
    +                (None, StringType()),
    +                (None, StructType([]))]:
    --- End diff --
    
    Not a big deal either. Could we just pull this out into a separate variable?
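
    One way to read the suggestion, sketched with illustrative names (`NULLABLE_CASES`, `run_nullable_cases`) and plain Python types standing in for the Spark types:

    ```python
    # Sketch of the suggested refactor: the (obj, data_type) pairs move into
    # a named variable so the loop header stays short.
    NULLABLE_CASES = [
        (None, int),
        (None, float),
        (None, str),
        (None, dict),
    ]

    def run_nullable_cases(verify_type):
        """Apply a _verify_type-like callable to every nullable case."""
        for obj, data_type in NULLABLE_CASES:
            verify_type(obj, data_type, nullable=True)
    ```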




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r122273265
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2367,6 +2380,151 @@ def range_frame_match():
     
             importlib.reload(window)
     
    +
    +class TypesTest(unittest.TestCase):
    +
    +    def test_verify_type_ok_nullable(self):
    +        for obj, data_type in [
    +                (None, IntegerType()),
    +                (None, FloatType()),
    +                (None, StringType()),
    +                (None, StructType([]))]:
    +            _verify_type(obj, data_type, nullable=True)
    +            msg = "_verify_type(%s, %s, nullable=True)" % (obj, data_type)
    +            self.assertTrue(True, msg)
    +
    +    def test_verify_type_not_nullable(self):
    +        import array
    +        import datetime
    +        import decimal
    +
    +        MyStructType = StructType([
    +            StructField('s', StringType(), nullable=False),
    +            StructField('i', IntegerType(), nullable=True)])
    +
    +        class MyObj:
    +            def __init__(self, **ka):
    +                for k, v in ka.items():
    +                    setattr(self, k, v)
    +
    +        # obj, data_type, exception (None for success or Exception subclass for error)
    +        spec = [
    +            # Strings (match anything but None)
    +            ("", StringType(), None),
    +            (u"", StringType(), None),
    +            (1, StringType(), None),
    +            (1.0, StringType(), None),
    +            ([], StringType(), None),
    +            ({}, StringType(), None),
    +            (None, StringType(), ValueError),   # Only None test
    +
    +            # UDT
    +            (ExamplePoint(1.0, 2.0), ExamplePointUDT(), None),
    +            (ExamplePoint(1.0, 2.0), PythonOnlyUDT(), ValueError),
    +
    +            # Boolean
    +            (True, BooleanType(), None),
    +            (1, BooleanType(), TypeError),
    +            ("True", BooleanType(), TypeError),
    +            ([1], BooleanType(), TypeError),
    +
    +            # Bytes
    +            (-(2**7) - 1, ByteType(), ValueError),
    +            (-(2**7), ByteType(), None),
    +            (2**7 - 1, ByteType(), None),
    +            (2**7, ByteType(), ValueError),
    +            ("1", ByteType(), TypeError),
    +            (1.0, ByteType(), TypeError),
    +
    +            # Shorts
    +            (-(2**15) - 1, ShortType(), ValueError),
    +            (-(2**15), ShortType(), None),
    +            (2**15 - 1, ShortType(), None),
    +            (2**15, ShortType(), ValueError),
    +
    +            # Integer
    +            (-(2**31) - 1, IntegerType(), ValueError),
    +            (-(2**31), IntegerType(), None),
    +            (2**31 - 1, IntegerType(), None),
    +            (2**31, IntegerType(), ValueError),
    +
    +            # Long
    +            (2**64, LongType(), None),
    +
    +            # Float & Double
    +            (1.0, FloatType(), None),
    +            (1, FloatType(), TypeError),
    +            (1.0, DoubleType(), None),
    +            (1, DoubleType(), TypeError),
    +
    +            # Decimal
    +            (decimal.Decimal("1.0"), DecimalType(), None),
    +            (1.0, DecimalType(), TypeError),
    +            (1, DecimalType(), TypeError),
    +            ("1.0", DecimalType(), TypeError),
    +
    +            # Binary
    +            (bytearray([1, 2]), BinaryType(), None),
    +            (1, BinaryType(), TypeError),
    +
    +            # Date/Time
    +            (datetime.date(2000, 1, 2), DateType(), None),
    +            (datetime.datetime(2000, 1, 2, 3, 4), DateType(), None),
    +            ("2000-01-02", DateType(), TypeError),
    +            (datetime.datetime(2000, 1, 2, 3, 4), TimestampType(), None),
    +            (946811040, TimestampType(), TypeError),
    +
    +            # Array
    +            ([], ArrayType(IntegerType()), None),
    +            (["1", None], ArrayType(StringType(), containsNull=True), None),
    +            ([1, 2], ArrayType(IntegerType()), None),
    +            ([1, "2"], ArrayType(IntegerType()), TypeError),
    +            ((1, 2), ArrayType(IntegerType()), None),
    +            (array.array('h', [1, 2]), ArrayType(IntegerType()), None),
    +
    +            # Map
    +            ({}, MapType(StringType(), IntegerType()), None),
    +            ({"a": 1}, MapType(StringType(), IntegerType()), None),
    +            ({"a": 1}, MapType(IntegerType(), IntegerType()), TypeError),
    +            ({"a": "1"}, MapType(StringType(), IntegerType()), TypeError),
    +            ({"a": None}, MapType(StringType(), IntegerType(), valueContainsNull=True), None),
    +
    +            # Struct
    +            ({"s": "a", "i": 1}, MyStructType, None),
    +            ({"s": "a", "i": None}, MyStructType, None),
    +            ({"s": "a"}, MyStructType, None),
    +            ({"s": "a", "f": 1.0}, MyStructType, None),     # Extra fields OK
    +            ({"s": "a", "i": "1"}, MyStructType, TypeError),
    +            (Row(s="a", i=1), MyStructType, None),
    +            (Row(s="a", i=None), MyStructType, None),
    +            (Row(s="a", i=1, f=1.0), MyStructType, None),   # Extra fields OK
    +            (Row(s="a"), MyStructType, ValueError),     # Row can't have missing field
    +            (Row(s="a", i="1"), MyStructType, TypeError),
    +            (["a", 1], MyStructType, None),
    +            (["a", None], MyStructType, None),
    +            (["a"], MyStructType, ValueError),
    +            (["a", "1"], MyStructType, TypeError),
    +            (("a", 1), MyStructType, None),
    +            (MyObj(s="a", i=1), MyStructType, None),
    +            (MyObj(s="a", i=None), MyStructType, None),
    +            (MyObj(s="a"), MyStructType, None),
    +            (MyObj(s="a", i="1"), MyStructType, TypeError),
    +        ]
    +
    +        for obj, data_type, exp in spec:
    +            msg = "_verify_type(%s, %s, nullable=False) == %s" % (obj, data_type, exp)
    +            if exp is None:
    +                try:
    +                    _verify_type(obj, data_type, nullable=False)
    +                except Exception as e:
    +                    traceback.print_exc()
    +                    self.fail(msg)
    +                self.assertTrue(True, msg)
    --- End diff --
    
    What is this line for?




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by dgingrich <gi...@git.apache.org>.
Github user dgingrich commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    @blrnw3, feel free to review the code (though things might change, pending the Jenkins failures).




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r123875271
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1249,7 +1249,7 @@ def _infer_schema_type(obj, dataType):
     }
     
     
    -def _verify_type(obj, dataType, nullable=True):
    +def _verify_type(obj, dataType, nullable=True, name="obj"):
    --- End diff --
    
    I guess this is the only place where we print "obj", maybe? If so, let's set `name=None`.
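
    A sketch of the proposed `name=None` default: the fallback label is chosen only at message-formatting time, so callers that know the field path can pass it explicitly while top-level calls keep the old "obj" wording. This stand-in verifier is hypothetical; the real `_verify_type` lives in python/pyspark/sql/types.py.

    ```python
    # Stand-in verifier showing the name=None default with a late fallback.
    def verify_type(obj, expected_type, nullable=True, name=None):
        label = name if name is not None else "obj"
        if obj is None:
            if not nullable:
                raise ValueError("%s: field is not nullable, but got None" % label)
            return
        if not isinstance(obj, expected_type):
            raise TypeError("%s: %r is not an instance of %s"
                            % (label, obj, expected_type.__name__))
    ```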




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by dgingrich <gi...@git.apache.org>.
Github user dgingrich commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Ping!  Let me know if you need more work from me.




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by dgingrich <gi...@git.apache.org>.
Github user dgingrich commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r124109294
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2367,6 +2380,157 @@ def range_frame_match():
     
             importlib.reload(window)
     
    +
    +class TypesTest(unittest.TestCase):
    +
    +    def test_verify_type_ok_nullable(self):
    +        for obj, data_type in [
    +                (None, IntegerType()),
    +                (None, FloatType()),
    +                (None, StringType()),
    +                (None, StructType([]))]:
    --- End diff --
    
    Sure, either works for me.  Changed.




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r123874817
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -30,6 +30,19 @@
     import functools
     import time
     import datetime
    +import traceback
    +
    +if sys.version_info[:2] <= (2, 6):
    --- End diff --
    
    Yea, let's leave it then. Not a big deal.




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r122287524
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2367,6 +2380,151 @@ def range_frame_match():
     
             importlib.reload(window)
     
    +
    +class TypesTest(unittest.TestCase):
    +
    +    def test_verify_type_ok_nullable(self):
    +        for obj, data_type in [
    +                (None, IntegerType()),
    +                (None, FloatType()),
    +                (None, StringType()),
    +                (None, StructType([]))]:
    +            _verify_type(obj, data_type, nullable=True)
    +            msg = "_verify_type(%s, %s, nullable=True)" % (obj, data_type)
    +            self.assertTrue(True, msg)
    --- End diff --
    
    I think we should surround `_verify_type(obj, data_type, nullable=True)` with a try block and check whether it raises an exception, the same as we do in the `test_verify_type_not_nullable` test.
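
    The suggested pattern, sketched with unittest: wrap the call in try/except so an unexpected exception fails the test with a descriptive message rather than erroring out. The `_verify_type` here is a minimal stand-in, not the real PySpark helper.

    ```python
    import traceback
    import unittest

    def _verify_type(obj, data_type, nullable=True):  # stand-in for the real helper
        if obj is None and not nullable:
            raise ValueError("None is not allowed for a non-nullable field")

    class NullableCasesTest(unittest.TestCase):
        def test_ok_nullable(self):
            for obj, data_type in [(None, int), (None, float), (None, str)]:
                msg = "_verify_type(%s, %s, nullable=True)" % (obj, data_type)
                try:
                    _verify_type(obj, data_type, nullable=True)
                except Exception:
                    # print the traceback for debugging, then fail with context
                    traceback.print_exc()
                    self.fail(msg)
    ```

    Compared to asserting `True` after the call, this makes a regression show up as a test failure that names the exact `(obj, data_type)` pair.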




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r123134659
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2367,6 +2380,157 @@ def range_frame_match():
     
             importlib.reload(window)
     
    +
    +class TypesTest(unittest.TestCase):
    +
    +    def test_verify_type_ok_nullable(self):
    +        for obj, data_type in [
    +                (None, IntegerType()),
    +                (None, FloatType()),
    +                (None, StringType()),
    +                (None, StructType([]))]:
    +            msg = "_verify_type(%s, %s, nullable=True)" % (obj, data_type)
    +            try:
    +                _verify_type(obj, data_type, nullable=True)
    +            except Exception as e:
    +                traceback.print_exc()
    +                self.fail(msg)
    +
    +    def test_verify_type_not_nullable(self):
    +        import array
    +        import datetime
    +        import decimal
    +
    +        MyStructType = StructType([
    +            StructField('s', StringType(), nullable=False),
    +            StructField('i', IntegerType(), nullable=True)])
    +
    +        class MyObj:
    +            def __init__(self, **ka):
    +                for k, v in ka.items():
    +                    setattr(self, k, v)
    +
    +        # obj, data_type, exception (None for success or Exception subclass for error)
    +        spec = [
    +            # Strings (match anything but None)
    +            ("", StringType(), None),
    +            (u"", StringType(), None),
    +            (1, StringType(), None),
    +            (1.0, StringType(), None),
    +            ([], StringType(), None),
    +            ({}, StringType(), None),
    +            (None, StringType(), ValueError),   # Only None test
    +
    +            # UDT
    +            (ExamplePoint(1.0, 2.0), ExamplePointUDT(), None),
    +            (ExamplePoint(1.0, 2.0), PythonOnlyUDT(), ValueError),
    +
    +            # Boolean
    +            (True, BooleanType(), None),
    +            (1, BooleanType(), TypeError),
    +            ("True", BooleanType(), TypeError),
    +            ([1], BooleanType(), TypeError),
    +
    +            # Bytes
    +            (-(2**7) - 1, ByteType(), ValueError),
    +            (-(2**7), ByteType(), None),
    +            (2**7 - 1, ByteType(), None),
    +            (2**7, ByteType(), ValueError),
    +            ("1", ByteType(), TypeError),
    +            (1.0, ByteType(), TypeError),
    +
    +            # Shorts
    +            (-(2**15) - 1, ShortType(), ValueError),
    +            (-(2**15), ShortType(), None),
    +            (2**15 - 1, ShortType(), None),
    +            (2**15, ShortType(), ValueError),
    +
    +            # Integer
    +            (-(2**31) - 1, IntegerType(), ValueError),
    +            (-(2**31), IntegerType(), None),
    +            (2**31 - 1, IntegerType(), None),
    +            (2**31, IntegerType(), ValueError),
    +
    +            # Long
    +            (2**64, LongType(), None),
    +
    +            # Float & Double
    +            (1.0, FloatType(), None),
    +            (1, FloatType(), TypeError),
    +            (1.0, DoubleType(), None),
    +            (1, DoubleType(), TypeError),
    +
    +            # Decimal
    +            (decimal.Decimal("1.0"), DecimalType(), None),
    +            (1.0, DecimalType(), TypeError),
    +            (1, DecimalType(), TypeError),
    +            ("1.0", DecimalType(), TypeError),
    +
    +            # Binary
    +            (bytearray([1, 2]), BinaryType(), None),
    +            (1, BinaryType(), TypeError),
    +
    +            # Date/Time
    +            (datetime.date(2000, 1, 2), DateType(), None),
    +            (datetime.datetime(2000, 1, 2, 3, 4), DateType(), None),
    +            ("2000-01-02", DateType(), TypeError),
    +            (datetime.datetime(2000, 1, 2, 3, 4), TimestampType(), None),
    +            (946811040, TimestampType(), TypeError),
    +
    +            # Array
    +            ([], ArrayType(IntegerType()), None),
    +            (["1", None], ArrayType(StringType(), containsNull=True), None),
    +            (["1", None], ArrayType(StringType(), containsNull=False), ValueError),
    +            ([1, 2], ArrayType(IntegerType()), None),
    +            ([1, "2"], ArrayType(IntegerType()), TypeError),
    +            ((1, 2), ArrayType(IntegerType()), None),
    +            (array.array('h', [1, 2]), ArrayType(IntegerType()), None),
    +
    +            # Map
    +            ({}, MapType(StringType(), IntegerType()), None),
    +            ({"a": 1}, MapType(StringType(), IntegerType()), None),
    +            ({"a": 1}, MapType(IntegerType(), IntegerType()), TypeError),
    +            ({"a": "1"}, MapType(StringType(), IntegerType()), TypeError),
    +            ({"a": None}, MapType(StringType(), IntegerType(), valueContainsNull=True), None),
    +            ({"a": None}, MapType(StringType(), IntegerType(), valueContainsNull=False),
    +             ValueError),
    +
    +            # Struct
    +            ({"s": "a", "i": 1}, MyStructType, None),
    +            ({"s": "a", "i": None}, MyStructType, None),
    +            ({"s": "a"}, MyStructType, None),
    +            ({"s": "a", "f": 1.0}, MyStructType, None),     # Extra fields OK
    +            ({"s": "a", "i": "1"}, MyStructType, TypeError),
    +            (Row(s="a", i=1), MyStructType, None),
    +            (Row(s="a", i=None), MyStructType, None),
    +            (Row(s="a", i=1, f=1.0), MyStructType, None),   # Extra fields OK
    +            (Row(s="a"), MyStructType, ValueError),     # Row can't have missing field
    +            (Row(s="a", i="1"), MyStructType, TypeError),
    +            (["a", 1], MyStructType, None),
    +            (["a", None], MyStructType, None),
    +            (["a"], MyStructType, ValueError),
    +            (["a", "1"], MyStructType, TypeError),
    +            (("a", 1), MyStructType, None),
    +            (MyObj(s="a", i=1), MyStructType, None),
    +            (MyObj(s="a", i=None), MyStructType, None),
    +            (MyObj(s="a"), MyStructType, None),
    +            (MyObj(s="a", i="1"), MyStructType, TypeError),
    +            (MyObj(s=None, i="1"), MyStructType, ValueError),
    +        ]
    +
    +        for obj, data_type, exp in spec:
    +            msg = "_verify_type(%s, %s, nullable=False) == %s" % (obj, data_type, exp)
    +            if exp is None:
    +                try:
    +                    _verify_type(obj, data_type, nullable=False)
    +                except Exception as e:
    --- End diff --
    
    It looks like this `e` is not used.




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    **[Test build #78329 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78329/testReport)** for PR 17227 at commit [`5b4324e`](https://github.com/apache/spark/commit/5b4324e966a71fff787625b60001566f2895ac6a).




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    **[Test build #78657 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78657/testReport)** for PR 17227 at commit [`566f759`](https://github.com/apache/spark/commit/566f759aca61d4f44628bae0265fd930bff386ed).




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    **[Test build #79058 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79058/testReport)** for PR 17227 at commit [`6c1e0b6`](https://github.com/apache/spark/commit/6c1e0b690bdd1914b5056c8b2934614534c622cb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r122282700
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2367,6 +2380,151 @@ def range_frame_match():
     
             importlib.reload(window)
     
    +
    +class TypesTest(unittest.TestCase):
    +
    +    def test_verify_type_ok_nullable(self):
    +        for obj, data_type in [
    +                (None, IntegerType()),
    +                (None, FloatType()),
    +                (None, StringType()),
    +                (None, StructType([]))]:
    +            _verify_type(obj, data_type, nullable=True)
    +            msg = "_verify_type(%s, %s, nullable=True)" % (obj, data_type)
    +            self.assertTrue(True, msg)
    +
    +    def test_verify_type_not_nullable(self):
    +        import array
    +        import datetime
    +        import decimal
    +
    +        MyStructType = StructType([
    +            StructField('s', StringType(), nullable=False),
    +            StructField('i', IntegerType(), nullable=True)])
    +
    +        class MyObj:
    +            def __init__(self, **ka):
    +                for k, v in ka.items():
    +                    setattr(self, k, v)
    +
    +        # obj, data_type, exception (None for success or Exception subclass for error)
    +        spec = [
    +            # Strings (match anything but None)
    +            ("", StringType(), None),
    +            (u"", StringType(), None),
    +            (1, StringType(), None),
    +            (1.0, StringType(), None),
    +            ([], StringType(), None),
    +            ({}, StringType(), None),
    +            (None, StringType(), ValueError),   # Only None test
    +
    +            # UDT
    +            (ExamplePoint(1.0, 2.0), ExamplePointUDT(), None),
    +            (ExamplePoint(1.0, 2.0), PythonOnlyUDT(), ValueError),
    +
    +            # Boolean
    +            (True, BooleanType(), None),
    +            (1, BooleanType(), TypeError),
    +            ("True", BooleanType(), TypeError),
    +            ([1], BooleanType(), TypeError),
    +
    +            # Bytes
    +            (-(2**7) - 1, ByteType(), ValueError),
    +            (-(2**7), ByteType(), None),
    +            (2**7 - 1, ByteType(), None),
    +            (2**7, ByteType(), ValueError),
    +            ("1", ByteType(), TypeError),
    +            (1.0, ByteType(), TypeError),
    +
    +            # Shorts
    +            (-(2**15) - 1, ShortType(), ValueError),
    +            (-(2**15), ShortType(), None),
    +            (2**15 - 1, ShortType(), None),
    +            (2**15, ShortType(), ValueError),
    +
    +            # Integer
    +            (-(2**31) - 1, IntegerType(), ValueError),
    +            (-(2**31), IntegerType(), None),
    +            (2**31 - 1, IntegerType(), None),
    +            (2**31, IntegerType(), ValueError),
    +
    +            # Long
    +            (2**64, LongType(), None),
    +
    +            # Float & Double
    +            (1.0, FloatType(), None),
    +            (1, FloatType(), TypeError),
    +            (1.0, DoubleType(), None),
    +            (1, DoubleType(), TypeError),
    +
    +            # Decimal
    +            (decimal.Decimal("1.0"), DecimalType(), None),
    +            (1.0, DecimalType(), TypeError),
    +            (1, DecimalType(), TypeError),
    +            ("1.0", DecimalType(), TypeError),
    +
    +            # Binary
    +            (bytearray([1, 2]), BinaryType(), None),
    +            (1, BinaryType(), TypeError),
    +
    +            # Date/Time
    +            (datetime.date(2000, 1, 2), DateType(), None),
    +            (datetime.datetime(2000, 1, 2, 3, 4), DateType(), None),
    +            ("2000-01-02", DateType(), TypeError),
    +            (datetime.datetime(2000, 1, 2, 3, 4), TimestampType(), None),
    +            (946811040, TimestampType(), TypeError),
    +
    +            # Array
    +            ([], ArrayType(IntegerType()), None),
    +            (["1", None], ArrayType(StringType(), containsNull=True), None),
    +            ([1, 2], ArrayType(IntegerType()), None),
    +            ([1, "2"], ArrayType(IntegerType()), TypeError),
    +            ((1, 2), ArrayType(IntegerType()), None),
    +            (array.array('h', [1, 2]), ArrayType(IntegerType()), None),
    +
    +            # Map
    +            ({}, MapType(StringType(), IntegerType()), None),
    +            ({"a": 1}, MapType(StringType(), IntegerType()), None),
    +            ({"a": 1}, MapType(IntegerType(), IntegerType()), TypeError),
    +            ({"a": "1"}, MapType(StringType(), IntegerType()), TypeError),
    +            ({"a": None}, MapType(StringType(), IntegerType(), valueContainsNull=True), None),
    --- End diff --
    
    I'd also like you to add a `valueContainsNull=False` case.
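
    A minimal, pyspark-free sketch of the requested case (the `verify_map` helper below is hypothetical, standing in for the nullability check `_verify_type` performs on map values): a `None` value should pass when `valueContainsNull=True` and raise `ValueError` when `valueContainsNull=False`.

    ```python
    def verify_map(obj, value_contains_null):
        # mimics the value-nullability check done for MapType values
        for k, v in obj.items():
            if v is None and not value_contains_null:
                raise ValueError("map value is not nullable, but got None for key %r" % k)

    # spec rows in the same shape as the quoted test: (obj, flag, expected exception)
    spec = [
        ({"a": None}, True, None),         # nullable values: OK
        ({"a": None}, False, ValueError),  # the requested valueContainsNull=False case
    ]

    for obj, vcn, exp in spec:
        if exp is None:
            verify_map(obj, vcn)
        else:
            try:
                verify_map(obj, vcn)
                raise AssertionError("expected %s" % exp.__name__)
            except exp:
                pass
    ```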



[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78740/
    Test PASSed.



[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by dgingrich <gi...@git.apache.org>.
Github user dgingrich commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r123610140
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2367,6 +2380,157 @@ def range_frame_match():
     
             importlib.reload(window)
     
    +
    +class TypesTest(unittest.TestCase):
    +
    +    def test_verify_type_ok_nullable(self):
    +        for obj, data_type in [
    +                (None, IntegerType()),
    +                (None, FloatType()),
    +                (None, StringType()),
    +                (None, StructType([]))]:
    +            msg = "_verify_type(%s, %s, nullable=True)" % (obj, data_type)
    +            try:
    +                _verify_type(obj, data_type, nullable=True)
    +            except Exception as e:
    +                traceback.print_exc()
    +                self.fail(msg)
    +
    +    def test_verify_type_not_nullable(self):
    +        import array
    +        import datetime
    +        import decimal
    +
    +        MyStructType = StructType([
    +            StructField('s', StringType(), nullable=False),
    +            StructField('i', IntegerType(), nullable=True)])
    +
    +        class MyObj:
    +            def __init__(self, **ka):
    +                for k, v in ka.items():
    +                    setattr(self, k, v)
    +
    +        # obj, data_type, exception (None for success or Exception subclass for error)
    +        spec = [
    +            # Strings (match anything but None)
    +            ("", StringType(), None),
    +            (u"", StringType(), None),
    +            (1, StringType(), None),
    +            (1.0, StringType(), None),
    +            ([], StringType(), None),
    +            ({}, StringType(), None),
    +            (None, StringType(), ValueError),   # Only None test
    +
    +            # UDT
    +            (ExamplePoint(1.0, 2.0), ExamplePointUDT(), None),
    +            (ExamplePoint(1.0, 2.0), PythonOnlyUDT(), ValueError),
    +
    +            # Boolean
    +            (True, BooleanType(), None),
    +            (1, BooleanType(), TypeError),
    +            ("True", BooleanType(), TypeError),
    +            ([1], BooleanType(), TypeError),
    +
    +            # Bytes
    +            (-(2**7) - 1, ByteType(), ValueError),
    +            (-(2**7), ByteType(), None),
    +            (2**7 - 1, ByteType(), None),
    +            (2**7, ByteType(), ValueError),
    +            ("1", ByteType(), TypeError),
    +            (1.0, ByteType(), TypeError),
    +
    +            # Shorts
    +            (-(2**15) - 1, ShortType(), ValueError),
    +            (-(2**15), ShortType(), None),
    +            (2**15 - 1, ShortType(), None),
    +            (2**15, ShortType(), ValueError),
    +
    +            # Integer
    +            (-(2**31) - 1, IntegerType(), ValueError),
    +            (-(2**31), IntegerType(), None),
    +            (2**31 - 1, IntegerType(), None),
    +            (2**31, IntegerType(), ValueError),
    +
    +            # Long
    +            (2**64, LongType(), None),
    +
    +            # Float & Double
    +            (1.0, FloatType(), None),
    +            (1, FloatType(), TypeError),
    +            (1.0, DoubleType(), None),
    +            (1, DoubleType(), TypeError),
    +
    +            # Decimal
    +            (decimal.Decimal("1.0"), DecimalType(), None),
    +            (1.0, DecimalType(), TypeError),
    +            (1, DecimalType(), TypeError),
    +            ("1.0", DecimalType(), TypeError),
    +
    +            # Binary
    +            (bytearray([1, 2]), BinaryType(), None),
    +            (1, BinaryType(), TypeError),
    +
    +            # Date/Time
    +            (datetime.date(2000, 1, 2), DateType(), None),
    +            (datetime.datetime(2000, 1, 2, 3, 4), DateType(), None),
    +            ("2000-01-02", DateType(), TypeError),
    +            (datetime.datetime(2000, 1, 2, 3, 4), TimestampType(), None),
    +            (946811040, TimestampType(), TypeError),
    +
    +            # Array
    +            ([], ArrayType(IntegerType()), None),
    +            (["1", None], ArrayType(StringType(), containsNull=True), None),
    +            (["1", None], ArrayType(StringType(), containsNull=False), ValueError),
    +            ([1, 2], ArrayType(IntegerType()), None),
    +            ([1, "2"], ArrayType(IntegerType()), TypeError),
    +            ((1, 2), ArrayType(IntegerType()), None),
    +            (array.array('h', [1, 2]), ArrayType(IntegerType()), None),
    +
    +            # Map
    +            ({}, MapType(StringType(), IntegerType()), None),
    +            ({"a": 1}, MapType(StringType(), IntegerType()), None),
    +            ({"a": 1}, MapType(IntegerType(), IntegerType()), TypeError),
    +            ({"a": "1"}, MapType(StringType(), IntegerType()), TypeError),
    +            ({"a": None}, MapType(StringType(), IntegerType(), valueContainsNull=True), None),
    +            ({"a": None}, MapType(StringType(), IntegerType(), valueContainsNull=False),
    +             ValueError),
    +
    +            # Struct
    +            ({"s": "a", "i": 1}, MyStructType, None),
    +            ({"s": "a", "i": None}, MyStructType, None),
    +            ({"s": "a"}, MyStructType, None),
    +            ({"s": "a", "f": 1.0}, MyStructType, None),     # Extra fields OK
    +            ({"s": "a", "i": "1"}, MyStructType, TypeError),
    +            (Row(s="a", i=1), MyStructType, None),
    +            (Row(s="a", i=None), MyStructType, None),
    +            (Row(s="a", i=1, f=1.0), MyStructType, None),   # Extra fields OK
    +            (Row(s="a"), MyStructType, ValueError),     # Row can't have missing field
    +            (Row(s="a", i="1"), MyStructType, TypeError),
    +            (["a", 1], MyStructType, None),
    +            (["a", None], MyStructType, None),
    +            (["a"], MyStructType, ValueError),
    +            (["a", "1"], MyStructType, TypeError),
    +            (("a", 1), MyStructType, None),
    +            (MyObj(s="a", i=1), MyStructType, None),
    +            (MyObj(s="a", i=None), MyStructType, None),
    +            (MyObj(s="a"), MyStructType, None),
    +            (MyObj(s="a", i="1"), MyStructType, TypeError),
    +            (MyObj(s=None, i="1"), MyStructType, ValueError),
    +        ]
    +
    +        for obj, data_type, exp in spec:
    +            msg = "_verify_type(%s, %s, nullable=False) == %s" % (obj, data_type, exp)
    +            if exp is None:
    +                try:
    +                    _verify_type(obj, data_type, nullable=False)
    +                except Exception as e:
    --- End diff --
    
    Removed the `e`.



[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74548/
    Test PASSed.



[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17227



[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by dgingrich <gi...@git.apache.org>.
Github user dgingrich commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r123615521
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1300,70 +1300,80 @@ def _verify_type(obj, dataType, nullable=True):
             if nullable:
                 return
             else:
    -            raise ValueError("This field is not nullable, but got None")
    +            raise ValueError("%s: This field is not nullable, but got None" % name)
    --- End diff --
    
    No, I never check the actual exception message.  I normally don't check the contents of exception messages since they shouldn't be used programmatically (the tests mostly exercise all code paths to make sure I didn't break anything).
    
    But it makes sense to test that the prefix is set, since that's the main point of the change.  Added a test that checks the exception message prefix, which should be robust.
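
    A sketch of that prefix-only assertion (pyspark-free; `verify_not_null` is a hypothetical stand-in for the non-nullable branch of `_verify_type` shown in the quoted diff): asserting only that the message starts with the field name tests exactly what this PR adds, without coupling the test to the rest of the wording.

    ```python
    def verify_not_null(obj, name="obj"):
        # stand-in for: raise ValueError("%s: This field is not nullable, but got None" % name)
        if obj is None:
            raise ValueError("%s: This field is not nullable, but got None" % name)

    try:
        verify_not_null(None, name="field_x")
        raise AssertionError("expected ValueError")
    except ValueError as e:
        # only the name prefix is asserted; the tail of the message may change freely
        assert str(e).startswith("field_x")
    ```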



[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r125204915
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2367,6 +2380,162 @@ def range_frame_match():
     
             importlib.reload(window)
     
    +
    +class TypesTest(unittest.TestCase):
    +
    +    def test_verify_type_exception_msg(self):
    +        name = "test_name"
    +        try:
    +            _verify_type(None, StringType(), nullable=False, name=name)
    +            self.fail('Expected _verify_type() to throw so test can check exception message')
    +        except Exception as e:
    +            self.assertTrue(str(e).startswith(name))
    +
    +    def test_verify_type_ok_nullable(self):
    +        obj = None
    +        for data_type in [IntegerType(), FloatType(), StringType(), StructType([])]:
    +            msg = "_verify_type(%s, %s, nullable=True)" % (obj, data_type)
    +            try:
    +                _verify_type(obj, data_type, nullable=True)
    +            except Exception as e:
    +                traceback.print_exc()
    +                self.fail(msg)
    +
    +    def test_verify_type_not_nullable(self):
    +        import array
    +        import datetime
    +        import decimal
    +
    +        MyStructType = StructType([
    --- End diff --
    
    Could we make the first character of this lower-cased? (or maybe just rename it to `schema`?)



[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    LGTM, do you have any other concerns? @holdenk, @HyukjinKwon 



[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Build finished. Test FAILed.



[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by dgingrich <gi...@git.apache.org>.
Github user dgingrich commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Thanks for the comments @ueshin, will address them this weekend.



[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by shaneknapp <gi...@git.apache.org>.
Github user shaneknapp commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    test this please



[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Wait ... this is possibly a hot path that runs per row ... @ueshin and @dgingrich, I think we should rewrite this to avoid per-record type dispatch ... For me, I wouldn't mind if we go ahead and merge my approach, dgingrich#1, for now. I will make a follow-up right after it gets merged.
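
    A sketch of the "compile once, apply per record" idea being suggested (hypothetical and pyspark-free; `make_verifier` and its string type tags are illustrative, not the actual follow-up API): the schema is inspected once to build a verifier closure, so the per-record hot path does plain function calls with no type dispatch.

    ```python
    def make_verifier(data_type):
        # all type dispatch happens here, once per schema field
        if data_type == "int":
            def verify(obj, name):
                if not isinstance(obj, int):
                    raise TypeError("%s: expected int, got %r" % (name, obj))
        elif data_type == "str":
            def verify(obj, name):
                if not isinstance(obj, str):
                    raise TypeError("%s: expected str, got %r" % (name, obj))
        else:
            raise ValueError("unsupported type: %r" % data_type)
        return verify

    verify_i = make_verifier("int")   # built once, before the data loop
    for value in [1, 2, 3]:           # hot path: no isinstance-on-DataType dispatch
        verify_i(value, "i")
    ```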



[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by dgingrich <gi...@git.apache.org>.
Github user dgingrich commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r123099761
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2367,6 +2380,151 @@ def range_frame_match():
     
             importlib.reload(window)
     
    +
    +class TypesTest(unittest.TestCase):
    +
    +    def test_verify_type_ok_nullable(self):
    +        for obj, data_type in [
    +                (None, IntegerType()),
    +                (None, FloatType()),
    +                (None, StringType()),
    +                (None, StructType([]))]:
    +            _verify_type(obj, data_type, nullable=True)
    +            msg = "_verify_type(%s, %s, nullable=True)" % (obj, data_type)
    +            self.assertTrue(True, msg)
    +
    +    def test_verify_type_not_nullable(self):
    +        import array
    +        import datetime
    +        import decimal
    +
    +        MyStructType = StructType([
    +            StructField('s', StringType(), nullable=False),
    +            StructField('i', IntegerType(), nullable=True)])
    +
    +        class MyObj:
    +            def __init__(self, **ka):
    +                for k, v in ka.items():
    +                    setattr(self, k, v)
    +
    +        # obj, data_type, exception (None for success or Exception subclass for error)
    +        spec = [
    +            # Strings (match anything but None)
    +            ("", StringType(), None),
    +            (u"", StringType(), None),
    +            (1, StringType(), None),
    +            (1.0, StringType(), None),
    +            ([], StringType(), None),
    +            ({}, StringType(), None),
    +            (None, StringType(), ValueError),   # Only None test
    +
    +            # UDT
    +            (ExamplePoint(1.0, 2.0), ExamplePointUDT(), None),
    +            (ExamplePoint(1.0, 2.0), PythonOnlyUDT(), ValueError),
    +
    +            # Boolean
    +            (True, BooleanType(), None),
    +            (1, BooleanType(), TypeError),
    +            ("True", BooleanType(), TypeError),
    +            ([1], BooleanType(), TypeError),
    +
    +            # Bytes
    +            (-(2**7) - 1, ByteType(), ValueError),
    +            (-(2**7), ByteType(), None),
    +            (2**7 - 1, ByteType(), None),
    +            (2**7, ByteType(), ValueError),
    +            ("1", ByteType(), TypeError),
    +            (1.0, ByteType(), TypeError),
    +
    +            # Shorts
    +            (-(2**15) - 1, ShortType(), ValueError),
    +            (-(2**15), ShortType(), None),
    +            (2**15 - 1, ShortType(), None),
    +            (2**15, ShortType(), ValueError),
    +
    +            # Integer
    +            (-(2**31) - 1, IntegerType(), ValueError),
    +            (-(2**31), IntegerType(), None),
    +            (2**31 - 1, IntegerType(), None),
    +            (2**31, IntegerType(), ValueError),
    +
    +            # Long
    +            (2**64, LongType(), None),
    +
    +            # Float & Double
    +            (1.0, FloatType(), None),
    +            (1, FloatType(), TypeError),
    +            (1.0, DoubleType(), None),
    +            (1, DoubleType(), TypeError),
    +
    +            # Decimal
    +            (decimal.Decimal("1.0"), DecimalType(), None),
    +            (1.0, DecimalType(), TypeError),
    +            (1, DecimalType(), TypeError),
    +            ("1.0", DecimalType(), TypeError),
    +
    +            # Binary
    +            (bytearray([1, 2]), BinaryType(), None),
    +            (1, BinaryType(), TypeError),
    +
    +            # Date/Time
    +            (datetime.date(2000, 1, 2), DateType(), None),
    +            (datetime.datetime(2000, 1, 2, 3, 4), DateType(), None),
    +            ("2000-01-02", DateType(), TypeError),
    +            (datetime.datetime(2000, 1, 2, 3, 4), TimestampType(), None),
    +            (946811040, TimestampType(), TypeError),
    +
    +            # Array
    +            ([], ArrayType(IntegerType()), None),
    +            (["1", None], ArrayType(StringType(), containsNull=True), None),
    --- End diff --
    
    Added



[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Jenkins, ok to test.



[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    BTW, let's put `[WIP]` in the title if it is still a work in progress.



[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    **[Test build #78657 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78657/testReport)** for PR 17227 at commit [`566f759`](https://github.com/apache/spark/commit/566f759aca61d4f44628bae0265fd930bff386ed).
     * This patch **fails PySpark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.



[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78657/
    Test FAILed.



[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by dgingrich <gi...@git.apache.org>.
Github user dgingrich commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Jenkins test failures look unrelated to my changes (I was actually seeing them locally w/ `run-tests`, so I tested my changes in isolation w/ `pytest` & some hacks).  What are the criteria for accepting the patch?  If the build has to be clean I'll rebase onto master and see if the errors clear up.




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r123875088
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1249,7 +1249,7 @@ def _infer_schema_type(obj, dataType):
     }
     
     
    -def _verify_type(obj, dataType, nullable=True):
    +def _verify_type(obj, dataType, nullable=True, name="obj"):
    --- End diff --
    
    I meant this case:
    
    ```python
    >>> from pyspark.sql.types import *
    >>> spark.createDataFrame(["a"], StringType()).printSchema()
    ```
    
    ```
    root
     |-- value: string (nullable = true)
    ```
    ```python
    >>> from pyspark.sql.types import *
    >>> spark.createDataFrame(["a"], IntegerType()).printSchema()
    ```
    ```
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File ".../spark/python/pyspark/sql/session.py", line 526, in createDataFrame
        rdd, schema = self._createFromLocal(map(prepare, data), schema)
      File ".../spark/python/pyspark/sql/session.py", line 387, in _createFromLocal
        data = list(data)
      File ".../spark/python/pyspark/sql/session.py", line 516, in prepare
        verify_func(obj, dataType)
      File ".../spark/python/pyspark/sql/types.py", line 1326, in _verify_type
        % (name, dataType, obj, type(obj)))
    TypeError: obj: IntegerType can not accept object 'a' in type <type 'str'>
    ```
    
    It sounds like "obj" should be "value". It looks like we should specify the name around https://github.com/dgingrich/spark/blob/topic-spark-19507-verify-types/python/pyspark/sql/session.py#L516.
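    A minimal sketch of the idea (the helper name `display_name` is illustrative, not the actual `session.py` code): use the schema's field name when one exists, and fall back to `value`, the column name Spark assigns to a bare atomic-type schema.

```python
def display_name(field_name):
    # Hypothetical helper: "value" is the column name Spark assigns to a
    # DataFrame created from a bare atomic-type schema, so it is the
    # natural fallback when no explicit field name is available.
    return field_name if field_name is not None else "value"

print(display_name(None))   # prefix for a bare IntegerType() schema
print(display_name("age"))  # prefix for a named struct field
```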




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    @dgingrich, I tried to address my comments at my best here - https://github.com/dgingrich/spark/pull/1. Could you review that change and merge it if it looks good to you so that the change is merged into this PR?




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    (Let's proceed with this one per https://github.com/apache/spark/pull/17213#issuecomment-285530248)




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    **[Test build #74430 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74430/testReport)** for PR 17227 at commit [`5c5ab5b`](https://github.com/apache/spark/commit/5c5ab5b86b0995d8b955755ab972f8c5f1474956).




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by dgingrich <gi...@git.apache.org>.
Github user dgingrich commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Oops, fixed missing quotes bug in `session.py`.




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    **[Test build #74548 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74548/testReport)** for PR 17227 at commit [`fcd2067`](https://github.com/apache/spark/commit/fcd2067c97e60a72653cd34905727e827c95ccbd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `        # obj, data_type, exception (None for success or Exception subclass for error)`




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    **[Test build #78740 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78740/testReport)** for PR 17227 at commit [`6c1e0b6`](https://github.com/apache/spark/commit/6c1e0b690bdd1914b5056c8b2934614534c622cb).
     * This patch passes all tests.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    **[Test build #74284 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74284/testReport)** for PR 17227 at commit [`5f72a54`](https://github.com/apache/spark/commit/5f72a547a948b5c5a787aace52df04bc8503888b).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    **[Test build #78324 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78324/testReport)** for PR 17227 at commit [`5b4324e`](https://github.com/apache/spark/commit/5b4324e966a71fff787625b60001566f2895ac6a).




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    **[Test build #74548 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74548/testReport)** for PR 17227 at commit [`fcd2067`](https://github.com/apache/spark/commit/fcd2067c97e60a72653cd34905727e827c95ccbd).




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r123874870
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2367,6 +2380,157 @@ def range_frame_match():
     
             importlib.reload(window)
     
    +
    +class TypesTest(unittest.TestCase):
    +
    +    def test_verify_type_ok_nullable(self):
    +        for obj, data_type in [
    +                (None, IntegerType()),
    +                (None, FloatType()),
    +                (None, StringType()),
    +                (None, StructType([]))]:
    --- End diff --
    
    Let's do this like ...
    
    ```python
    types = [IntegerType(), FloatType(), StringType(), StructType([])]
    for ...
    ```
    
    if you don't mind. I think hoisting the repeated value out of the loop is slightly better.
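    Rendered as a runnable sketch (with string stand-ins for the PySpark type instances, so it runs without Spark installed), the suggestion looks like:

```python
# Stand-ins for IntegerType(), FloatType(), StringType(), StructType([]);
# in the real test these would be the PySpark type instances.
types = ["IntegerType", "FloatType", "StringType", "StructType"]

# Hoist the shared value (None) out of the per-type pairs and build the
# (obj, data_type) cases from the type list alone.
cases = [(None, data_type) for data_type in types]
print(cases)
```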




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by dgingrich <gi...@git.apache.org>.
Github user dgingrich commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r124135474
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1249,7 +1249,7 @@ def _infer_schema_type(obj, dataType):
     }
     
     
    -def _verify_type(obj, dataType, nullable=True):
    +def _verify_type(obj, dataType, nullable=True, name="obj"):
    --- End diff --
    
    Set `name=value` in the call at session.py line 516.  
    
    It will still print `obj` if the schema is a StructType: `TypeError: obj.a: MyStructType can not accept object 'a' in type <type 'str'>`.  Would you like to change that too?
    
    Right now changing the default name to None would make the error message worse: `TypeError: None: IntegerType can not accept object 'a' in type <type 'str'>`.  
    
    The best way to make the error message pretty is probably:
    - Set the default name to None
    - If `name` is None, don't prepend the `%s: ` prefix to the error messages
    
    That would make your example: `TypeError: IntegerType can not accept object 'a' in type <type 'str'>`.  
    
    IMO `obj` is not as pretty but reasonable since it's so simple.  Let me know what you prefer.  My only goal is that next time I get a schema failure it tells me what field to look at :)
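    The prefix logic sketched above could look roughly like this (`format_error` is a hypothetical helper for illustration, not the actual `_verify_type` code):

```python
def format_error(message, name=None):
    # Prepend the field path only when a name was supplied; with name=None
    # the message is returned untouched, avoiding a literal "None:" prefix.
    if name is None:
        return message
    return "%s: %s" % (name, message)

msg = "IntegerType can not accept object 'a' in type <type 'str'>"
print(format_error(msg))                # no prefix for a top-level value
print(format_error(msg, name="obj.a"))  # prefixed with the field path
```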




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by dgingrich <gi...@git.apache.org>.
Github user dgingrich commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r123104696
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2367,6 +2380,151 @@ def range_frame_match():
     
             importlib.reload(window)
     
    +
    +class TypesTest(unittest.TestCase):
    +
    +    def test_verify_type_ok_nullable(self):
    +        for obj, data_type in [
    +                (None, IntegerType()),
    +                (None, FloatType()),
    +                (None, StringType()),
    +                (None, StructType([]))]:
    +            _verify_type(obj, data_type, nullable=True)
    +            msg = "_verify_type(%s, %s, nullable=True)" % (obj, data_type)
    +            self.assertTrue(True, msg)
    +
    +    def test_verify_type_not_nullable(self):
    +        import array
    +        import datetime
    +        import decimal
    +
    +        MyStructType = StructType([
    +            StructField('s', StringType(), nullable=False),
    +            StructField('i', IntegerType(), nullable=True)])
    +
    +        class MyObj:
    +            def __init__(self, **ka):
    +                for k, v in ka.items():
    +                    setattr(self, k, v)
    +
    +        # obj, data_type, exception (None for success or Exception subclass for error)
    +        spec = [
    +            # Strings (match anything but None)
    +            ("", StringType(), None),
    +            (u"", StringType(), None),
    +            (1, StringType(), None),
    +            (1.0, StringType(), None),
    +            ([], StringType(), None),
    +            ({}, StringType(), None),
    +            (None, StringType(), ValueError),   # Only None test
    +
    +            # UDT
    +            (ExamplePoint(1.0, 2.0), ExamplePointUDT(), None),
    +            (ExamplePoint(1.0, 2.0), PythonOnlyUDT(), ValueError),
    +
    +            # Boolean
    +            (True, BooleanType(), None),
    +            (1, BooleanType(), TypeError),
    +            ("True", BooleanType(), TypeError),
    +            ([1], BooleanType(), TypeError),
    +
    +            # Bytes
    +            (-(2**7) - 1, ByteType(), ValueError),
    +            (-(2**7), ByteType(), None),
    +            (2**7 - 1, ByteType(), None),
    +            (2**7, ByteType(), ValueError),
    +            ("1", ByteType(), TypeError),
    +            (1.0, ByteType(), TypeError),
    +
    +            # Shorts
    +            (-(2**15) - 1, ShortType(), ValueError),
    +            (-(2**15), ShortType(), None),
    +            (2**15 - 1, ShortType(), None),
    +            (2**15, ShortType(), ValueError),
    +
    +            # Integer
    +            (-(2**31) - 1, IntegerType(), ValueError),
    +            (-(2**31), IntegerType(), None),
    +            (2**31 - 1, IntegerType(), None),
    +            (2**31, IntegerType(), ValueError),
    +
    +            # Long
    +            (2**64, LongType(), None),
    +
    +            # Float & Double
    +            (1.0, FloatType(), None),
    +            (1, FloatType(), TypeError),
    +            (1.0, DoubleType(), None),
    +            (1, DoubleType(), TypeError),
    +
    +            # Decimal
    +            (decimal.Decimal("1.0"), DecimalType(), None),
    +            (1.0, DecimalType(), TypeError),
    +            (1, DecimalType(), TypeError),
    +            ("1.0", DecimalType(), TypeError),
    +
    +            # Binary
    +            (bytearray([1, 2]), BinaryType(), None),
    +            (1, BinaryType(), TypeError),
    +
    +            # Date/Time
    +            (datetime.date(2000, 1, 2), DateType(), None),
    +            (datetime.datetime(2000, 1, 2, 3, 4), DateType(), None),
    +            ("2000-01-02", DateType(), TypeError),
    +            (datetime.datetime(2000, 1, 2, 3, 4), TimestampType(), None),
    +            (946811040, TimestampType(), TypeError),
    +
    +            # Array
    +            ([], ArrayType(IntegerType()), None),
    +            (["1", None], ArrayType(StringType(), containsNull=True), None),
    +            ([1, 2], ArrayType(IntegerType()), None),
    +            ([1, "2"], ArrayType(IntegerType()), TypeError),
    +            ((1, 2), ArrayType(IntegerType()), None),
    +            (array.array('h', [1, 2]), ArrayType(IntegerType()), None),
    +
    +            # Map
    +            ({}, MapType(StringType(), IntegerType()), None),
    +            ({"a": 1}, MapType(StringType(), IntegerType()), None),
    +            ({"a": 1}, MapType(IntegerType(), IntegerType()), TypeError),
    +            ({"a": "1"}, MapType(StringType(), IntegerType()), TypeError),
    +            ({"a": None}, MapType(StringType(), IntegerType(), valueContainsNull=True), None),
    +
    +            # Struct
    +            ({"s": "a", "i": 1}, MyStructType, None),
    +            ({"s": "a", "i": None}, MyStructType, None),
    +            ({"s": "a"}, MyStructType, None),
    +            ({"s": "a", "f": 1.0}, MyStructType, None),     # Extra fields OK
    +            ({"s": "a", "i": "1"}, MyStructType, TypeError),
    +            (Row(s="a", i=1), MyStructType, None),
    +            (Row(s="a", i=None), MyStructType, None),
    +            (Row(s="a", i=1, f=1.0), MyStructType, None),   # Extra fields OK
    +            (Row(s="a"), MyStructType, ValueError),     # Row can't have missing field
    +            (Row(s="a", i="1"), MyStructType, TypeError),
    +            (["a", 1], MyStructType, None),
    +            (["a", None], MyStructType, None),
    +            (["a"], MyStructType, ValueError),
    +            (["a", "1"], MyStructType, TypeError),
    +            (("a", 1), MyStructType, None),
    +            (MyObj(s="a", i=1), MyStructType, None),
    +            (MyObj(s="a", i=None), MyStructType, None),
    +            (MyObj(s="a"), MyStructType, None),
    +            (MyObj(s="a", i="1"), MyStructType, TypeError),
    +        ]
    +
    +        for obj, data_type, exp in spec:
    +            msg = "_verify_type(%s, %s, nullable=False) == %s" % (obj, data_type, exp)
    +            if exp is None:
    +                try:
    +                    _verify_type(obj, data_type, nullable=False)
    +                except Exception as e:
    +                    traceback.print_exc()
    +                    self.fail(msg)
    +                self.assertTrue(True, msg)
    --- End diff --
    
    IIRC that was required at some point, I think to get a test runner to pick up the test.  But I just tried removing it and the tests ran with pytest and the Python 3 unittest runner, so I removed the line here and in `test_verify_type_ok_nullable`.  We should probably look at the full test suite output from Jenkins to make sure the tests are run under Python 2.




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by dgingrich <gi...@git.apache.org>.
Github user dgingrich commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r123099962
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2367,6 +2380,151 @@ def range_frame_match():
     
             importlib.reload(window)
     
    +
    +class TypesTest(unittest.TestCase):
    +
    +    def test_verify_type_ok_nullable(self):
    +        for obj, data_type in [
    +                (None, IntegerType()),
    +                (None, FloatType()),
    +                (None, StringType()),
    +                (None, StructType([]))]:
    +            _verify_type(obj, data_type, nullable=True)
    +            msg = "_verify_type(%s, %s, nullable=True)" % (obj, data_type)
    +            self.assertTrue(True, msg)
    +
    +    def test_verify_type_not_nullable(self):
    +        import array
    +        import datetime
    +        import decimal
    +
    +        MyStructType = StructType([
    +            StructField('s', StringType(), nullable=False),
    +            StructField('i', IntegerType(), nullable=True)])
    +
    +        class MyObj:
    +            def __init__(self, **ka):
    +                for k, v in ka.items():
    +                    setattr(self, k, v)
    +
    +        # obj, data_type, exception (None for success or Exception subclass for error)
    +        spec = [
    +            # Strings (match anything but None)
    +            ("", StringType(), None),
    +            (u"", StringType(), None),
    +            (1, StringType(), None),
    +            (1.0, StringType(), None),
    +            ([], StringType(), None),
    +            ({}, StringType(), None),
    +            (None, StringType(), ValueError),   # Only None test
    +
    +            # UDT
    +            (ExamplePoint(1.0, 2.0), ExamplePointUDT(), None),
    +            (ExamplePoint(1.0, 2.0), PythonOnlyUDT(), ValueError),
    +
    +            # Boolean
    +            (True, BooleanType(), None),
    +            (1, BooleanType(), TypeError),
    +            ("True", BooleanType(), TypeError),
    +            ([1], BooleanType(), TypeError),
    +
    +            # Bytes
    +            (-(2**7) - 1, ByteType(), ValueError),
    +            (-(2**7), ByteType(), None),
    +            (2**7 - 1, ByteType(), None),
    +            (2**7, ByteType(), ValueError),
    +            ("1", ByteType(), TypeError),
    +            (1.0, ByteType(), TypeError),
    +
    +            # Shorts
    +            (-(2**15) - 1, ShortType(), ValueError),
    +            (-(2**15), ShortType(), None),
    +            (2**15 - 1, ShortType(), None),
    +            (2**15, ShortType(), ValueError),
    +
    +            # Integer
    +            (-(2**31) - 1, IntegerType(), ValueError),
    +            (-(2**31), IntegerType(), None),
    +            (2**31 - 1, IntegerType(), None),
    +            (2**31, IntegerType(), ValueError),
    +
    +            # Long
    +            (2**64, LongType(), None),
    +
    +            # Float & Double
    +            (1.0, FloatType(), None),
    +            (1, FloatType(), TypeError),
    +            (1.0, DoubleType(), None),
    +            (1, DoubleType(), TypeError),
    +
    +            # Decimal
    +            (decimal.Decimal("1.0"), DecimalType(), None),
    +            (1.0, DecimalType(), TypeError),
    +            (1, DecimalType(), TypeError),
    +            ("1.0", DecimalType(), TypeError),
    +
    +            # Binary
    +            (bytearray([1, 2]), BinaryType(), None),
    +            (1, BinaryType(), TypeError),
    +
    +            # Date/Time
    +            (datetime.date(2000, 1, 2), DateType(), None),
    +            (datetime.datetime(2000, 1, 2, 3, 4), DateType(), None),
    +            ("2000-01-02", DateType(), TypeError),
    +            (datetime.datetime(2000, 1, 2, 3, 4), TimestampType(), None),
    +            (946811040, TimestampType(), TypeError),
    +
    +            # Array
    +            ([], ArrayType(IntegerType()), None),
    +            (["1", None], ArrayType(StringType(), containsNull=True), None),
    +            ([1, 2], ArrayType(IntegerType()), None),
    +            ([1, "2"], ArrayType(IntegerType()), TypeError),
    +            ((1, 2), ArrayType(IntegerType()), None),
    +            (array.array('h', [1, 2]), ArrayType(IntegerType()), None),
    +
    +            # Map
    +            ({}, MapType(StringType(), IntegerType()), None),
    +            ({"a": 1}, MapType(StringType(), IntegerType()), None),
    +            ({"a": 1}, MapType(IntegerType(), IntegerType()), TypeError),
    +            ({"a": "1"}, MapType(StringType(), IntegerType()), TypeError),
    +            ({"a": None}, MapType(StringType(), IntegerType(), valueContainsNull=True), None),
    --- End diff --
    
    Added




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    cc @ueshin 




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    **[Test build #78324 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78324/testReport)** for PR 17227 at commit [`5b4324e`](https://github.com/apache/spark/commit/5b4324e966a71fff787625b60001566f2895ac6a).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79058/
    Test PASSed.




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r125205112
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1249,7 +1249,7 @@ def _infer_schema_type(obj, dataType):
     }
     
     
    -def _verify_type(obj, dataType, nullable=True):
    +def _verify_type(obj, dataType, nullable=True, name="obj"):
    --- End diff --
    
    Could we maybe then use `None` and not print it?




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    **[Test build #79058 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79058/testReport)** for PR 17227 at commit [`6c1e0b6`](https://github.com/apache/spark/commit/6c1e0b690bdd1914b5056c8b2934614534c622cb).




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r123875195
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1249,7 +1249,7 @@ def _infer_schema_type(obj, dataType):
     }
     
     
    -def _verify_type(obj, dataType, nullable=True):
    +def _verify_type(obj, dataType, nullable=True, name="obj"):
    --- End diff --
    
    Let's fix this case.
    
    ```python
    >>> from pyspark.sql.types import *
    >>> spark.createDataFrame(["a"], StringType()).printSchema()
    ```
    
    ```
    root
     |-- value: string (nullable = true)
    ```
    ```python
    >>> from pyspark.sql.types import *
    >>> spark.createDataFrame(["a"], IntegerType()).printSchema()
    ```
    ```
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File ".../spark/python/pyspark/sql/session.py", line 526, in createDataFrame
        rdd, schema = self._createFromLocal(map(prepare, data), schema)
      File ".../spark/python/pyspark/sql/session.py", line 387, in _createFromLocal
        data = list(data)
      File ".../spark/python/pyspark/sql/session.py", line 516, in prepare
        verify_func(obj, dataType)
      File ".../spark/python/pyspark/sql/types.py", line 1326, in _verify_type
        % (name, dataType, obj, type(obj)))
    TypeError: obj: IntegerType can not accept object 'a' in type <type 'str'>
    ```
    
    It sounds like "obj" should be "value". It looks like we should specify the name around https://github.com/dgingrich/spark/blob/topic-spark-19507-verify-types/python/pyspark/sql/session.py#L516.
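The fix discussed in this comment can be sketched as follows. This is a hypothetical stand-in, not Spark's actual code: `verify_type` only mimics the shape of pyspark's `_verify_type`, and the `name="value"` argument illustrates what the `createDataFrame` call site in `session.py` would pass for a single-column schema.

```python
# Hypothetical sketch, not Spark's actual code: verify_type mimics the shape
# of pyspark's _verify_type, and name="value" illustrates what the
# createDataFrame call site would pass for a single-column schema.
def verify_type(obj, data_type, nullable=True, name="obj"):
    if obj is None:
        if nullable:
            return
        raise ValueError("%s: This field is not nullable, but got None" % name)
    if data_type == "IntegerType" and not isinstance(obj, int):
        raise TypeError("%s: IntegerType can not accept object %r in type %s"
                        % (name, obj, type(obj)))

verify_type(1, "IntegerType", name="value")       # valid value: no error
try:
    verify_type("a", "IntegerType", name="value")
except TypeError as e:
    print(e)  # value: IntegerType can not accept object 'a' in type <class 'str'>
```

With the schema field name threaded through, the traceback quoted above would end in `value: IntegerType can not accept object 'a' ...` instead of the generic `obj: ...`.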




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    **[Test build #78329 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78329/testReport)** for PR 17227 at commit [`5b4324e`](https://github.com/apache/spark/commit/5b4324e966a71fff787625b60001566f2895ac6a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    **[Test build #78472 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78472/testReport)** for PR 17227 at commit [`2351153`](https://github.com/apache/spark/commit/23511537966577bd6d2b4bbe9dd898faa0e72e97).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by dgingrich <gi...@git.apache.org>.
Github user dgingrich commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r123100185
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2367,6 +2380,151 @@ def range_frame_match():
     
             importlib.reload(window)
     
    +
    +class TypesTest(unittest.TestCase):
    +
    +    def test_verify_type_ok_nullable(self):
    +        for obj, data_type in [
    +                (None, IntegerType()),
    +                (None, FloatType()),
    +                (None, StringType()),
    +                (None, StructType([]))]:
    +            _verify_type(obj, data_type, nullable=True)
    +            msg = "_verify_type(%s, %s, nullable=True)" % (obj, data_type)
    +            self.assertTrue(True, msg)
    +
    +    def test_verify_type_not_nullable(self):
    +        import array
    +        import datetime
    +        import decimal
    +
    +        MyStructType = StructType([
    +            StructField('s', StringType(), nullable=False),
    +            StructField('i', IntegerType(), nullable=True)])
    +
    +        class MyObj:
    +            def __init__(self, **ka):
    +                for k, v in ka.items():
    +                    setattr(self, k, v)
    +
    +        # obj, data_type, exception (None for success or Exception subclass for error)
    +        spec = [
    +            # Strings (match anything but None)
    +            ("", StringType(), None),
    +            (u"", StringType(), None),
    +            (1, StringType(), None),
    +            (1.0, StringType(), None),
    +            ([], StringType(), None),
    +            ({}, StringType(), None),
    +            (None, StringType(), ValueError),   # Only None test
    +
    +            # UDT
    +            (ExamplePoint(1.0, 2.0), ExamplePointUDT(), None),
    +            (ExamplePoint(1.0, 2.0), PythonOnlyUDT(), ValueError),
    +
    +            # Boolean
    +            (True, BooleanType(), None),
    +            (1, BooleanType(), TypeError),
    +            ("True", BooleanType(), TypeError),
    +            ([1], BooleanType(), TypeError),
    +
    +            # Bytes
    +            (-(2**7) - 1, ByteType(), ValueError),
    +            (-(2**7), ByteType(), None),
    +            (2**7 - 1, ByteType(), None),
    +            (2**7, ByteType(), ValueError),
    +            ("1", ByteType(), TypeError),
    +            (1.0, ByteType(), TypeError),
    +
    +            # Shorts
    +            (-(2**15) - 1, ShortType(), ValueError),
    +            (-(2**15), ShortType(), None),
    +            (2**15 - 1, ShortType(), None),
    +            (2**15, ShortType(), ValueError),
    +
    +            # Integer
    +            (-(2**31) - 1, IntegerType(), ValueError),
    +            (-(2**31), IntegerType(), None),
    +            (2**31 - 1, IntegerType(), None),
    +            (2**31, IntegerType(), ValueError),
    +
    +            # Long
    +            (2**64, LongType(), None),
    +
    +            # Float & Double
    +            (1.0, FloatType(), None),
    +            (1, FloatType(), TypeError),
    +            (1.0, DoubleType(), None),
    +            (1, DoubleType(), TypeError),
    +
    +            # Decimal
    +            (decimal.Decimal("1.0"), DecimalType(), None),
    +            (1.0, DecimalType(), TypeError),
    +            (1, DecimalType(), TypeError),
    +            ("1.0", DecimalType(), TypeError),
    +
    +            # Binary
    +            (bytearray([1, 2]), BinaryType(), None),
    +            (1, BinaryType(), TypeError),
    +
    +            # Date/Time
    +            (datetime.date(2000, 1, 2), DateType(), None),
    +            (datetime.datetime(2000, 1, 2, 3, 4), DateType(), None),
    +            ("2000-01-02", DateType(), TypeError),
    +            (datetime.datetime(2000, 1, 2, 3, 4), TimestampType(), None),
    +            (946811040, TimestampType(), TypeError),
    +
    +            # Array
    +            ([], ArrayType(IntegerType()), None),
    +            (["1", None], ArrayType(StringType(), containsNull=True), None),
    +            ([1, 2], ArrayType(IntegerType()), None),
    +            ([1, "2"], ArrayType(IntegerType()), TypeError),
    +            ((1, 2), ArrayType(IntegerType()), None),
    +            (array.array('h', [1, 2]), ArrayType(IntegerType()), None),
    +
    +            # Map
    +            ({}, MapType(StringType(), IntegerType()), None),
    +            ({"a": 1}, MapType(StringType(), IntegerType()), None),
    +            ({"a": 1}, MapType(IntegerType(), IntegerType()), TypeError),
    +            ({"a": "1"}, MapType(StringType(), IntegerType()), TypeError),
    +            ({"a": None}, MapType(StringType(), IntegerType(), valueContainsNull=True), None),
    +
    +            # Struct
    +            ({"s": "a", "i": 1}, MyStructType, None),
    +            ({"s": "a", "i": None}, MyStructType, None),
    +            ({"s": "a"}, MyStructType, None),
    +            ({"s": "a", "f": 1.0}, MyStructType, None),     # Extra fields OK
    +            ({"s": "a", "i": "1"}, MyStructType, TypeError),
    +            (Row(s="a", i=1), MyStructType, None),
    +            (Row(s="a", i=None), MyStructType, None),
    +            (Row(s="a", i=1, f=1.0), MyStructType, None),   # Extra fields OK
    +            (Row(s="a"), MyStructType, ValueError),     # Row can't have missing field
    +            (Row(s="a", i="1"), MyStructType, TypeError),
    +            (["a", 1], MyStructType, None),
    +            (["a", None], MyStructType, None),
    +            (["a"], MyStructType, ValueError),
    +            (["a", "1"], MyStructType, TypeError),
    +            (("a", 1), MyStructType, None),
    +            (MyObj(s="a", i=1), MyStructType, None),
    +            (MyObj(s="a", i=None), MyStructType, None),
    +            (MyObj(s="a"), MyStructType, None),
    +            (MyObj(s="a", i="1"), MyStructType, TypeError),
    --- End diff --
    
    Added
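A spec table like the one in the diff above is typically consumed by one table-driven loop. The sketch below is illustrative only: `verify` is a toy stand-in for pyspark's `_verify_type`, and the spec rows are abbreviated versions of the real table.

```python
import unittest

# Illustrative only: verify() is a toy stand-in for pyspark's _verify_type,
# and the spec rows are abbreviated versions of the table above.
def verify(obj, data_type):
    if data_type == "string" and obj is None:
        raise ValueError("string type can not accept None")
    if data_type == "int" and not isinstance(obj, int):
        raise TypeError("int type can not accept %r" % (obj,))

spec = [
    # (obj, data_type, expected exception; None means "must not raise")
    ("a", "string", None),
    (None, "string", ValueError),
    (1, "int", None),
    ("1", "int", TypeError),
]

class TypesTest(unittest.TestCase):
    def test_spec(self):
        for obj, data_type, exp in spec:
            msg = "verify(%r, %r)" % (obj, data_type)
            if exp is None:
                verify(obj, data_type)  # an unexpected raise fails the test
            else:
                with self.assertRaises(exp, msg=msg):
                    verify(obj, data_type)
```

Passing `msg` to `assertRaises` keeps a failure report pointing at the exact spec row, which matters when the table has over fifty entries.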




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78324/
    Test FAILed.




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by dgingrich <gi...@git.apache.org>.
Github user dgingrich commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Pushed new commit addressing PR feedback.  I had to create a new local environment (new laptop) and was getting test errors on unrelated code before my changes.  I'm hoping that's just my laptop and Jenkins will pass.  If not I'll retry rebasing.




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74284/
    Test FAILed.




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by blrnw3 <gi...@git.apache.org>.
Github user blrnw3 commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    From my perspective you've covered everything, so thanks for pushing this through.




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Build finished. Test PASSed.




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r123134376
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1300,70 +1300,80 @@ def _verify_type(obj, dataType, nullable=True):
             if nullable:
                 return
             else:
    -            raise ValueError("This field is not nullable, but got None")
    +            raise ValueError("%s: This field is not nullable, but got None" % name)
    --- End diff --
    
    I've probably missed something, but is there a test case that actually checks this message change?




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by dgingrich <gi...@git.apache.org>.
Github user dgingrich commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r123611855
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1249,7 +1249,7 @@ def _infer_schema_type(obj, dataType):
     }
     
     
    -def _verify_type(obj, dataType, nullable=True):
    +def _verify_type(obj, dataType, nullable=True, name="obj"):
    --- End diff --
    
    This will print "obj" when called from `session.createDataFrame` (https://github.com/dgingrich/spark/blob/topic-spark-19507-verify-types/python/pyspark/sql/session.py#L408).  It'd be easy to set the name where it's called, but it wasn't clear what to set it to.  The input can be either an RDD, a list, or a pandas.DataFrame.




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by dgingrich <gi...@git.apache.org>.
Github user dgingrich commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Looks like the rebase fixed the tests.  This is good to go from my POV, let me know if you need any changes.




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74430/
    Test FAILed.




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by dgingrich <gi...@git.apache.org>.
Github user dgingrich commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r123099453
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2367,6 +2380,151 @@ def range_frame_match():
     
             importlib.reload(window)
     
    +
    +class TypesTest(unittest.TestCase):
    +
    +    def test_verify_type_ok_nullable(self):
    +        for obj, data_type in [
    +                (None, IntegerType()),
    +                (None, FloatType()),
    +                (None, StringType()),
    +                (None, StructType([]))]:
    +            _verify_type(obj, data_type, nullable=True)
    +            msg = "_verify_type(%s, %s, nullable=True)" % (obj, data_type)
    +            self.assertTrue(True, msg)
    --- End diff --
    
    Added




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Yea, if the change is not too different, I would rather help review the first PR before taking over.




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r123133489
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -30,6 +30,19 @@
     import functools
     import time
     import datetime
    +import traceback
    +
    +if sys.version_info[:2] <= (2, 6):
    --- End diff --
    
    Not a big deal but I guess we dropped 2.6 support.




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78329/
    Test PASSed.




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r122282508
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2367,6 +2380,151 @@ def range_frame_match():
     
             importlib.reload(window)
     
    +
    +class TypesTest(unittest.TestCase):
    +
    +    def test_verify_type_ok_nullable(self):
    +        for obj, data_type in [
    +                (None, IntegerType()),
    +                (None, FloatType()),
    +                (None, StringType()),
    +                (None, StructType([]))]:
    +            _verify_type(obj, data_type, nullable=True)
    +            msg = "_verify_type(%s, %s, nullable=True)" % (obj, data_type)
    +            self.assertTrue(True, msg)
    +
    +    def test_verify_type_not_nullable(self):
    +        import array
    +        import datetime
    +        import decimal
    +
    +        MyStructType = StructType([
    +            StructField('s', StringType(), nullable=False),
    +            StructField('i', IntegerType(), nullable=True)])
    +
    +        class MyObj:
    +            def __init__(self, **ka):
    +                for k, v in ka.items():
    +                    setattr(self, k, v)
    +
    +        # obj, data_type, exception (None for success or Exception subclass for error)
    +        spec = [
    +            # Strings (match anything but None)
    +            ("", StringType(), None),
    +            (u"", StringType(), None),
    +            (1, StringType(), None),
    +            (1.0, StringType(), None),
    +            ([], StringType(), None),
    +            ({}, StringType(), None),
    +            (None, StringType(), ValueError),   # Only None test
    +
    +            # UDT
    +            (ExamplePoint(1.0, 2.0), ExamplePointUDT(), None),
    +            (ExamplePoint(1.0, 2.0), PythonOnlyUDT(), ValueError),
    +
    +            # Boolean
    +            (True, BooleanType(), None),
    +            (1, BooleanType(), TypeError),
    +            ("True", BooleanType(), TypeError),
    +            ([1], BooleanType(), TypeError),
    +
    +            # Bytes
    +            (-(2**7) - 1, ByteType(), ValueError),
    +            (-(2**7), ByteType(), None),
    +            (2**7 - 1, ByteType(), None),
    +            (2**7, ByteType(), ValueError),
    +            ("1", ByteType(), TypeError),
    +            (1.0, ByteType(), TypeError),
    +
    +            # Shorts
    +            (-(2**15) - 1, ShortType(), ValueError),
    +            (-(2**15), ShortType(), None),
    +            (2**15 - 1, ShortType(), None),
    +            (2**15, ShortType(), ValueError),
    +
    +            # Integer
    +            (-(2**31) - 1, IntegerType(), ValueError),
    +            (-(2**31), IntegerType(), None),
    +            (2**31 - 1, IntegerType(), None),
    +            (2**31, IntegerType(), ValueError),
    +
    +            # Long
    +            (2**64, LongType(), None),
    +
    +            # Float & Double
    +            (1.0, FloatType(), None),
    +            (1, FloatType(), TypeError),
    +            (1.0, DoubleType(), None),
    +            (1, DoubleType(), TypeError),
    +
    +            # Decimal
    +            (decimal.Decimal("1.0"), DecimalType(), None),
    +            (1.0, DecimalType(), TypeError),
    +            (1, DecimalType(), TypeError),
    +            ("1.0", DecimalType(), TypeError),
    +
    +            # Binary
    +            (bytearray([1, 2]), BinaryType(), None),
    +            (1, BinaryType(), TypeError),
    +
    +            # Date/Time
    +            (datetime.date(2000, 1, 2), DateType(), None),
    +            (datetime.datetime(2000, 1, 2, 3, 4), DateType(), None),
    +            ("2000-01-02", DateType(), TypeError),
    +            (datetime.datetime(2000, 1, 2, 3, 4), TimestampType(), None),
    +            (946811040, TimestampType(), TypeError),
    +
    +            # Array
    +            ([], ArrayType(IntegerType()), None),
    +            (["1", None], ArrayType(StringType(), containsNull=True), None),
    --- End diff --
    
    I'd like you to also add a `containsNull=False` case that contains `None` in the list, to verify that it raises `ValueError` correctly.
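    For reference, the behavior being requested can be sketched standalone. This is a hypothetical helper mimicking only `_verify_type`'s array null check, not the actual pyspark code:

```python
# Minimal stand-in for _verify_type's array handling: an array declared
# containsNull=False must raise ValueError when the list contains None.
def verify_array(values, contains_null):
    for v in values:
        if v is None:
            if not contains_null:
                raise ValueError("array element cannot be None "
                                 "(containsNull=False)")
        # element type checks would follow here in the real function

# containsNull=True accepts None elements
verify_array(["1", None], contains_null=True)

# containsNull=False rejects them
try:
    verify_array(["1", None], contains_null=False)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```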


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    I'm a bit busy today, but I'll try to take a look through this and https://github.com/apache/spark/pull/17213 sometime this week. If y'all agree on which one you think is better, though, I can focus on reviewing that.




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r123134991
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1249,7 +1249,7 @@ def _infer_schema_type(obj, dataType):
     }
     
     
    -def _verify_type(obj, dataType, nullable=True):
    +def _verify_type(obj, dataType, nullable=True, name="obj"):
    --- End diff --
    
    Just a question. @dgingrich Do you maybe know if there is any chance that "obj" is printed instead? It is rather a nitpick, but I would think it is odd if it prints "obj".
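    A hypothetical sketch (not pyspark's actual implementation) of how a field path could be threaded through a recursive verifier so that errors name the offending field instead of the generic default "obj". The `verify` function and the dict-based schema here are illustrative only:

```python
def verify(obj, data_type, name="obj"):
    # dict stands in for StructType; recurse with the field name appended
    if isinstance(data_type, dict):
        for field, field_type in data_type.items():
            verify(obj.get(field), field_type,
                   name="%s.%s" % (name, field))
    elif data_type is int and not isinstance(obj, int):
        raise TypeError("%s: expected int, got %r" % (name, obj))

try:
    verify({"a": {"b": "oops"}}, {"a": {"b": int}})
except TypeError as exc:
    msg = str(exc)
print(msg)  # obj.a.b: expected int, got 'oops'
```

    The "obj" prefix only shows when verification starts at the top level with the default name; nested failures report the full dotted path.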




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by dgingrich <gi...@git.apache.org>.
Github user dgingrich commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Opening as a comparison for https://github.com/apache/spark/pull/17213




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by dgingrich <gi...@git.apache.org>.
Github user dgingrich commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r123609093
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -30,6 +30,19 @@
     import functools
     import time
     import datetime
    +import traceback
    +
    +if sys.version_info[:2] <= (2, 6):
    --- End diff --
    
    Looks like most of the other tests still have the `<= (2, 6)` check (see python/pyspark/ml/tests.py), so I'm leaving it in place.
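    The truncated check refers to a common backport idiom: on Python <= 2.6 the stdlib `unittest` lacks newer assertion helpers, so tests fall back to the `unittest2` package. A sketch of that idiom (the exact pyspark wording may differ):

```python
import sys

# On Python <= 2.6, fall back to the unittest2 backport; otherwise use stdlib.
if sys.version_info[:2] <= (2, 6):
    try:
        import unittest2 as unittest
    except ImportError:
        sys.stderr.write("Please install unittest2 to run tests "
                         "on Python <= 2.6\n")
        sys.exit(1)
else:
    import unittest
```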




[GitHub] spark issue #17227: [WIP][SPARK-19507][PySpark][SQL] Show field name in _ver...

Posted by blrnw3 <gi...@git.apache.org>.
Github user blrnw3 commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    👍




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by dgingrich <gi...@git.apache.org>.
Github user dgingrich commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r123610062
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2367,6 +2380,157 @@ def range_frame_match():
     
             importlib.reload(window)
     
    +
    +class TypesTest(unittest.TestCase):
    +
    +    def test_verify_type_ok_nullable(self):
    +        for obj, data_type in [
    +                (None, IntegerType()),
    +                (None, FloatType()),
    +                (None, StringType()),
    +                (None, StructType([]))]:
    --- End diff --
    
    Meaning remove the `None` from the tuples and just loop over the types? I like it a little better as-is, since the tuples are basically `_verify_type`'s args, but am fine with either. Let me know which you prefer and I can change the code.




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r122283340
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2367,6 +2380,151 @@ def range_frame_match():
     
             importlib.reload(window)
     
    +
    +class TypesTest(unittest.TestCase):
    +
    +    def test_verify_type_ok_nullable(self):
    +        for obj, data_type in [
    +                (None, IntegerType()),
    +                (None, FloatType()),
    +                (None, StringType()),
    +                (None, StructType([]))]:
    +            _verify_type(obj, data_type, nullable=True)
    +            msg = "_verify_type(%s, %s, nullable=True)" % (obj, data_type)
    +            self.assertTrue(True, msg)
    +
    +    def test_verify_type_not_nullable(self):
    +        import array
    +        import datetime
    +        import decimal
    +
    +        MyStructType = StructType([
    +            StructField('s', StringType(), nullable=False),
    +            StructField('i', IntegerType(), nullable=True)])
    +
    +        class MyObj:
    +            def __init__(self, **ka):
    +                for k, v in ka.items():
    +                    setattr(self, k, v)
    +
    +        # obj, data_type, exception (None for success or Exception subclass for error)
    +        spec = [
    +            # Strings (match anything but None)
    +            ("", StringType(), None),
    +            (u"", StringType(), None),
    +            (1, StringType(), None),
    +            (1.0, StringType(), None),
    +            ([], StringType(), None),
    +            ({}, StringType(), None),
    +            (None, StringType(), ValueError),   # Only None test
    +
    +            # UDT
    +            (ExamplePoint(1.0, 2.0), ExamplePointUDT(), None),
    +            (ExamplePoint(1.0, 2.0), PythonOnlyUDT(), ValueError),
    +
    +            # Boolean
    +            (True, BooleanType(), None),
    +            (1, BooleanType(), TypeError),
    +            ("True", BooleanType(), TypeError),
    +            ([1], BooleanType(), TypeError),
    +
    +            # Bytes
    +            (-(2**7) - 1, ByteType(), ValueError),
    +            (-(2**7), ByteType(), None),
    +            (2**7 - 1, ByteType(), None),
    +            (2**7, ByteType(), ValueError),
    +            ("1", ByteType(), TypeError),
    +            (1.0, ByteType(), TypeError),
    +
    +            # Shorts
    +            (-(2**15) - 1, ShortType(), ValueError),
    +            (-(2**15), ShortType(), None),
    +            (2**15 - 1, ShortType(), None),
    +            (2**15, ShortType(), ValueError),
    +
    +            # Integer
    +            (-(2**31) - 1, IntegerType(), ValueError),
    +            (-(2**31), IntegerType(), None),
    +            (2**31 - 1, IntegerType(), None),
    +            (2**31, IntegerType(), ValueError),
    +
    +            # Long
    +            (2**64, LongType(), None),
    +
    +            # Float & Double
    +            (1.0, FloatType(), None),
    +            (1, FloatType(), TypeError),
    +            (1.0, DoubleType(), None),
    +            (1, DoubleType(), TypeError),
    +
    +            # Decimal
    +            (decimal.Decimal("1.0"), DecimalType(), None),
    +            (1.0, DecimalType(), TypeError),
    +            (1, DecimalType(), TypeError),
    +            ("1.0", DecimalType(), TypeError),
    +
    +            # Binary
    +            (bytearray([1, 2]), BinaryType(), None),
    +            (1, BinaryType(), TypeError),
    +
    +            # Date/Time
    +            (datetime.date(2000, 1, 2), DateType(), None),
    +            (datetime.datetime(2000, 1, 2, 3, 4), DateType(), None),
    +            ("2000-01-02", DateType(), TypeError),
    +            (datetime.datetime(2000, 1, 2, 3, 4), TimestampType(), None),
    +            (946811040, TimestampType(), TypeError),
    +
    +            # Array
    +            ([], ArrayType(IntegerType()), None),
    +            (["1", None], ArrayType(StringType(), containsNull=True), None),
    +            ([1, 2], ArrayType(IntegerType()), None),
    +            ([1, "2"], ArrayType(IntegerType()), TypeError),
    +            ((1, 2), ArrayType(IntegerType()), None),
    +            (array.array('h', [1, 2]), ArrayType(IntegerType()), None),
    +
    +            # Map
    +            ({}, MapType(StringType(), IntegerType()), None),
    +            ({"a": 1}, MapType(StringType(), IntegerType()), None),
    +            ({"a": 1}, MapType(IntegerType(), IntegerType()), TypeError),
    +            ({"a": "1"}, MapType(StringType(), IntegerType()), TypeError),
    +            ({"a": None}, MapType(StringType(), IntegerType(), valueContainsNull=True), None),
    +
    +            # Struct
    +            ({"s": "a", "i": 1}, MyStructType, None),
    +            ({"s": "a", "i": None}, MyStructType, None),
    +            ({"s": "a"}, MyStructType, None),
    +            ({"s": "a", "f": 1.0}, MyStructType, None),     # Extra fields OK
    +            ({"s": "a", "i": "1"}, MyStructType, TypeError),
    +            (Row(s="a", i=1), MyStructType, None),
    +            (Row(s="a", i=None), MyStructType, None),
    +            (Row(s="a", i=1, f=1.0), MyStructType, None),   # Extra fields OK
    +            (Row(s="a"), MyStructType, ValueError),     # Row can't have missing field
    +            (Row(s="a", i="1"), MyStructType, TypeError),
    +            (["a", 1], MyStructType, None),
    +            (["a", None], MyStructType, None),
    +            (["a"], MyStructType, ValueError),
    +            (["a", "1"], MyStructType, TypeError),
    +            (("a", 1), MyStructType, None),
    +            (MyObj(s="a", i=1), MyStructType, None),
    +            (MyObj(s="a", i=None), MyStructType, None),
    +            (MyObj(s="a"), MyStructType, None),
    +            (MyObj(s="a", i="1"), MyStructType, TypeError),
    --- End diff --
    
    Same here, `None` for `s` field.
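    What this asks for, sketched standalone: a struct whose `s` field is declared non-nullable should raise `ValueError` when `s` is `None`. This mimics `MyStructType` from the diff with plain tuples, not pyspark types:

```python
def verify_struct(obj, fields):
    """fields: list of (name, type, nullable) tuples mimicking StructField."""
    for name, field_type, nullable in fields:
        value = obj.get(name) if isinstance(obj, dict) \
            else getattr(obj, name, None)
        if value is None:
            if not nullable:
                raise ValueError("field %s is not nullable" % name)
        elif not isinstance(value, field_type):
            raise TypeError("field %s: expected %s" % (name, field_type))

# Mirrors MyStructType: s is non-nullable, i is nullable.
my_struct = [("s", str, False), ("i", int, True)]

class MyObj:
    def __init__(self, **ka):
        for k, v in ka.items():
            setattr(self, k, v)

verify_struct(MyObj(s="a", i=None), my_struct)   # nullable i: OK

try:
    verify_struct(MyObj(s=None, i=1), my_struct)  # non-nullable s: error
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```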




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78472/
    Test PASSed.




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    **[Test build #78472 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78472/testReport)** for PR 17227 at commit [`2351153`](https://github.com/apache/spark/commit/23511537966577bd6d2b4bbe9dd898faa0e72e97).




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by dgingrich <gi...@git.apache.org>.
Github user dgingrich commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Tests finished, ready for review.




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    **[Test build #78740 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78740/testReport)** for PR 17227 at commit [`6c1e0b6`](https://github.com/apache/spark/commit/6c1e0b690bdd1914b5056c8b2934614534c622cb).




[GitHub] spark pull request #17227: [SPARK-19507][PySpark][SQL] Show field name in _v...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17227#discussion_r123133997
  
    --- Diff: python/pyspark/rdd.py ---
    @@ -627,7 +627,6 @@ def sortPartition(iterator):
         def sortByKey(self, ascending=True, numPartitions=None, keyfunc=lambda x: x):
             """
             Sorts this RDD, which is assumed to consist of (key, value) pairs.
    -        # noqa
    --- End diff --
    
    (I have no idea why this was added in the first place ...)




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Can one of the admins verify this patch?




[GitHub] spark issue #17227: [SPARK-19507][PySpark][SQL] Show field name in _verify_t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17227
  
    Merged build finished. Test FAILed.

