Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2022/12/16 13:09:00 UTC
[jira] [Updated] (SPARK-41548) Disable ANSI mode in pyspark.sql.tests.connect.test_connect_functions

     [ https://issues.apache.org/jira/browse/SPARK-41548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-41548:
---------------------------------
    Description: 
There are failures in `test_connect_functions` with ANSI mode on (https://github.com/apache/spark/actions/runs/3709431687/jobs/6288067223). I tried to fix them, but they are tricky to fix because Spark Connect does not respect the runtime configuration on the server side.

It is also tricky to make the tests pass with ANSI mode both on and off. Therefore, this issue temporarily disables ANSI mode for these tests so that the other tests can pass. Note that the PySpark tests stop in the middle if one fails.

{code:java}
======================================================================
ERROR [0.264s]: test_date_ts_functions (pyspark.sql.tests.connect.test_connect_function.SparkConnectFunctionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/sql/tests/connect/test_connect_function.py", line 1149, in test_date_ts_functions
    cdf.select(cfunc(cdf.ts1)).toPandas(),
  File "/__w/spark/spark/python/pyspark/sql/connect/dataframe.py", line 1533, in toPandas
    return self._session.client._to_pandas(query)
  File "/__w/spark/spark/python/pyspark/sql/connect/client.py", line 333, in _to_pandas
    return self._execute_and_fetch(req)
  File "/__w/spark/spark/python/pyspark/sql/connect/client.py", line 418, in _execute_and_fetch
    for b in self._stub.ExecutePlan(req, metadata=self._builder.metadata()):
  File "/usr/local/lib/python3.9/dist-packages/grpc/_channel.py", line 426, in __next__
    return self._next()
  File "/usr/local/lib/python3.9/dist-packages/grpc/_channel.py", line 826, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "[CAST_INVALID_INPUT] The value '1997/02/28 10:30:00' of the type "STRING" cannot be cast to "DATE" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error."
	debug_error_string = "UNKNOWN:Error received from peer ipv4:127.0.0.1:15002 {grpc_message:"[CAST_INVALID_INPUT] The value \'1997/02/28 10:30:00\' of the type \"STRING\" cannot be cast to \"DATE\" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set \"spark.sql.ansi.enabled\" to \"false\" to bypass this error.", grpc_status:2, created_time:"2022-12-16T01:49:15.71844837+00:00"}"
>

======================================================================
ERROR [0.527s]: test_string_functions_one_arg (pyspark.sql.tests.connect.test_connect_function.SparkConnectFunctionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/sql/tests/connect/test_connect_function.py", line 985, in test_string_functions_one_arg
    cdf.select(cfunc("a"), cfunc(cdf.b)).toPandas(),
  File "/__w/spark/spark/python/pyspark/sql/connect/dataframe.py", line 1533, in toPandas
    return self._session.client._to_pandas(query)
  File "/__w/spark/spark/python/pyspark/sql/connect/client.py", line 333, in _to_pandas
    return self._execute_and_fetch(req)
  File "/__w/spark/spark/python/pyspark/sql/connect/client.py", line 418, in _execute_and_fetch
    for b in self._stub.ExecutePlan(req, metadata=self._builder.metadata()):
  File "/usr/local/lib/python3.9/dist-packages/grpc/_channel.py", line 426, in __next__
    return self._next()
  File "/usr/local/lib/python3.9/dist-packages/grpc/_channel.py", line 826, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "[CAST_INVALID_INPUT] The value '   ab   ' of the type "STRING" cannot be cast to "BIGINT" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error."
	debug_error_string = "UNKNOWN:Error received from peer ipv4:127.0.0.1:15002 {grpc_message:"[CAST_INVALID_INPUT] The value \'   ab   \' of the type \"STRING\" cannot be cast to \"BIGINT\" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set \"spark.sql.ansi.enabled\" to \"false\" to bypass this error.", grpc_status:2, created_time:"2022-12-16T01:49:25.529953492+00:00"}"
>

----------------------------------------------------------------------
Ran 14 tests in 40.832s
 {code}
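
To illustrate why the same test input behaves differently depending on ANSI mode, here is a minimal pure-Python sketch of the cast semantics the errors above describe. It is a toy model, not Spark's implementation: the names `CastInvalidInput` and `cast_string_to_bigint` are hypothetical, and Python's `int()` does not accept exactly the same strings as Spark's cast. It only mirrors what the error message states: with ANSI mode on a malformed value raises, and with it off the cast yields NULL (`None` here).

```python
from typing import Optional


class CastInvalidInput(ValueError):
    """Hypothetical stand-in for Spark's CAST_INVALID_INPUT error."""


def cast_string_to_bigint(value: str, ansi_enabled: bool) -> Optional[int]:
    """Toy model of casting a string to BIGINT (not Spark's real code).

    With ansi_enabled=True a malformed value raises CastInvalidInput;
    otherwise it returns None, standing in for SQL NULL.
    """
    try:
        # Surrounding whitespace is ignored before parsing.
        return int(value.strip())
    except ValueError:
        if ansi_enabled:
            raise CastInvalidInput(
                f"The value '{value}' of the type \"STRING\" "
                f"cannot be cast to \"BIGINT\" because it is malformed."
            ) from None
        return None
```

In this model, the value `'   ab   '` from `test_string_functions_one_arg` returns `None` with `ansi_enabled=False` and raises with `ansi_enabled=True`, so an assertion written against one mode cannot hold in the other without branching on the configuration.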

  was:
There are too many failures in test_connect_functions with ANSI mode on, see [https://github.com/apache/spark/actions/runs/3709431687/jobs/6288067223]
This Jira aims to disable the tests for now to preserve the test coverage of the other tests. PySpark tests fail in the middle if one fails.


> Disable ANSI mode in pyspark.sql.tests.connect.test_connect_functions
> ---------------------------------------------------------------------
>
>                 Key: SPARK-41548
>                 URL: https://issues.apache.org/jira/browse/SPARK-41548
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect, Tests
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>


