Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/05/23 12:31:32 UTC
[GitHub] spark pull request #21410: [SPARK-24366][SQL] Improving of error messages fo...
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21410
[SPARK-24366][SQL] Improving of error messages for type converting
## What changes were proposed in this pull request?
Currently, users are getting the following error messages on type conversions:
```
scala.MatchError: test (of class java.lang.String)
```
The message gives users no clue about where in the schema the error happened. In this PR, I would like to improve the error message to something like:
```
The value (test) of the type (java.lang.String) cannot be converted to struct<f1:int>
```
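To illustrate the difference, here is a minimal sketch that does not depend on Spark. `ConversionSketch`, `toDecimalBefore`, `toDecimalAfter` and the `targetType` parameter are hypothetical names used only for this illustration; they are not the actual `CatalystTypeConverters` code. The sketch assumes the improvement amounts to adding a catch-all case that reports the value, its class and the target type.

```scala
// Hypothetical stand-in for a decimal converter; not Spark's actual code.
object ConversionSketch {
  // Before: the match has no catch-all case, so an unexpected value
  // fails with an uninformative scala.MatchError.
  def toDecimalBefore(value: Any): BigDecimal = value match {
    case d: BigDecimal => d
    case d: java.math.BigDecimal => BigDecimal(d)
    case i: java.math.BigInteger => BigDecimal(new java.math.BigDecimal(i))
  }

  // After: a catch-all case names the value, its class and the target type.
  // Note: null is not handled in this sketch.
  def toDecimalAfter(value: Any, targetType: String): BigDecimal = value match {
    case d: BigDecimal => d
    case d: java.math.BigDecimal => BigDecimal(d)
    case i: java.math.BigInteger => BigDecimal(new java.math.BigDecimal(i))
    case other => throw new IllegalArgumentException(
      s"The value (${other.toString}) of the type (${other.getClass.getCanonicalName}) " +
        s"cannot be converted to $targetType")
  }
}
```

Calling `ConversionSketch.toDecimalBefore("test")` throws a `scala.MatchError`, while `ConversionSketch.toDecimalAfter("test", "decimal(38,18)")` throws an `IllegalArgumentException` whose message names the value, its class and the target type.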
## How was this patch tested?
Added tests for converting wrong values to `struct`, `map`, `array`, `string` and `decimal`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 type-conv-error
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21410.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21410
----
commit 26cc2f84ee6324db23936e20816d240031211311
Author: Maxim Gekk <ma...@...>
Date: 2018-05-23T12:22:33Z
Improving of error messages for type conversions
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21410
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91151/
---
[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21410
@gatorsmile Could you look at the PR, please? The changes should help us troubleshoot customers' issues.
---
[GitHub] spark pull request #21410: [SPARK-24366][SQL] Improving of error messages fo...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/21410
---
[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21410
**[Test build #91151 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91151/testReport)** for PR 21410 at commit [`ac76544`](https://github.com/apache/spark/commit/ac7654415a3ec82c6cf3306e664cf09018c66db6).
---
[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21410
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91062/
---
[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21410
> Is there a way to identify where in the schema the issue is occurring?
We can catch the exceptions at each level of the schema tree traversal and show the sub-tree in each catch. For example, for `array<map<..., array<struct<f2:int>>>>`, the first exception will point to `struct<f2:int>`, the second one to `array<struct<f2:int>>`, and so on up to the "root" schema.
> e.g., a.b.c where this is happening, is required to easily isolate the issue in the input data and resolve it.
I guess in the case of arrays and maps, you want to see indexes and keys. Could you provide a concrete example with values and a schema (array, struct, map), and say what kind of info the error should contain?
Just in case, I would propose to make such improvements in a separate PR.
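The idea of catching at each level can be sketched roughly as follows. This is only an illustration under stated assumptions, not part of the PR: the tiny `DataType` hierarchy, `describe`, `check` and `within` are hypothetical names, rows and arrays are both modelled as `Seq`, and maps are left out.

```scala
// Hypothetical mini schema model; not Spark's DataType classes.
object SchemaTraversalSketch {
  sealed trait DataType { def describe: String }
  case object IntType extends DataType { val describe: String = "int" }
  case class ArrayType(element: DataType) extends DataType {
    def describe: String = s"array<${element.describe}>"
  }
  case class StructType(fields: Seq[(String, DataType)]) extends DataType {
    def describe: String =
      fields.map { case (n, t) => s"$n:${t.describe}" }.mkString("struct<", ",", ">")
  }

  // Walks the value along the schema and throws on the first mismatch.
  // Note: null values are not handled in this sketch.
  def check(value: Any, dt: DataType): Unit = (value, dt) match {
    case (_: Int, IntType) => ()
    case (items: Seq[_], ArrayType(element)) =>
      within(dt) { items.foreach(check(_, element)) }
    case (row: Seq[_], StructType(fields)) if row.length == fields.length =>
      within(dt) { row.zip(fields).foreach { case (v, (_, t)) => check(v, t) } }
    case (other, _) => throw new IllegalArgumentException(
      s"The value ($other) of the type (${other.getClass.getCanonicalName}) " +
        s"cannot be converted to ${dt.describe}")
  }

  // Catches a failure from a nested level and rethrows it with the
  // enclosing sub-tree appended, so the message grows up to the root.
  private def within(dt: DataType)(body: => Unit): Unit =
    try body catch {
      case e: IllegalArgumentException =>
        throw new IllegalArgumentException(s"${e.getMessage}\n  in ${dt.describe}", e)
    }
}
```

With the schema `array<struct<f2:int>>` and the value `Seq(Seq(1), Seq("test"))`, the innermost message reports the conversion to `int`, and the rethrown messages add `struct<f2:int>` and then `array<struct<f2:int>>`.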
---
[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21410
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91034/
---
[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21410
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21410
**[Test build #91034 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91034/testReport)** for PR 21410 at commit [`26cc2f8`](https://github.com/apache/spark/commit/26cc2f84ee6324db23936e20816d240031211311).
---
[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/21410
Thanks! Merged to master.
---
[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21410
**[Test build #91034 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91034/testReport)** for PR 21410 at commit [`26cc2f8`](https://github.com/apache/spark/commit/26cc2f84ee6324db23936e20816d240031211311).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21410
**[Test build #91062 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91062/testReport)** for PR 21410 at commit [`26cc2f8`](https://github.com/apache/spark/commit/26cc2f84ee6324db23936e20816d240031211311).
---
[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21410
**[Test build #91062 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91062/testReport)** for PR 21410 at commit [`26cc2f8`](https://github.com/apache/spark/commit/26cc2f84ee6324db23936e20816d240031211311).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #21410: [SPARK-24366][SQL] Improving of error messages fo...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21410#discussion_r190823171
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala ---
@@ -309,6 +322,9 @@ object CatalystTypeConverters {
case d: JavaBigDecimal => Decimal(d)
case d: JavaBigInteger => Decimal(d)
case d: Decimal => d
+ case other => throw new IllegalArgumentException(
+ s"The value (${other.toString}) of the type (${other.getClass.getCanonicalName}) "
+ + s"cannot be converted to ${dataType.simpleString}")
--- End diff --
All `simpleString`s are replaced by `catalogString`s
---
[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21410
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21410
**[Test build #91151 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91151/testReport)** for PR 21410 at commit [`ac76544`](https://github.com/apache/spark/commit/ac7654415a3ec82c6cf3306e664cf09018c66db6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #21410: [SPARK-24366][SQL] Improving of error messages fo...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/21410#discussion_r190790254
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala ---
@@ -309,6 +322,9 @@ object CatalystTypeConverters {
case d: JavaBigDecimal => Decimal(d)
case d: JavaBigInteger => Decimal(d)
case d: Decimal => d
+ case other => throw new IllegalArgumentException(
+ s"The value (${other.toString}) of the type (${other.getClass.getCanonicalName}) "
+ + s"cannot be converted to ${dataType.simpleString}")
--- End diff --
Let us use `catalogString` here?
---
[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21410
jenkins, retest this, please
---
[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...
Posted by ssimeonov <gi...@git.apache.org>.
Github user ssimeonov commented on the issue:
https://github.com/apache/spark/pull/21410
This is an excellent start and a worthy improvement.
Is there a way to identify where in the schema the issue is occurring? For example, when you have a schema with many nested fields, the failing value is helpful but the breadcrumb trail, e.g., `a.b.c` where this is happening, is required to easily isolate the issue in the input data and resolve it.
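One way such a breadcrumb trail could be produced is to carry the field path down the traversal. The sketch below is an assumption-laden illustration, not something this PR implements: `BreadcrumbSketch`, `firstMismatch` and the mini `DataType` model are hypothetical, rows are modelled as `Seq`, and only structs and ints are covered.

```scala
// Hypothetical mini schema model; not Spark's DataType classes.
object BreadcrumbSketch {
  sealed trait DataType
  case object IntType extends DataType
  case class StructType(fields: Seq[(String, DataType)]) extends DataType

  // Returns the dotted path of the first value that does not fit the schema,
  // or None if everything matches. A mismatch at the root yields Some("").
  def firstMismatch(value: Any, dt: DataType, path: String = ""): Option[String] =
    (value, dt) match {
      case (_: Int, IntType) => None
      case (row: Seq[_], StructType(fields)) if row.length == fields.length =>
        row.zip(fields).flatMap { case (v, (name, t)) =>
          // Extend the path with the current field name before descending.
          firstMismatch(v, t, if (path.isEmpty) name else s"$path.$name")
        }.headOption
      case _ => Some(path)
    }
}
```

For a schema with nested fields `a`, `b` and `c` (where `c` is an int), a row whose innermost value is the string `"test"` yields the path `a.b.c`.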
---
[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/21410
LGTM except one minor comment.
---
[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21410
Can one of the admins verify this patch?
---
[GitHub] spark issue #21410: [SPARK-24366][SQL] Improving of error messages for type ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21410
Merged build finished. Test FAILed.
---