Posted to reviews@spark.apache.org by travishegner <gi...@git.apache.org> on 2015/11/05 16:10:20 UTC
[GitHub] spark pull request: Oracle dialect to handle nonspecific numeric t...
GitHub user travishegner opened a pull request:
https://github.com/apache/spark/pull/9495
Oracle dialect to handle nonspecific numeric types
This is the alternative, agreed-upon solution to PR #8780.
It creates an OracleDialect to handle the nonspecific numeric types that can be defined in Oracle.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/travishegner/spark OracleDialect
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9495.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9495
----
commit cc764d47d25eb35ae6c38c1caf1a87d65bdd10fc
Author: Travis Hegner <th...@trilliumit.com>
Date: 2015-11-05T14:53:43Z
initial attempt
commit 3157ed5a71047ed85e8d80d4ef92c34273c38f1e
Author: Travis Hegner <th...@trilliumit.com>
Date: 2015-11-05T15:04:09Z
adding canHandle override, and registration of OracleDialect
commit 839bcb582e138144b2d0757c9471de3a7cacc2ac
Author: Travis Hegner <th...@trilliumit.com>
Date: 2015-11-05T15:06:19Z
more attribution
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/9495#discussion_r44035938
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
@@ -315,3 +316,27 @@ case object DerbyDialect extends JdbcDialect {
}
+/**
+ * :: DeveloperApi ::
+ * Default Oracle dialect, mapping a nonspecific
+ * numeric type to a general decimal type.
+ * Solution by @cloud-fan and @bdolbeare (github.com)
+ */
+@DeveloperApi
+case object OracleDialect extends JdbcDialect {
+ override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")
+ override def getCatalystType(
+ sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
+ // Handle NUMBER fields that have no precision/scale in a special way
+ // because JDBC ResultSetMetaData converts this to 0 precision and -127 scale
+ if (sqlType == Types.NUMERIC && size == 0) {
+ // This is sub-optimal as we have to pick a precision/scale in advance whereas the data
+ // in Oracle is allowed to have different precision/scale for each value.
+ // This conversion works in our domain for now though we need a more durable solution.
+ // Look into changing JDBCRDD (line 406):
+ // FROM: mutableRow.update(i, Decimal(decimalVal, p, s))
+ // TO: mutableRow.update(i, Decimal(decimalVal))
+ Some(DecimalType(DecimalType.MAX_PRECISION, 10))
+ } else None
--- End diff --
`} else {
None
}`
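
For readers following the thread without the Spark sources at hand, the check in the diff above can be sketched outside Spark. This is a minimal sketch, not the patch itself: `DecimalSpec`, `OracleNumberMapping`, and `MaxPrecision` are hypothetical stand-ins for Spark's `DecimalType` and `DecimalType.MAX_PRECISION` (assumed here to be 38), and only `java.sql.Types` is a real API. It also uses the braced `else` form requested in the comment.

```scala
import java.sql.Types

// Hypothetical stand-in for Spark's DecimalType(precision, scale).
final case class DecimalSpec(precision: Int, scale: Int)

object OracleNumberMapping {
  // Assumption: mirrors DecimalType.MAX_PRECISION, taken to be 38 here.
  val MaxPrecision: Int = 38

  // Same URL-prefix test as the canHandle override in the diff.
  def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")

  // Returns a fixed decimal type for NUMERIC columns reported with size 0
  // (a NUMBER declared without precision/scale); returns None otherwise so
  // that the caller can fall back to its default type mapping.
  def catalystType(sqlType: Int, size: Int): Option[DecimalSpec] = {
    if (sqlType == Types.NUMERIC && size == 0) {
      Some(DecimalSpec(MaxPrecision, 10))
    } else {
      None
    }
  }
}
```

Returning `None` for every other column keeps the special case narrow: only unconstrained NUMBER columns get the hand-picked precision and scale.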
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/9495#discussion_r44035571
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
@@ -315,3 +316,27 @@ case object DerbyDialect extends JdbcDialect {
}
+/**
+ * :: DeveloperApi ::
+ * Default Oracle dialect, mapping a nonspecific
+ * numeric type to a general decimal type.
+ * Solution by @cloud-fan and @bdolbeare (github.com)
--- End diff --
@travishegner To provide more information, you can link to their comments.
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9495#issuecomment-154100299
Merged build triggered.
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9495#issuecomment-154102734
**[Test build #45118 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45118/consoleFull)** for PR 9495 at commit [`839bcb5`](https://github.com/apache/spark/commit/839bcb582e138144b2d0757c9471de3a7cacc2ac).
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9495#issuecomment-154106491
Merged build started.
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9495#issuecomment-154154560
Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Posted by dsdinter <gi...@git.apache.org>.
Github user dsdinter commented on the pull request:
https://github.com/apache/spark/pull/9495#issuecomment-173553569
It seems this is an issue in OJDBC that started to happen after Oracle 11g:
http://stackoverflow.com/questions/2133679/why-would-number-columns-scale-and-or-precision-differ-in-jdbc-from-oracle-10-t
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/9495
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9495#issuecomment-154103401
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45118/
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/9495#issuecomment-154171003
@travishegner I backported your change to branch 1.4 and branch 1.5 (see https://github.com/apache/spark/pull/9498) with a minor change on comments. Once you update your PR, I will merge it to master. Thanks!
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9495#issuecomment-154106392
Merged build triggered.
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9495#issuecomment-154154564
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45122/
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Posted by travishegner <gi...@git.apache.org>.
Github user travishegner commented on the pull request:
https://github.com/apache/spark/pull/9495#issuecomment-154423022
Thanks @yhuai for taking care of this!
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9495#issuecomment-154110975
**[Test build #45122 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45122/consoleFull)** for PR 9495 at commit [`a1370a7`](https://github.com/apache/spark/commit/a1370a7a93e1bc4e2f908be980165223d0f67d58).
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/9495#issuecomment-154099275
ok to test
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9495#issuecomment-154103394
**[Test build #45118 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45118/consoleFull)** for PR 9495 at commit [`839bcb5`](https://github.com/apache/spark/commit/839bcb582e138144b2d0757c9471de3a7cacc2ac).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9495#issuecomment-154154389
**[Test build #45122 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45122/consoleFull)** for PR 9495 at commit [`a1370a7`](https://github.com/apache/spark/commit/a1370a7a93e1bc4e2f908be980165223d0f67d58).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/9495#discussion_r44035882
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
@@ -315,3 +316,27 @@ case object DerbyDialect extends JdbcDialect {
}
+/**
+ * :: DeveloperApi ::
+ * Default Oracle dialect, mapping a nonspecific
+ * numeric type to a general decimal type.
+ * Solution by @cloud-fan and @bdolbeare (github.com)
+ */
+@DeveloperApi
+case object OracleDialect extends JdbcDialect {
+ override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")
+ override def getCatalystType(
+ sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
+ // Handle NUMBER fields that have no precision/scale in a special way
+ // because JDBC ResultSetMetaData converts this to 0 precision and -127 scale
+ if (sqlType == Types.NUMERIC && size == 0) {
+ // This is sub-optimal as we have to pick a precision/scale in advance whereas the data
+ // in Oracle is allowed to have different precision/scale for each value.
+ // This conversion works in our domain for now though we need a more durable solution.
+ // Look into changing JDBCRDD (line 406):
--- End diff --
I think putting the line number here is not very robust. Since we have already described the problem, we can simply explain our workaround here.
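
To make the trade-off in the quoted code comment concrete, here is a small sketch of what fixing the precision and scale in advance does to individual values. It uses plain `java.math.BigDecimal` rather than Spark's `Decimal`, so it only approximates what happens inside Spark; `FixedScaleDemo` and `toFixed` are hypothetical names, and the precision of 38 is an assumption about `DecimalType.MAX_PRECISION`.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

object FixedScaleDemo {
  // Assumed values: precision 38 (taken as Spark's maximum) and the
  // scale of 10 chosen in the diff.
  val Precision: Int = 38
  val Scale: Int = 10

  // Rescales a value to the fixed scale, rounding half up if digits are lost.
  // Returns None when the rescaled value needs more digits than Precision.
  def toFixed(value: JBigDecimal): Option[JBigDecimal] = {
    val rescaled = value.setScale(Scale, RoundingMode.HALF_UP)
    if (rescaled.precision() <= Precision) Some(rescaled) else None
  }
}
```

Under these assumptions, a value such as 1.5 is padded to ten fractional digits, a value with more than ten fractional digits is rounded, and a value with more than 28 integer digits no longer fits. This is why a single up-front choice is described as sub-optimal when Oracle lets each value carry its own precision and scale.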
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9495#issuecomment-154103397
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9495#issuecomment-154100332
Merged build started.
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/9495#issuecomment-154182427
I am merging it to master and branch 1.6. I will update the comments.
[GitHub] spark pull request: [SPARK-10648] Oracle dialect to handle nonspec...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9495#issuecomment-154088191
Can one of the admins verify this patch?