Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/11/06 10:47:24 UTC

[GitHub] [spark] MaxGekk opened a new pull request #34503: [WIP][SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

MaxGekk opened a new pull request #34503:
URL: https://github.com/apache/spark/pull/34503


   
   ### What changes were proposed in this pull request?
   
   
   ### Why are the changes needed?
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   ### How was this patch tested?
   By running the modified test suites:
   ```
   $ build/sbt "test:testOnly *AvroV1Suite"
   $ build/sbt "test:testOnly *AvroV2Suite"
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #34503:
URL: https://github.com/apache/spark/pull/34503#discussion_r744142379



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
##########
@@ -587,7 +587,8 @@ case class DataSource(
   private def disallowWritingIntervals(
       dataTypes: Seq[DataType],
       forbidAnsiIntervals: Boolean): Unit = {
-    val isWriteAllowedSource = writeAllowedSources(providingClass)
+    val isWriteAllowedSource = writeAllowedSources(providingClass) ||
+      providingClass.getCanonicalName == "org.apache.spark.sql.avro.AvroFileFormat"

Review comment:
       I think we can remove this check, since we now support ANSI intervals in all built-in file-based datasources. The `text` and `libsvm` datasources must have their own checks.






[GitHub] [spark] MaxGekk commented on a change in pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #34503:
URL: https://github.com/apache/spark/pull/34503#discussion_r744452768



##########
File path: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala
##########
@@ -51,11 +52,21 @@ object SchemaConverters {
     toSqlTypeHelper(avroSchema, Set.empty)
   }
 
+  // The property specifies Catalyst type of the given field
+  private val CATALYST_TYPE_PROP_NAME = "spark.sql.catalyst.type"
+
   private def toSqlTypeHelper(avroSchema: Schema, existingRecordNames: Set[String]): SchemaType = {
     avroSchema.getType match {
       case INT => avroSchema.getLogicalType match {
         case _: Date => SchemaType(DateType, nullable = false)
-        case _ => SchemaType(IntegerType, nullable = false)
+        case _ =>
+          val catalystTypeAttrValue = avroSchema.getProp(CATALYST_TYPE_PROP_NAME)
+          val catalystType = if (catalystTypeAttrValue == null) {
+            IntegerType
+          } else {
+            CatalystSqlParser.parseDataType(catalystTypeAttrValue)

Review comment:
       > It needs to be compatible with a physical avro INT at least.
   
   The deserializer should fail at https://github.com/apache/spark/pull/34503/files#diff-f82299271bc612e2032bc9e22f0f2ed3af0eccb2a37acdd6a25cc1d999c1be92R333-R339 with a proper error message that tells users about the incompatible schema. I am not sure about using an assert, especially for **external** inputs. From my point of view, asserts are for checking contracts between **internal** components.
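
To illustrate the distinction between asserts and user-facing errors, here is a minimal sketch in plain Scala. It is not the deserializer code from this PR: every name in it (`IncompatibleSchemaSketch`, `AvroPhysical`, `IncompatibleSchemaException`, `checkCompatible`) and the table of compatible type names are assumptions made up for the example. The idea is only that a type name read from a file is external input, so a mismatch is reported with an exception that carries an explanatory message.

```scala
// Hypothetical names throughout; this is not Spark's deserializer code.
object IncompatibleSchemaSketch {
  // Physical Avro types relevant to the example.
  sealed trait AvroPhysical
  case object AvroInt extends AvroPhysical
  case object AvroLong extends AvroPhysical

  // Stand-in for whatever error type a real reader would raise.
  final class IncompatibleSchemaException(message: String)
    extends RuntimeException(message)

  // SQL type names this sketch accepts for each physical type (an assumption).
  private val compatible: Map[AvroPhysical, Set[String]] = Map(
    AvroInt -> Set("int", "interval year to month"),
    AvroLong -> Set("bigint", "interval day to second"))

  // Validates a type name that came from a file, i.e. from external input.
  // A mismatch throws an exception with a readable message instead of
  // tripping an assert.
  def checkCompatible(physical: AvroPhysical, catalystTypeName: String): Unit = {
    val allowed = compatible.getOrElse(physical, Set.empty[String])
    if (!allowed.contains(catalystTypeName)) {
      throw new IncompatibleSchemaException(
        s"Cannot convert Avro type $physical to SQL type '$catalystTypeName'. " +
          s"Compatible SQL types: ${allowed.toSeq.sorted.mkString(", ")}.")
    }
  }
}
```

With this shape, a caller that passes a mismatched pair such as `AvroInt` and `"interval day to second"` gets an exception naming both types, while a matching pair returns normally.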






[GitHub] [spark] MaxGekk commented on a change in pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #34503:
URL: https://github.com/apache/spark/pull/34503#discussion_r744661714



##########
File path: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala
##########
@@ -51,11 +52,21 @@ object SchemaConverters {
     toSqlTypeHelper(avroSchema, Set.empty)
   }
 
+  // The property specifies Catalyst type of the given field
+  private val CATALYST_TYPE_PROP_NAME = "spark.sql.catalyst.type"

Review comment:
       > Shall we make it sparkType?
   
   We already have a similar property in ORC, see https://github.com/apache/spark/pull/34184/files#diff-e14fd8725cf71eee7b34fa299c2f3abe5a0033f9abce9de4c7e081ba57991b0bR53 . I would prefer to have the same here for consistency.
   
   > We can have our sparkType annotation.
   
   Do you mean a property? At least in the docs of the Avro API library, it is called a property, see https://avro.apache.org/docs/current/api/java/org/apache/avro/JsonProperties.html#addProp-java.lang.String-java.lang.String- . That's what I did, isn't it?
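
As a rough sketch of the property idea, assuming a schema's custom properties are modeled as a plain `Map[String, String]` rather than the real Avro `Schema` class: the property name is the one from the diff above, while `CatalystTypePropSketch`, `intTypeName`, and the default type name `"int"` are hypothetical and exist only for this example.

```scala
// Hypothetical model: a schema's custom properties as a plain Map.
object CatalystTypePropSketch {
  // Property name taken from the diff above.
  val CatalystTypePropName = "spark.sql.catalyst.type"

  // Returns the SQL type name for a physical Avro INT:
  // the property's value if it is present, otherwise the default "int".
  def intTypeName(props: Map[String, String]): String =
    props.getOrElse(CatalystTypePropName, "int")
}
```

A schema written without the property keeps reading as a plain integer, so existing files are unaffected; only schemas that carry the property are mapped to another type.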






[GitHub] [spark] SparkQA commented on pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-963112361


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49472/
   




[GitHub] [spark] SparkQA commented on pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-962478235


   **[Test build #144956 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144956/testReport)** for PR 34503 at commit [`1ff02cc`](https://github.com/apache/spark/commit/1ff02ccc33061c65cac54351dd5c5e0ec21258da).




[GitHub] [spark] SparkQA commented on pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-962511935


   **[Test build #144956 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144956/testReport)** for PR 34503 at commit [`1ff02cc`](https://github.com/apache/spark/commit/1ff02ccc33061c65cac54351dd5c5e0ec21258da).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-963156786


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49472/
   




[GitHub] [spark] cloud-fan closed pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #34503:
URL: https://github.com/apache/spark/pull/34503


   




[GitHub] [spark] MaxGekk commented on a change in pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #34503:
URL: https://github.com/apache/spark/pull/34503#discussion_r744661714



##########
File path: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala
##########
@@ -51,11 +52,21 @@ object SchemaConverters {
     toSqlTypeHelper(avroSchema, Set.empty)
   }
 
+  // The property specifies Catalyst type of the given field
+  private val CATALYST_TYPE_PROP_NAME = "spark.sql.catalyst.type"

Review comment:
       > Shall we make it sparkType?
   
   We already have a similar property in ORC, see https://github.com/apache/spark/pull/34184/files#diff-e14fd8725cf71eee7b34fa299c2f3abe5a0033f9abce9de4c7e081ba57991b0bR53 . I would prefer to have the same name here for consistency.
   
   > We can have our sparkType annotation.
   
   Do you mean a property? At least in the docs of the Avro API library, it is called a property, see https://avro.apache.org/docs/current/api/java/org/apache/avro/JsonProperties.html#addProp-java.lang.String-java.lang.String- . That's what I did, isn't it?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-963172378


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49472/
   




[GitHub] [spark] SparkQA commented on pull request #34503: [WIP][SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-962469258


   **[Test build #144954 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144954/testReport)** for PR 34503 at commit [`64d99c8`](https://github.com/apache/spark/commit/64d99c83974e6f8b994b0747636096e87789fbe2).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-962494129


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49427/
   




[GitHub] [spark] gengliangwang commented on a change in pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #34503:
URL: https://github.com/apache/spark/pull/34503#discussion_r744651905



##########
File path: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala
##########
@@ -51,11 +52,21 @@ object SchemaConverters {
     toSqlTypeHelper(avroSchema, Set.empty)
   }
 
+  // The property specifies Catalyst type of the given field
+  private val CATALYST_TYPE_PROP_NAME = "spark.sql.catalyst.type"

Review comment:
       Avro has a logical type annotation, `logicalType`: https://avro.apache.org/docs/1.10.2/spec.html#Logical%20Types
   We can have our `sparkType` annotation.






[GitHub] [spark] AmplabJenkins commented on pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-963172378


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49472/
   




[GitHub] [spark] SparkQA removed a comment on pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-963079482


   **[Test build #145000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145000/testReport)** for PR 34503 at commit [`7e6b22c`](https://github.com/apache/spark/commit/7e6b22cc0d1aa37be62f0c52c029c0ad51904df6).




[GitHub] [spark] AmplabJenkins commented on pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-963362619


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145000/
   




[GitHub] [spark] MaxGekk commented on a change in pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #34503:
URL: https://github.com/apache/spark/pull/34503#discussion_r744142080



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
##########
@@ -587,7 +587,8 @@ case class DataSource(
   private def disallowWritingIntervals(
       dataTypes: Seq[DataType],
       forbidAnsiIntervals: Boolean): Unit = {
-    val isWriteAllowedSource = writeAllowedSources(providingClass)
+    val isWriteAllowedSource = writeAllowedSources(providingClass) ||
+      providingClass.getCanonicalName == "org.apache.spark.sql.avro.AvroFileFormat"

Review comment:
       I cannot refer to the `AvroFileFormat` class from `external` here.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-962514468


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144956/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-962494129


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49427/
   




[GitHub] [spark] SparkQA commented on pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-962493089


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49427/
   




[GitHub] [spark] cloud-fan commented on pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-963259279


   thanks, merging to master!




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-963362619


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145000/
   




[GitHub] [spark] SparkQA removed a comment on pull request #34503: [WIP][SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-962433185


   **[Test build #144954 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144954/testReport)** for PR 34503 at commit [`64d99c8`](https://github.com/apache/spark/commit/64d99c83974e6f8b994b0747636096e87789fbe2).




[GitHub] [spark] AmplabJenkins commented on pull request #34503: [WIP][SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-962446744


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49425/
   




[GitHub] [spark] SparkQA removed a comment on pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-962478235


   **[Test build #144956 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144956/testReport)** for PR 34503 at commit [`1ff02cc`](https://github.com/apache/spark/commit/1ff02ccc33061c65cac54351dd5c5e0ec21258da).




[GitHub] [spark] cloud-fan commented on a change in pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34503:
URL: https://github.com/apache/spark/pull/34503#discussion_r744386019



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
##########
@@ -587,7 +587,8 @@ case class DataSource(
   private def disallowWritingIntervals(
       dataTypes: Seq[DataType],
       forbidAnsiIntervals: Boolean): Unit = {
-    val isWriteAllowedSource = writeAllowedSources(providingClass)
+    val isWriteAllowedSource = writeAllowedSources(providingClass) ||
+      providingClass.getCanonicalName == "org.apache.spark.sql.avro.AvroFileFormat"

Review comment:
       Yeah, let's clean it up in this PR, since we are probably done adding ANSI interval support to the built-in file sources.






[GitHub] [spark] cloud-fan commented on a change in pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34503:
URL: https://github.com/apache/spark/pull/34503#discussion_r744465438



##########
File path: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala
##########
@@ -51,11 +52,21 @@ object SchemaConverters {
     toSqlTypeHelper(avroSchema, Set.empty)
   }
 
+  // The property specifies Catalyst type of the given field
+  private val CATALYST_TYPE_PROP_NAME = "spark.sql.catalyst.type"
+
   private def toSqlTypeHelper(avroSchema: Schema, existingRecordNames: Set[String]): SchemaType = {
     avroSchema.getType match {
       case INT => avroSchema.getLogicalType match {
         case _: Date => SchemaType(DateType, nullable = false)
-        case _ => SchemaType(IntegerType, nullable = false)
+        case _ =>
+          val catalystTypeAttrValue = avroSchema.getProp(CATALYST_TYPE_PROP_NAME)
+          val catalystType = if (catalystTypeAttrValue == null) {
+            IntegerType
+          } else {
+            CatalystSqlParser.parseDataType(catalystTypeAttrValue)

Review comment:
       Ah, I see: the deserializer can check it. LGTM!








[GitHub] [spark] MaxGekk commented on a change in pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #34503:
URL: https://github.com/apache/spark/pull/34503#discussion_r744661714



##########
File path: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala
##########
@@ -51,11 +52,21 @@ object SchemaConverters {
     toSqlTypeHelper(avroSchema, Set.empty)
   }
 
+  // The property specifies Catalyst type of the given field
+  private val CATALYST_TYPE_PROP_NAME = "spark.sql.catalyst.type"

Review comment:
       > Shall we make it sparkType?
   
   We already have a similar property in ORC; see https://github.com/apache/spark/pull/34184/files#diff-e14fd8725cf71eee7b34fa299c2f3abe5a0033f9abce9de4c7e081ba57991b0bR53 . I would prefer to use the same name here for consistency.
   
   > We can have our sparkType annotation.
   
   Do you mean a property? At least, that is what the Avro API docs call it; see https://avro.apache.org/docs/current/api/java/org/apache/avro/JsonProperties.html#addProp-java.lang.String-java.lang.String- . That's what I did, isn't it?






[GitHub] [spark] SparkQA commented on pull request #34503: [WIP][SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-962437936


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49425/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34503: [WIP][SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-962470305


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144954/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34503: [WIP][SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-962470305


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144954/
   




[GitHub] [spark] SparkQA commented on pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-963353946


   **[Test build #145000 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145000/testReport)** for PR 34503 at commit [`7e6b22c`](https://github.com/apache/spark/commit/7e6b22cc0d1aa37be62f0c52c029c0ad51904df6).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] cloud-fan commented on a change in pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34503:
URL: https://github.com/apache/spark/pull/34503#discussion_r744385623



##########
File path: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala
##########
@@ -51,11 +52,21 @@ object SchemaConverters {
     toSqlTypeHelper(avroSchema, Set.empty)
   }
 
+  // The property specifies Catalyst type of the given field
+  private val CATALYST_TYPE_PROP_NAME = "spark.sql.catalyst.type"
+
   private def toSqlTypeHelper(avroSchema: Schema, existingRecordNames: Set[String]): SchemaType = {
     avroSchema.getType match {
       case INT => avroSchema.getLogicalType match {
         case _: Date => SchemaType(DateType, nullable = false)
-        case _ => SchemaType(IntegerType, nullable = false)
+        case _ =>
+          val catalystTypeAttrValue = avroSchema.getProp(CATALYST_TYPE_PROP_NAME)
+          val catalystType = if (catalystTypeAttrValue == null) {
+            IntegerType
+          } else {
+            CatalystSqlParser.parseDataType(catalystTypeAttrValue)

Review comment:
       Can we add an assert to make sure it's a year-month interval? I agree with using a general property, `spark.sql.catalyst.type`, which can be useful in the future. But I'm a bit wary of returning an arbitrary type here. At the very least, it needs to be compatible with a physical Avro INT.
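
The guard requested in this comment could look roughly like the sketch below. It is a minimal, self-contained illustration and not the PR's code: `CatalystType`, the three case objects, `AvroIntTypeGuard`, `parseType`, and `resolve` are hypothetical stand-ins (the real code would use Spark's `DataType` hierarchy and `CatalystSqlParser.parseDataType`), and the set of INT-compatible types is an assumption.

```scala
// Hypothetical stand-ins for a few Catalyst types; these are NOT Spark's classes.
sealed trait CatalystType
case object IntegerType extends CatalystType
case object YearMonthIntervalType extends CatalystType
case object StringType extends CatalystType

object AvroIntTypeGuard {
  // Toy replacement for CatalystSqlParser.parseDataType; it knows three names only.
  def parseType(text: String): CatalystType = text.trim.toLowerCase match {
    case "int" => IntegerType
    case "interval year to month" => YearMonthIntervalType
    case "string" => StringType
    case other => throw new IllegalArgumentException(s"Cannot parse type: $other")
  }

  // Assumption: only these types may be backed by a physical Avro INT.
  private val intCompatible: Set[CatalystType] = Set(IntegerType, YearMonthIntervalType)

  // propValue models the result of getProp: null when the property is absent.
  def resolve(propValue: String): CatalystType =
    Option(propValue).map(parseType) match {
      case None => IntegerType                  // no property: plain INT
      case Some(t) if intCompatible(t) => t     // annotated and compatible
      case Some(t) =>
        throw new IllegalArgumentException(s"$t is not compatible with Avro INT")
    }
}
```

With this shape, a file whose INT field carries an unexpected property value fails at schema conversion instead of producing a column of an arbitrary type. The same check could equally live in the deserializer.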






[GitHub] [spark] AmplabJenkins removed a comment on pull request #34503: [WIP][SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-962446744


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49425/
   




[GitHub] [spark] MaxGekk commented on pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-962491045


   @sarutak @AngersZhuuuu @Peng-Lei @gengliangwang @cloud-fan Could you review this PR, please?




[GitHub] [spark] MaxGekk commented on a change in pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #34503:
URL: https://github.com/apache/spark/pull/34503#discussion_r744142080



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
##########
@@ -587,7 +587,8 @@ case class DataSource(
   private def disallowWritingIntervals(
       dataTypes: Seq[DataType],
       forbidAnsiIntervals: Boolean): Unit = {
-    val isWriteAllowedSource = writeAllowedSources(providingClass)
+    val isWriteAllowedSource = writeAllowedSources(providingClass) ||
+      providingClass.getCanonicalName == "org.apache.spark.sql.avro.AvroFileFormat"

Review comment:
       I cannot refer to the `AvroFileFormat` class in `external` from here.
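
The constraint described here, matching a class that is not on the compile classpath, can be illustrated with a small sketch. Everything in it is hypothetical: `WriteAllowList`, `allowedClasses`, `allowedNames`, and `isWriteAllowed` are illustrative names, and `java.lang.StringBuilder` merely stands in for a file format class that is visible at compile time.

```scala
object WriteAllowList {
  // Classes visible at compile time can be referenced directly with classOf.
  // java.lang.StringBuilder is only a stand-in for such a class.
  private val allowedClasses: Set[Class[_]] = Set(classOf[java.lang.StringBuilder])

  // A class from a module that is not a compile-time dependency can only be
  // matched by its fully qualified name.
  private val allowedNames: Set[String] = Set("org.apache.spark.sql.avro.AvroFileFormat")

  def isWriteAllowed(providingClass: Class[_]): Boolean =
    allowedClasses.contains(providingClass) ||
      // getCanonicalName returns null for anonymous classes; contains(null) is false.
      allowedNames.contains(providingClass.getCanonicalName)
}
```

The name-based branch is what the diff above adds. It avoids a compile-time dependency at the cost of a string that the compiler cannot check.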






[GitHub] [spark] SparkQA commented on pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-962485237


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49427/
   




[GitHub] [spark] SparkQA commented on pull request #34503: [WIP][SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-962442892


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49425/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-962514468


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144956/
   




[GitHub] [spark] sarutak commented on a change in pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
sarutak commented on a change in pull request #34503:
URL: https://github.com/apache/spark/pull/34503#discussion_r744370280



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
##########
@@ -587,7 +587,8 @@ case class DataSource(
   private def disallowWritingIntervals(
       dataTypes: Seq[DataType],
       forbidAnsiIntervals: Boolean): Unit = {
-    val isWriteAllowedSource = writeAllowedSources(providingClass)
+    val isWriteAllowedSource = writeAllowedSources(providingClass) ||
+      providingClass.getCanonicalName == "org.apache.spark.sql.avro.AvroFileFormat"

Review comment:
       I agree. Will you remove this check within this PR?






[GitHub] [spark] gengliangwang commented on a change in pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #34503:
URL: https://github.com/apache/spark/pull/34503#discussion_r744651037



##########
File path: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala
##########
@@ -51,11 +52,21 @@ object SchemaConverters {
     toSqlTypeHelper(avroSchema, Set.empty)
   }
 
+  // The property specifies Catalyst type of the given field
+  private val CATALYST_TYPE_PROP_NAME = "spark.sql.catalyst.type"

Review comment:
       Shall we make it `sparkType`?  






[GitHub] [spark] SparkQA commented on pull request #34503: [SPARK-37225][SQL] Support reading and writing ANSI intervals from/to Avro datasources

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34503:
URL: https://github.com/apache/spark/pull/34503#issuecomment-963079482


   **[Test build #145000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145000/testReport)** for PR 34503 at commit [`7e6b22c`](https://github.com/apache/spark/commit/7e6b22cc0d1aa37be62f0c52c029c0ad51904df6).

