You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/04/26 01:22:02 UTC

[GitHub] [spark] Peng-Lei opened a new pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Peng-Lei opened a new pull request #32340:
URL: https://github.com/apache/spark/pull/32340


   ### What changes were proposed in this pull request?
    Support YearMonthIntervalType and DayTimeIntervalType to extend ArrowColumnVector
   
   
   ### Why are the changes needed?
   https://issues.apache.org/jira/browse/SPARK-35139
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   1. By checking coding style via:
       $ ./dev/scalastyle
       $ ./dev/lint-java
   2. Run the test "ArrowWriterSuite"
      
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826834172


   **[Test build #137960 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137960/testReport)** for PR 32340 at commit [`ff16a56`](https://github.com/apache/spark/commit/ff16a56c99ea7e989f4b094eb1f6d79afb8d9b3a).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826443626


   **[Test build #137928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137928/testReport)** for PR 32340 at commit [`01367f2`](https://github.com/apache/spark/commit/01367f25bf8e3b03d5ac5fe7a6dc498dfdd633e7).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826793539






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-827304365


   **[Test build #137982 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137982/testReport)** for PR 32340 at commit [`b6f1dc0`](https://github.com/apache/spark/commit/b6f1dc00c7a8ae9eb5ef9133596142b6c3cd58ac).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r619963111



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,39 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      int months = accessor.get(rowId);
+      return months;
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      final long microseconds = intervalDayHolder.days * MICROS_PER_DAY
+                              + (long)intervalDayHolder.milliseconds * MICROS_PER_MILLIS;

Review comment:
       ```
   return Math.addExact(
     intervalDayHolder.days * MICROS_PER_DAY,
     intervalDayHolder.milliseconds * MICROS_PER_MILLIS;)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826457666


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42448/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826442231






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826444460


   cc @MaxGekk @cloud-fan @BryanCutler FYI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Peng-Lei commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Peng-Lei commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r751888117



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -172,6 +176,10 @@ public ArrowColumnVector(ValueVector vector) {
       }
     } else if (vector instanceof NullVector) {
       accessor = new NullAccessor((NullVector) vector);
+    } else if (vector instanceof IntervalYearVector) {
+      accessor = new IntervalYearAccessor((IntervalYearVector) vector);
+    } else if (vector instanceof IntervalDayVector) {
+      accessor = new IntervalDayAccessor((IntervalDayVector) vector);

Review comment:
       I'm not quite sure why `DayTimeIntervalType` map Arrow's `IntervalType` here. just according [ArrowUtils.scala#L60](https://github.com/apache/spark/blob/4ed2dab5ee46226f29919eba05fe52e446ab2449/sql/catalyst/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala#L60). I try to learn about [Arrow types](https://arrow.apache.org/docs/cpp/api/datatype.html). it's sql style. And in hive `INTERVAL_DAY_TIME` map arrow's `IntervalType` with `IntervalUnit.DAY_TIME` unit. If we map `DayTimeIntervalType` to  Arrow's `DurationType `. Then which type `YearMonthIntervalType` to match?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826443491






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r751894690



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -172,6 +176,10 @@ public ArrowColumnVector(ValueVector vector) {
       }
     } else if (vector instanceof NullVector) {
       accessor = new NullAccessor((NullVector) vector);
+    } else if (vector instanceof IntervalYearVector) {
+      accessor = new IntervalYearAccessor((IntervalYearVector) vector);
+    } else if (vector instanceof IntervalDayVector) {
+      accessor = new IntervalDayAccessor((IntervalDayVector) vector);

Review comment:
       At the very least `Duration` cannot be mapped to `YearMonthIntervalType`. For `DayTimeIntervalType `, Arrow-wise, mapping to `IntervalType` makes sense but it makes less sense in Spark SQL because we're already mapping `Duration`.
   
   I am not saying either way is 100% correct but I would pick the one to make it coherent in Spark's perspective if I have to pick one of both.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Peng-Lei commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Peng-Lei commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826826429


   > @Peng-Lei I wonder why do you make changes in your fork master and merge them to `SPARK-35139` instead of direct changes in `Peng-Lei:SPARK-35139`?
   
   That's what I did.
   1, git branch SPARK-XXX
   2, develop...
   3, git add and commit
   3, git checkout master
   4, git pull upstream master
   5, git checkout SPARK-XXX
   6, git pull origin master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Peng-Lei commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Peng-Lei commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620190369



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala
##########
@@ -394,3 +397,29 @@ private[arrow] class NullWriter(val valueVector: NullVector) extends ArrowFieldW
   override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
   }
 }
+
+private[arrow] class IntervalYearWriter(val valueVector: IntervalYearVector)
+  extends ArrowFieldWriter {
+  override def setNull(): Unit = {
+    valueVector.setNull(count)
+  }
+
+  override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
+    valueVector.setSafe(count, input.getInt(ordinal));
+  }
+}
+
+private[arrow] class IntervalDayWriter(val valueVector: IntervalDayVector)
+  extends ArrowFieldWriter {
+  override def setNull(): Unit = {
+    valueVector.setNull(count)
+  }
+
+  override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
+    val totalMicroseconds = input.getLong(ordinal)
+    val days = totalMicroseconds / MICROS_PER_DAY
+    val millis = (totalMicroseconds % MICROS_PER_DAY) / MICROS_PER_MILLIS
+    valueVector.set(count, days.toInt, millis.toInt)
+  }
+

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826832908


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137946/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826621933


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42462/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Peng-Lei commented on a change in pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Peng-Lei commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620035539



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala
##########
@@ -394,3 +397,29 @@ private[arrow] class NullWriter(val valueVector: NullVector) extends ArrowFieldW
   override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
   }
 }
+
+private[arrow] class IntervalYearWriter(val valueVector: IntervalYearVector)
+  extends ArrowFieldWriter {
+  override def setNull(): Unit = {
+    valueVector.setNull(count)
+  }
+
+  override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
+    valueVector.setSafe(count, input.getInt(ordinal));
+  }
+}
+
+private[arrow] class IntervalDayWriter(val valueVector: IntervalDayVector)
+  extends ArrowFieldWriter {
+  override def setNull(): Unit = {
+    valueVector.setNull(count)
+  }
+
+  override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
+    val totalMicroseconds = input.getLong(ordinal)
+    val days = totalMicroseconds / MICROS_PER_DAY
+    val millis = (totalMicroseconds - days * MICROS_PER_DAY) / MICROS_PER_MILLIS

Review comment:
       resolved, Thanks very much




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620058440



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,38 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      return accessor.get(rowId);
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      return Math.addExact(
+              intervalDayHolder.days * MICROS_PER_DAY,

Review comment:
       Both multiplications can overflow too. Could you use `Math.multiplyExact`, please.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826443491


   ok to test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826621933






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826622962


   **[Test build #137946 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137946/testReport)** for PR 32340 at commit [`1de4cbe`](https://github.com/apache/spark/commit/1de4cbed3e2441d96779bb59c497254f70a17dc5).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826793565


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42477/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826832908


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137946/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620126833



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala
##########
@@ -394,3 +397,29 @@ private[arrow] class NullWriter(val valueVector: NullVector) extends ArrowFieldW
   override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
   }
 }
+
+private[arrow] class IntervalYearWriter(val valueVector: IntervalYearVector)
+  extends ArrowFieldWriter {
+  override def setNull(): Unit = {
+    valueVector.setNull(count)
+  }
+
+  override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
+    valueVector.setSafe(count, input.getInt(ordinal));
+  }
+}
+
+private[arrow] class IntervalDayWriter(val valueVector: IntervalDayVector)
+  extends ArrowFieldWriter {
+  override def setNull(): Unit = {
+    valueVector.setNull(count)
+  }
+
+  override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
+    val totalMicroseconds = input.getLong(ordinal)
+    val days = totalMicroseconds / MICROS_PER_DAY
+    val millis = (totalMicroseconds % MICROS_PER_DAY) / MICROS_PER_MILLIS
+    valueVector.set(count, days.toInt, millis.toInt)
+  }
+

Review comment:
       Remove the blank line.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826443626


   **[Test build #137928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137928/testReport)** for PR 32340 at commit [`01367f2`](https://github.com/apache/spark/commit/01367f25bf8e3b03d5ac5fe7a6dc498dfdd633e7).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620130533



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowWriterSuite.scala
##########
@@ -73,6 +75,12 @@ class ArrowWriterSuite extends SparkFunSuite {
     check(DateType, Seq(0, 1, 2, null, 4))
     check(TimestampType, Seq(0L, 3.6e9.toLong, null, 8.64e10.toLong), "America/Los_Angeles")
     check(NullType, Seq(null, null, null))
+    check(YearMonthIntervalType,
+      Seq(null, 0, 1, -1, scala.Int.MaxValue, scala.Int.MinValue))

Review comment:
       ```suggestion
       check(YearMonthIntervalType, Seq(null, 0, 1, -1, scala.Int.MaxValue, scala.Int.MinValue))
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-827348071


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42502/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Peng-Lei commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Peng-Lei commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826825108


   > @Peng-Lei I wonder why do you make changes in your fork master and merge them to `SPARK-35139` instead of direct changes in `Peng-Lei:SPARK-35139`?
   
   That's what I did.
   1, git branch SPARK-XXX
   2, develop...
   3, git add and commit
   3, git checkout master
   4, git pull upstream master
   5, git checkout SPARK-XXX
   6, git pull origin master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826790616


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42477/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620115461



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala
##########
@@ -54,6 +54,8 @@ private[sql] object ArrowUtils {
         new ArrowType.Timestamp(TimeUnit.MICROSECOND, timeZoneId)
       }
     case NullType => ArrowType.Null.INSTANCE
+    case YearMonthIntervalType => Types.MinorType.INTERVALYEAR.getType
+    case DayTimeIntervalType => Types.MinorType.INTERVALDAY.getType

Review comment:
       Maybe the following code for consistency with the cases above:
   ```suggestion
       case YearMonthIntervalType => new ArrowType.Interval(IntervalUnit.YEAR_MONTH)
       case DayTimeIntervalType => new ArrowType.Interval(IntervalUnit.DAY_TIME)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Peng-Lei commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Peng-Lei commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620190123



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowWriterSuite.scala
##########
@@ -73,6 +75,12 @@ class ArrowWriterSuite extends SparkFunSuite {
     check(DateType, Seq(0, 1, 2, null, 4))
     check(TimestampType, Seq(0L, 3.6e9.toLong, null, 8.64e10.toLong), "America/Los_Angeles")
     check(NullType, Seq(null, null, null))
+    check(YearMonthIntervalType,
+      Seq(null, 0, 1, -1, scala.Int.MaxValue, scala.Int.MinValue))

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Peng-Lei commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Peng-Lei commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620191259



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,38 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      return accessor.get(rowId);
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      return Math.addExact(
+              Math.multiplyExact(intervalDayHolder.days, MICROS_PER_DAY),
+              Math.multiplyExact(intervalDayHolder.milliseconds, MICROS_PER_MILLIS));

Review comment:
       I use "intervalDayHolder.days * MICROS_PER_DAY" instead of Math.multiplyExact




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #32340:
URL: https://github.com/apache/spark/pull/32340


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r750909587



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -172,6 +176,10 @@ public ArrowColumnVector(ValueVector vector) {
       }
     } else if (vector instanceof NullVector) {
       accessor = new NullAccessor((NullVector) vector);
+    } else if (vector instanceof IntervalYearVector) {
+      accessor = new IntervalYearAccessor((IntervalYearVector) vector);
+    } else if (vector instanceof IntervalDayVector) {
+      accessor = new IntervalDayAccessor((IntervalDayVector) vector);

Review comment:
       Hm, there's something wrong here. We mapped Spark's `DayTimeIntervalType` to Java (Scala)'s [`java.time.Duration`](https://docs.oracle.com/javase/8/docs/api/java/time/Duration.html) in Java but we map it here to Arrow's `IntervalType` that represents a calendar instance (see also https://github.com/apache/arrow/blob/master/format/Schema.fbs).
   
   I think we should map it to Arrow's `DurationType` (Python's [`datetime.timedelta`](https://docs.python.org/3/library/datetime.html#timedelta-objects)). I am working on SPARK-37277 to support this in Arrow conversion at PySpark but this became a blocker to me. I am preparing a PR to change this but please let me know if you guys have different thoughts.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r750909587



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -172,6 +176,10 @@ public ArrowColumnVector(ValueVector vector) {
       }
     } else if (vector instanceof NullVector) {
       accessor = new NullAccessor((NullVector) vector);
+    } else if (vector instanceof IntervalYearVector) {
+      accessor = new IntervalYearAccessor((IntervalYearVector) vector);
+    } else if (vector instanceof IntervalDayVector) {
+      accessor = new IntervalDayAccessor((IntervalDayVector) vector);

Review comment:
       Hm, there's something wrong here. We mapped `DayTimeIntervalType` to [`java.time.Duration`](https://docs.oracle.com/javase/8/docs/api/java/time/Duration.html) in Java but we map it here to Arrow's `IntervalType` that represents a calendar instance (see also https://github.com/apache/arrow/blob/master/format/Schema.fbs).
   
   I think we should map it to Arrow's `DurationType` ([`datetime.timedelta`](https://docs.python.org/3/library/datetime.html#timedelta-objects)). I am working on SPARK-37277 to support this in Arrow conversion at PySpark but this became a blocker to me. I am preparing a PR to change this but please let me know if you guys have different thoughts.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826793565


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42477/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826454802






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826794853


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137940/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r619969010



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala
##########
@@ -394,3 +397,29 @@ private[arrow] class NullWriter(val valueVector: NullVector) extends ArrowFieldW
   override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
   }
 }
+
+private[arrow] class IntervalYearWriter(val valueVector: IntervalYearVector)
+  extends ArrowFieldWriter {
+  override def setNull(): Unit = {
+    valueVector.setNull(count)
+  }
+
+  override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
+    valueVector.setSafe(count, input.getInt(ordinal));
+  }
+}
+
+private[arrow] class IntervalDayWriter(val valueVector: IntervalDayVector)
+  extends ArrowFieldWriter {
+  override def setNull(): Unit = {
+    valueVector.setNull(count)
+  }
+
+  override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
+    val totalMicroseconds = input.getLong(ordinal)
+    val days = totalMicroseconds / MICROS_PER_DAY
+    val millis = (totalMicroseconds - days * MICROS_PER_DAY) / MICROS_PER_MILLIS

Review comment:
       nit `(totalMicroseconds % MICROS_PER_DAY) / MICROS_PER_MILLIS`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Peng-Lei commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Peng-Lei commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620061020



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,38 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      return accessor.get(rowId);
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      return Math.addExact(
+              intervalDayHolder.days * MICROS_PER_DAY,

Review comment:
       OK,Thanks @MaxGekk




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-827052721


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137960/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826844754






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826443626






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-827348071


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42502/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826616920






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826819251


   @Peng-Lei I wonder why do you make changes in your fork master and merge them to `SPARK-35139` instead of direct changes in `Peng-Lei:SPARK-35139`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r750915520



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala
##########
@@ -394,3 +397,28 @@ private[arrow] class NullWriter(val valueVector: NullVector) extends ArrowFieldW
   override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
   }
 }
+
+private[arrow] class IntervalYearWriter(val valueVector: IntervalYearVector)
+  extends ArrowFieldWriter {
+  override def setNull(): Unit = {
+    valueVector.setNull(count)
+  }
+
+  override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
+    valueVector.setSafe(count, input.getInt(ordinal));
+  }
+}
+
+private[arrow] class IntervalDayWriter(val valueVector: IntervalDayVector)
+  extends ArrowFieldWriter {
+  override def setNull(): Unit = {
+    valueVector.setNull(count)
+  }
+
+  override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
+    val totalMicroseconds = input.getLong(ordinal)
+    val days = totalMicroseconds / MICROS_PER_DAY
+    val millis = (totalMicroseconds % MICROS_PER_DAY) / MICROS_PER_MILLIS

Review comment:
       Hm, do we lose micro seconds part? I think this is another reason to use duration.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r750909587



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -172,6 +176,10 @@ public ArrowColumnVector(ValueVector vector) {
       }
     } else if (vector instanceof NullVector) {
       accessor = new NullAccessor((NullVector) vector);
+    } else if (vector instanceof IntervalYearVector) {
+      accessor = new IntervalYearAccessor((IntervalYearVector) vector);
+    } else if (vector instanceof IntervalDayVector) {
+      accessor = new IntervalDayAccessor((IntervalDayVector) vector);

Review comment:
       Hm, there's something wrong here. We mapped `DayTimeIntervalType` to [`java.time.Duration`](https://docs.oracle.com/javase/8/docs/api/java/time/Duration.html) in Java but we map it here to `Interval` that represents a calendar instance (see also https://github.com/apache/arrow/blob/master/format/Schema.fbs).
   
   I think we should map it to `DurationType` ([`datetime.timedelta`](https://docs.python.org/3/library/datetime.html#timedelta-objects)). I am working on SPARK-37277 to support this in Arrow conversion at PySpark but this became a blocker to me. I am preparing a PR to change this but please let me know if you guys have different thoughts.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Peng-Lei commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Peng-Lei commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r751879108



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala
##########
@@ -394,3 +397,28 @@ private[arrow] class NullWriter(val valueVector: NullVector) extends ArrowFieldW
   override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
   }
 }
+
+private[arrow] class IntervalYearWriter(val valueVector: IntervalYearVector)
+  extends ArrowFieldWriter {
+  override def setNull(): Unit = {
+    valueVector.setNull(count)
+  }
+
+  override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
+    valueVector.setSafe(count, input.getInt(ordinal));
+  }
+}
+
+private[arrow] class IntervalDayWriter(val valueVector: IntervalDayVector)
+  extends ArrowFieldWriter {
+  override def setNull(): Unit = {
+    valueVector.setNull(count)
+  }
+
+  override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
+    val totalMicroseconds = input.getLong(ordinal)
+    val days = totalMicroseconds / MICROS_PER_DAY
+    val millis = (totalMicroseconds % MICROS_PER_DAY) / MICROS_PER_MILLIS

Review comment:
       Yeah. we lose micro seconds part, end with millisecond. It's inconsistent with that convert  `java.time.Duration` to `DayTimeIntervalType` that drop any excess presision that greater than microsecond precision.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Peng-Lei commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Peng-Lei commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620190533



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala
##########
@@ -54,6 +54,8 @@ private[sql] object ArrowUtils {
         new ArrowType.Timestamp(TimeUnit.MICROSECOND, timeZoneId)
       }
     case NullType => ArrowType.Null.INSTANCE
+    case YearMonthIntervalType => Types.MinorType.INTERVALYEAR.getType
+    case DayTimeIntervalType => Types.MinorType.INTERVALDAY.getType

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Peng-Lei commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Peng-Lei commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620070926



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,38 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      return accessor.get(rowId);
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      return Math.addExact(
+              intervalDayHolder.days * MICROS_PER_DAY,

Review comment:
       @MaxGekk done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Peng-Lei commented on a change in pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Peng-Lei commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620035435



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,39 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      int months = accessor.get(rowId);
+      return months;
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      final long microseconds = intervalDayHolder.days * MICROS_PER_DAY
+                              + (long)intervalDayHolder.milliseconds * MICROS_PER_MILLIS;

Review comment:
       resolved,Thanks very much




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-827339198


   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Peng-Lei commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Peng-Lei commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620260738



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,37 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      return accessor.get(rowId);
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      return Math.addExact(intervalDayHolder.days * MICROS_PER_DAY,

Review comment:
       Yeah,I test it. It did overflow. Thank you very much.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826848740


   @Peng-Lei You can skip a few steps I think:
   1, git branch SPARK-XXX
   2, develop...
   3, git add and commit
   ...
   7, git push origin SPARK-XXX
   
   You can merge/rebase on the master only if you see conflicts in the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620102500



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,38 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      return accessor.get(rowId);
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      return Math.addExact(
+              Math.multiplyExact(intervalDayHolder.days, MICROS_PER_DAY),
+              Math.multiplyExact(intervalDayHolder.milliseconds, MICROS_PER_MILLIS));

Review comment:
       I think so:
   ```scala
   scala> Long.MaxValue / MICROS_PER_DAY
   res0: Long = 106751991
   
   scala> (106751991 + 1) * MICROS_PER_DAY
   res1: Long = -9223371964909551616
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826966771


   **[Test build #137957 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137957/testReport)** for PR 32340 at commit [`a1212d0`](https://github.com/apache/spark/commit/a1212d02e9314a46ffa4e8e10b69d1fd68db6eac).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826457666


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42448/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Peng-Lei commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Peng-Lei commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620279496



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,37 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      return accessor.get(rowId);
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      return Math.addExact(intervalDayHolder.days * MICROS_PER_DAY,

Review comment:
       OK, I'll try to add it




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826580726


   **[Test build #137940 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137940/testReport)** for PR 32340 at commit [`a70e730`](https://github.com/apache/spark/commit/a70e7304f1008548993c3f07df5df63de0fb85be).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826580726


   **[Test build #137940 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137940/testReport)** for PR 32340 at commit [`a70e730`](https://github.com/apache/spark/commit/a70e7304f1008548993c3f07df5df63de0fb85be).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826529725


   **[Test build #137928 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137928/testReport)** for PR 32340 at commit [`01367f2`](https://github.com/apache/spark/commit/01367f25bf8e3b03d5ac5fe7a6dc498dfdd633e7).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r751227247



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -172,6 +176,10 @@ public ArrowColumnVector(ValueVector vector) {
       }
     } else if (vector instanceof NullVector) {
       accessor = new NullAccessor((NullVector) vector);
+    } else if (vector instanceof IntervalYearVector) {
+      accessor = new IntervalYearAccessor((IntervalYearVector) vector);
+    } else if (vector instanceof IntervalDayVector) {
+      accessor = new IntervalDayAccessor((IntervalDayVector) vector);

Review comment:
       good catch!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r751894690



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -172,6 +176,10 @@ public ArrowColumnVector(ValueVector vector) {
       }
     } else if (vector instanceof NullVector) {
       accessor = new NullAccessor((NullVector) vector);
+    } else if (vector instanceof IntervalYearVector) {
+      accessor = new IntervalYearAccessor((IntervalYearVector) vector);
+    } else if (vector instanceof IntervalDayVector) {
+      accessor = new IntervalDayAccessor((IntervalDayVector) vector);

Review comment:
       At the very least `Duration` cannot be mapped to `YearMonthIntervalType`. Arrow-wise, mapping to `IntervalType` makes sense but it makes less sense in Spark SQL because we're already mapping different types.
   
   I am not saying either way is 100% correct but I would pick the one to make it coherent in Spark's perspective if I have to pick one of both.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-827449065


   **[Test build #137982 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137982/testReport)** for PR 32340 at commit [`b6f1dc0`](https://github.com/apache/spark/commit/b6f1dc00c7a8ae9eb5ef9133596142b6c3cd58ac).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Peng-Lei commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Peng-Lei commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620189928



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowWriterSuite.scala
##########
@@ -73,6 +75,12 @@ class ArrowWriterSuite extends SparkFunSuite {
     check(DateType, Seq(0, 1, 2, null, 4))
     check(TimestampType, Seq(0L, 3.6e9.toLong, null, 8.64e10.toLong), "America/Los_Angeles")
     check(NullType, Seq(null, null, null))
+    check(YearMonthIntervalType,
+      Seq(null, 0, 1, -1, scala.Int.MaxValue, scala.Int.MinValue))
+    check(DayTimeIntervalType,
+      Seq(null, 0L, 1000L, -1000L,
+        (scala.Long.MaxValue - 807L),
+        (scala.Long.MinValue + 808L)))

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826794853


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137940/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620237905



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,38 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      return accessor.get(rowId);
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      return Math.addExact(
+              Math.multiplyExact(intervalDayHolder.days, MICROS_PER_DAY),
+              Math.multiplyExact(intervalDayHolder.milliseconds, MICROS_PER_MILLIS));

Review comment:
       @Peng-Lei  Wenchen asked about milliseconds part, why did you changed the days multiplication? I showed above that it can overflow Long.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Peng-Lei commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Peng-Lei commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620832720



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,37 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      return accessor.get(rowId);
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      return Math.addExact(intervalDayHolder.days * MICROS_PER_DAY,

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620268633



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,37 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      return accessor.get(rowId);
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      return Math.addExact(intervalDayHolder.days * MICROS_PER_DAY,

Review comment:
       Such negative test could be useful, can you add it to the PR? So, we could catch the behavior change if someone will change your code in the future.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826442231






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r619962492



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,39 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      int months = accessor.get(rowId);

Review comment:
       nit: `return accessor.get(rowId);`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Peng-Lei commented on a change in pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Peng-Lei commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620035312



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,39 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      int months = accessor.get(rowId);

Review comment:
       resolved

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,39 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      int months = accessor.get(rowId);
+      return months;
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      final long microseconds = intervalDayHolder.days * MICROS_PER_DAY
+                              + (long)intervalDayHolder.milliseconds * MICROS_PER_MILLIS;

Review comment:
       resolved

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala
##########
@@ -394,3 +397,29 @@ private[arrow] class NullWriter(val valueVector: NullVector) extends ArrowFieldW
   override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
   }
 }
+
+private[arrow] class IntervalYearWriter(val valueVector: IntervalYearVector)
+  extends ArrowFieldWriter {
+  override def setNull(): Unit = {
+    valueVector.setNull(count)
+  }
+
+  override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
+    valueVector.setSafe(count, input.getInt(ordinal));
+  }
+}
+
+private[arrow] class IntervalDayWriter(val valueVector: IntervalDayVector)
+  extends ArrowFieldWriter {
+  override def setNull(): Unit = {
+    valueVector.setNull(count)
+  }
+
+  override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
+    val totalMicroseconds = input.getLong(ordinal)
+    val days = totalMicroseconds / MICROS_PER_DAY
+    val millis = (totalMicroseconds - days * MICROS_PER_DAY) / MICROS_PER_MILLIS

Review comment:
       resolved




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620257928



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,37 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      return accessor.get(rowId);
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      return Math.addExact(intervalDayHolder.days * MICROS_PER_DAY,

Review comment:
       `days` has the `Int` type, `MICROS_PER_DAY` is Long but `intervalDayHolder.days * MICROS_PER_DAY` can overflow Long in general case. @Peng-Lei If you believe the overflow never happens, could proof that or/and add an assert.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-827328032


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42502/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826824507


   **[Test build #137946 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137946/testReport)** for PR 32340 at commit [`1de4cbe`](https://github.com/apache/spark/commit/1de4cbed3e2441d96779bb59c497254f70a17dc5).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-827461266


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137982/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620157166



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,38 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      return accessor.get(rowId);
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      return Math.addExact(
+              Math.multiplyExact(intervalDayHolder.days, MICROS_PER_DAY),
+              Math.multiplyExact(intervalDayHolder.milliseconds, MICROS_PER_MILLIS));

Review comment:
       Right




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826758663


   **[Test build #137957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137957/testReport)** for PR 32340 at commit [`a1212d0`](https://github.com/apache/spark/commit/a1212d02e9314a46ffa4e8e10b69d1fd68db6eac).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-827052721


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137960/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620279886



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,37 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      return accessor.get(rowId);
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      return Math.addExact(Math.multiplyExact(intervalDayHolder.days, MICROS_PER_DAY),
+                           Math.multiplyExact(intervalDayHolder.milliseconds, MICROS_PER_MILLIS));

Review comment:
       Sorry, I didn't clarify well
   ```suggestion
         return Math.addExact(
           Math.multiplyExact(intervalDayHolder.days, MICROS_PER_DAY),
           intervalDayHolder.milliseconds * MICROS_PER_MILLIS);
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Peng-Lei removed a comment on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Peng-Lei removed a comment on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826826429


   > @Peng-Lei I wonder why do you make changes in your fork master and merge them to `SPARK-35139` instead of direct changes in `Peng-Lei:SPARK-35139`?
   
   That's what I did.
   1, git branch SPARK-XXX
   2, develop...
   3, git add and commit
   3, git checkout master
   4, git pull upstream master
   5, git checkout SPARK-XXX
   6, git pull origin master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-827461266


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137982/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826442231


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826442231


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826976082


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137957/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826443626


   **[Test build #137928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137928/testReport)** for PR 32340 at commit [`01367f2`](https://github.com/apache/spark/commit/01367f25bf8e3b03d5ac5fe7a6dc498dfdd633e7).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620268633



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,37 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      return accessor.get(rowId);
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      return Math.addExact(intervalDayHolder.days * MICROS_PER_DAY,

Review comment:
       Such negative test could be useful, can you add it to the PR?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r619962492



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,39 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      int months = accessor.get(rowId);

Review comment:
       nit: `return accessor.get(rowId);`

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,39 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      int months = accessor.get(rowId);
+      return months;
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      final long microseconds = intervalDayHolder.days * MICROS_PER_DAY
+                              + (long)intervalDayHolder.milliseconds * MICROS_PER_MILLIS;

Review comment:
       should we handle overflow?

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,39 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      int months = accessor.get(rowId);
+      return months;
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      final long microseconds = intervalDayHolder.days * MICROS_PER_DAY
+                              + (long)intervalDayHolder.milliseconds * MICROS_PER_MILLIS;

Review comment:
       ```
   return Math.addExact(
     intervalDayHolder.days * MICROS_PER_DAY,
     intervalDayHolder.milliseconds * MICROS_PER_MILLIS;)
   ```

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala
##########
@@ -394,3 +397,29 @@ private[arrow] class NullWriter(val valueVector: NullVector) extends ArrowFieldW
   override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
   }
 }
+
+private[arrow] class IntervalYearWriter(val valueVector: IntervalYearVector)
+  extends ArrowFieldWriter {
+  override def setNull(): Unit = {
+    valueVector.setNull(count)
+  }
+
+  override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
+    valueVector.setSafe(count, input.getInt(ordinal));
+  }
+}
+
+private[arrow] class IntervalDayWriter(val valueVector: IntervalDayVector)
+  extends ArrowFieldWriter {
+  override def setNull(): Unit = {
+    valueVector.setNull(count)
+  }
+
+  override def setValue(input: SpecializedGetters, ordinal: Int): Unit = {
+    val totalMicroseconds = input.getLong(ordinal)
+    val days = totalMicroseconds / MICROS_PER_DAY
+    val millis = (totalMicroseconds - days * MICROS_PER_DAY) / MICROS_PER_MILLIS

Review comment:
       nit `(totalMicroseconds % MICROS_PER_DAY) / MICROS_PER_MILLIS`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826668677


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42468/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Peng-Lei removed a comment on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Peng-Lei removed a comment on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826825108


   > @Peng-Lei I wonder why do you make changes in your fork master and merge them to `SPARK-35139` instead of direct changes in `Peng-Lei:SPARK-35139`?
   
   That's what I did.
   1, git branch SPARK-XXX
   2, develop...
   3, git add and commit
   3, git checkout master
   4, git pull upstream master
   5, git checkout SPARK-XXX
   6, git pull origin master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826834172


   **[Test build #137960 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137960/testReport)** for PR 32340 at commit [`ff16a56`](https://github.com/apache/spark/commit/ff16a56c99ea7e989f4b094eb1f6d79afb8d9b3a).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620140813



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,38 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      return accessor.get(rowId);
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      return Math.addExact(
+              Math.multiplyExact(intervalDayHolder.days, MICROS_PER_DAY),
+              Math.multiplyExact(intervalDayHolder.milliseconds, MICROS_PER_MILLIS));

Review comment:
       but Int.Max * MICROS_PER_MILLIS won't overflow, right?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826844818






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Peng-Lei edited a comment on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Peng-Lei edited a comment on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826828182


   > @Peng-Lei I wonder why do you make changes in your fork master and merge them to `SPARK-35139` instead of direct changes in `Peng-Lei:SPARK-35139`?
   
   @MaxGekk That's what I did.
   1, git branch SPARK-XXX
   2, develop...
   3, git add and commit
   3, git checkout master
   4, git pull upstream master
   5, git checkout SPARK-XXX
   6, git pull origin master
   7, git push origin SPARK-XXX


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-827304365


   **[Test build #137982 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137982/testReport)** for PR 32340 at commit [`b6f1dc0`](https://github.com/apache/spark/commit/b6f1dc00c7a8ae9eb5ef9133596142b6c3cd58ac).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620088573



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,38 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      return accessor.get(rowId);
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      return Math.addExact(
+              Math.multiplyExact(intervalDayHolder.days, MICROS_PER_DAY),
+              Math.multiplyExact(intervalDayHolder.milliseconds, MICROS_PER_MILLIS));

Review comment:
       @MaxGekk  `intervalDayHolder.days` and `intervalDayHolder.milliseconds` are int, can they really overflow?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826844818


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42480/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r619962778



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,39 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      int months = accessor.get(rowId);
+      return months;
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      final long microseconds = intervalDayHolder.days * MICROS_PER_DAY
+                              + (long)intervalDayHolder.milliseconds * MICROS_PER_MILLIS;

Review comment:
       should we handle overflow?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-827028115


   **[Test build #137960 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137960/testReport)** for PR 32340 at commit [`ff16a56`](https://github.com/apache/spark/commit/ff16a56c99ea7e989f4b094eb1f6d79afb8d9b3a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826666250






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826541191


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137928/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Yikun commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Yikun commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-827363074


   Looks like we can remove the note decription on the ArrowColumnVector [1]:`Currently calendar interval type and map type are not supported.`, we can do it in a followup patch.
   
   [1] https://github.com/apache/spark/pull/32340/files#diff-94174f963b367bc222d41c4ef9ed34563dafbd6243f5e3c273b04bc8e4979e57L29


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826622962


   **[Test build #137946 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137946/testReport)** for PR 32340 at commit [`1de4cbe`](https://github.com/apache/spark/commit/1de4cbed3e2441d96779bb59c497254f70a17dc5).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32340: [SPARK-35139][SQL]Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826541191


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137928/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Peng-Lei commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Peng-Lei commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826828182


   > @Peng-Lei I wonder why do you make changes in your fork master and merge them to `SPARK-35139` instead of direct changes in `Peng-Lei:SPARK-35139`?
   
   @MaxGekk That's what I did.
   1, git branch SPARK-XXX
   2, develop...
   3, git add and commit
   3, git checkout master
   4, git pull upstream master
   5, git checkout SPARK-XXX
   6, git pull origin master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-827330598


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42502/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Peng-Lei commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
Peng-Lei commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620253524



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -508,4 +516,38 @@ final ColumnarMap getMap(int rowId) {
       super(vector);
     }
   }
+
+  private static class IntervalYearAccessor extends ArrowVectorAccessor {
+
+    private final IntervalYearVector accessor;
+
+    IntervalYearAccessor(IntervalYearVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    int getInt(int rowId) {
+      return accessor.get(rowId);
+    }
+  }
+
+  private static class IntervalDayAccessor extends ArrowVectorAccessor {
+
+    private final IntervalDayVector accessor;
+    private final NullableIntervalDayHolder intervalDayHolder = new NullableIntervalDayHolder();
+
+    IntervalDayAccessor(IntervalDayVector vector) {
+      super(vector);
+      this.accessor = vector;
+    }
+
+    @Override
+    long getLong(int rowId) {
+      accessor.get(rowId, intervalDayHolder);
+      return Math.addExact(
+              Math.multiplyExact(intervalDayHolder.days, MICROS_PER_DAY),
+              Math.multiplyExact(intervalDayHolder.milliseconds, MICROS_PER_MILLIS));

Review comment:
       Sorry,My mistake. I get it. I'll fix it right now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620130533



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowWriterSuite.scala
##########
@@ -73,6 +75,12 @@ class ArrowWriterSuite extends SparkFunSuite {
     check(DateType, Seq(0, 1, 2, null, 4))
     check(TimestampType, Seq(0L, 3.6e9.toLong, null, 8.64e10.toLong), "America/Los_Angeles")
     check(NullType, Seq(null, null, null))
+    check(YearMonthIntervalType,
+      Seq(null, 0, 1, -1, scala.Int.MaxValue, scala.Int.MinValue))

Review comment:
       Use `Int.MaxValue`
   ```suggestion
       check(YearMonthIntervalType, Seq(null, 0, 1, -1, Int.MaxValue, Int.MinValue))
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32340:
URL: https://github.com/apache/spark/pull/32340#issuecomment-826758663


   **[Test build #137957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137957/testReport)** for PR 32340 at commit [`a1212d0`](https://github.com/apache/spark/commit/a1212d02e9314a46ffa4e8e10b69d1fd68db6eac).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r620132622



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowWriterSuite.scala
##########
@@ -73,6 +75,12 @@ class ArrowWriterSuite extends SparkFunSuite {
     check(DateType, Seq(0, 1, 2, null, 4))
     check(TimestampType, Seq(0L, 3.6e9.toLong, null, 8.64e10.toLong), "America/Los_Angeles")
     check(NullType, Seq(null, null, null))
+    check(YearMonthIntervalType,
+      Seq(null, 0, 1, -1, scala.Int.MaxValue, scala.Int.MinValue))
+    check(DayTimeIntervalType,
+      Seq(null, 0L, 1000L, -1000L,
+        (scala.Long.MaxValue - 807L),
+        (scala.Long.MinValue + 808L)))

Review comment:
       nit: `scala.Long.MinValue` -> `Long.MinValue`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #32340: [SPARK-35139][SQL] Support ANSI intervals as Arrow Column vectors

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #32340:
URL: https://github.com/apache/spark/pull/32340#discussion_r751925908



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java
##########
@@ -172,6 +176,10 @@ public ArrowColumnVector(ValueVector vector) {
       }
     } else if (vector instanceof NullVector) {
       accessor = new NullAccessor((NullVector) vector);
+    } else if (vector instanceof IntervalYearVector) {
+      accessor = new IntervalYearAccessor((IntervalYearVector) vector);
+    } else if (vector instanceof IntervalDayVector) {
+      accessor = new IntervalDayAccessor((IntervalDayVector) vector);

Review comment:
       and, `YearMonthIntervalType` is mapped to [`java.time.Period`](https://docs.oracle.com/javase/8/docs/api/java/time/Period.html) which is a calendar instance:
   
   > A date-based amount of time in the ISO-8601 calendar system, such as '2 years, 3 months and 4 days'.
   
   So `YearMonthIntervalType` seems fine.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org