Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2018/04/09 16:22:05 UTC
[GitHub] spark pull request #21011: [SPARK-23916][SQL] Add array_join function
GitHub user mgaido91 opened a pull request:
https://github.com/apache/spark/pull/21011
[SPARK-23916][SQL] Add array_join function
## What changes were proposed in this pull request?
The PR adds the SQL function `array_join`. Its behavior is based on Presto's function of the same name.
The function accepts an `array` of `string` values to join, a `string` delimiter to place between the items, and, optionally, a `string` used to replace `null` values.
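The documented semantics can be illustrated with a minimal Python sketch (the function name and signature here are hypothetical, chosen only to mirror the description; the actual implementation is the Scala `ArrayJoin` expression under review):

```python
def array_join(arr, delimiter, null_replacement=None):
    """Mimic the described array_join semantics: join string elements with a
    delimiter; nulls (None) are dropped unless a replacement string is given."""
    if arr is None or delimiter is None:
        return None  # a null array or null delimiter yields null
    items = []
    for item in arr:
        if item is None:
            if null_replacement is not None:
                items.append(null_replacement)
            # with no replacement, nulls are simply filtered out
        else:
            items.append(item)
    return delimiter.join(items)
```

This matches the examples in the `@ExpressionDescription` docstring: joining with `' '` skips nulls, while passing `','` as the third argument inserts the replacement between the surrounding delimiters.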
## How was this patch tested?
Added unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mgaido91/spark SPARK-23916
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21011.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21011
----
commit 1408cfcd8f0ad2a571d29b57d71128584ea4b4f0
Author: Marco Gaido <ma...@...>
Date: 2018-04-09T16:16:43Z
[SPARK-23916][SQL] Add array_join function
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21011: [SPARK-23916][SQL] Add array_join function
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21011#discussion_r181380233
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +288,173 @@ case class ArrayContains(left: Expression, right: Expression)
override def prettyName: String = "array_contains"
}
+
+/**
+ * Creates a String containing all the elements of the input array separated by the delimiter.
+ */
+@ExpressionDescription(
+ usage = """
+ _FUNC_(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array
+ using the delimiter and an optional string to replace nulls. If no value is set for
+ nullReplacement, any null value is filtered.""",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(array('hello', 'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ', ',');
+ hello , world
+ """, since = "2.4.0")
+case class ArrayJoin(
+ array: Expression,
+ delimiter: Expression,
+ nullReplacement: Option[Expression]) extends Expression with ExpectsInputTypes {
+
+ def this(array: Expression, delimiter: Expression) = this(array, delimiter, None)
+
+ def this(array: Expression, delimiter: Expression, nullReplacement: Expression) =
+ this(array, delimiter, Some(nullReplacement))
+
+ override def inputTypes: Seq[AbstractDataType] = if (nullReplacement.isDefined) {
+ Seq(ArrayType(StringType), StringType, StringType)
+ } else {
+ Seq(ArrayType(StringType), StringType)
+ }
--- End diff --
Hmm, I think the indent is 2 spaces in this case. For example, [namedExpressions.scala#L170-L174](https://github.com/mgaido91/spark/blob/e52ff856d42adc5af2e2b2593c2e63d5c3f3a205/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala#L170-L174) or [regexpExpressions.scala#L46-L51](https://github.com/mgaido91/spark/blob/e52ff856d42adc5af2e2b2593c2e63d5c3f3a205/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala#L46-L51).
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2538/
Test PASSed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/21011
kindly ping @ueshin
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21011
**[Test build #89094 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89094/testReport)** for PR 21011 at commit [`e52ff85`](https://github.com/apache/spark/commit/e52ff856d42adc5af2e2b2593c2e63d5c3f3a205).
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21011
**[Test build #89534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89534/testReport)** for PR 21011 at commit [`b1597d7`](https://github.com/apache/spark/commit/b1597d791b0302a4541cf545956ecf0f3454f056).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89342/
Test FAILed.
---
[GitHub] spark pull request #21011: [SPARK-23916][SQL] Add array_join function
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/21011
---
[GitHub] spark pull request #21011: [SPARK-23916][SQL] Add array_join function
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21011#discussion_r181371786
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +288,173 @@ case class ArrayContains(left: Expression, right: Expression)
override def prettyName: String = "array_contains"
}
+
+/**
+ * Creates a String containing all the elements of the input array separated by the delimiter.
+ */
+@ExpressionDescription(
+ usage = """
+ _FUNC_(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array
+ using the delimiter and an optional string to replace nulls. If no value is set for
+ nullReplacement, any null value is filtered.""",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(array('hello', 'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ', ',');
+ hello , world
+ """, since = "2.4.0")
+case class ArrayJoin(
+ array: Expression,
+ delimiter: Expression,
+ nullReplacement: Option[Expression]) extends Expression with ExpectsInputTypes {
+
+ def this(array: Expression, delimiter: Expression) = this(array, delimiter, None)
+
+ def this(array: Expression, delimiter: Expression, nullReplacement: Expression) =
+ this(array, delimiter, Some(nullReplacement))
+
+ override def inputTypes: Seq[AbstractDataType] = if (nullReplacement.isDefined) {
+ Seq(ArrayType(StringType), StringType, StringType)
+ } else {
+ Seq(ArrayType(StringType), StringType)
+ }
--- End diff --
nit: indent?
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2456/
Test PASSed.
---
[GitHub] spark pull request #21011: [SPARK-23916][SQL] Add array_join function
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21011#discussion_r181376485
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +288,173 @@ case class ArrayContains(left: Expression, right: Expression)
override def prettyName: String = "array_contains"
}
+
+/**
+ * Creates a String containing all the elements of the input array separated by the delimiter.
+ */
+@ExpressionDescription(
+ usage = """
+ _FUNC_(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array
+ using the delimiter and an optional string to replace nulls. If no value is set for
+ nullReplacement, any null value is filtered.""",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(array('hello', 'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ', ',');
+ hello , world
+ """, since = "2.4.0")
+case class ArrayJoin(
+ array: Expression,
+ delimiter: Expression,
+ nullReplacement: Option[Expression]) extends Expression with ExpectsInputTypes {
+
+ def this(array: Expression, delimiter: Expression) = this(array, delimiter, None)
+
+ def this(array: Expression, delimiter: Expression, nullReplacement: Expression) =
+ this(array, delimiter, Some(nullReplacement))
+
+ override def inputTypes: Seq[AbstractDataType] = if (nullReplacement.isDefined) {
+ Seq(ArrayType(StringType), StringType, StringType)
+ } else {
+ Seq(ArrayType(StringType), StringType)
+ }
--- End diff --
I don't think the indent is wrong: this indentation belongs to the if...else expression, not to the method body itself.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21011
**[Test build #89451 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89451/testReport)** for PR 21011 at commit [`703c09c`](https://github.com/apache/spark/commit/703c09c4e9da2b96c7a5f445fd5a1d30cdc29c03).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89534/
Test PASSed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2137/
Test PASSed.
---
[GitHub] spark pull request #21011: [SPARK-23916][SQL] Add array_join function
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21011#discussion_r181371816
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +288,173 @@ case class ArrayContains(left: Expression, right: Expression)
override def prettyName: String = "array_contains"
}
+
+/**
+ * Creates a String containing all the elements of the input array separated by the delimiter.
+ */
+@ExpressionDescription(
+ usage = """
+ _FUNC_(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array
+ using the delimiter and an optional string to replace nulls. If no value is set for
+ nullReplacement, any null value is filtered.""",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(array('hello', 'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ', ',');
+ hello , world
+ """, since = "2.4.0")
+case class ArrayJoin(
+ array: Expression,
+ delimiter: Expression,
+ nullReplacement: Option[Expression]) extends Expression with ExpectsInputTypes {
+
+ def this(array: Expression, delimiter: Expression) = this(array, delimiter, None)
+
+ def this(array: Expression, delimiter: Expression, nullReplacement: Expression) =
+ this(array, delimiter, Some(nullReplacement))
+
+ override def inputTypes: Seq[AbstractDataType] = if (nullReplacement.isDefined) {
+ Seq(ArrayType(StringType), StringType, StringType)
+ } else {
+ Seq(ArrayType(StringType), StringType)
+ }
+
+ override def children: Seq[Expression] = if (nullReplacement.isDefined) {
+ Seq(array, delimiter, nullReplacement.get)
+ } else {
+ Seq(array, delimiter)
+ }
--- End diff --
nit: indent?
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89451/
Test PASSed.
---
[GitHub] spark pull request #21011: [SPARK-23916][SQL] Add array_join function
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21011#discussion_r181380011
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +288,173 @@ case class ArrayContains(left: Expression, right: Expression)
override def prettyName: String = "array_contains"
}
+
+/**
+ * Creates a String containing all the elements of the input array separated by the delimiter.
+ */
+@ExpressionDescription(
+ usage = """
+ _FUNC_(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array
+ using the delimiter and an optional string to replace nulls. If no value is set for
+ nullReplacement, any null value is filtered.""",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(array('hello', 'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ', ',');
+ hello , world
+ """, since = "2.4.0")
+case class ArrayJoin(
+ array: Expression,
+ delimiter: Expression,
+ nullReplacement: Option[Expression]) extends Expression with ExpectsInputTypes {
+
+ def this(array: Expression, delimiter: Expression) = this(array, delimiter, None)
+
+ def this(array: Expression, delimiter: Expression, nullReplacement: Expression) =
+ this(array, delimiter, Some(nullReplacement))
+
+ override def inputTypes: Seq[AbstractDataType] = if (nullReplacement.isDefined) {
+ Seq(ArrayType(StringType), StringType, StringType)
+ } else {
+ Seq(ArrayType(StringType), StringType)
+ }
+
+ override def children: Seq[Expression] = if (nullReplacement.isDefined) {
+ Seq(array, delimiter, nullReplacement.get)
+ } else {
+ Seq(array, delimiter)
+ }
+
+ override def nullable: Boolean = children.exists(_.nullable)
+
+ override def foldable: Boolean = children.forall(_.foldable)
+
+ override def eval(input: InternalRow): Any = {
+ val arrayEval = array.eval(input)
+ if (arrayEval == null) return null
+ val delimiterEval = delimiter.eval(input)
+ if (delimiterEval == null) return null
+ val nullReplacementEval = nullReplacement.map(_.eval(input))
+ if (nullReplacementEval.contains(null)) return null
+
--- End diff --
I removed the other one.... :)
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Merged build finished. Test FAILed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2104/
Test PASSed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2128/
Test PASSed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/21011
any more comments, @ueshin?
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/21011
cc @ueshin
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21011
**[Test build #89351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89351/testReport)** for PR 21011 at commit [`ad0d4aa`](https://github.com/apache/spark/commit/ad0d4aa5d671b3a99fa1bd30dc833a8b75444f6c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #21011: [SPARK-23916][SQL] Add array_join function
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21011#discussion_r180311653
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +288,173 @@ case class ArrayContains(left: Expression, right: Expression)
override def prettyName: String = "array_contains"
}
+
+/**
+ * Creates a String containing all the elements of the input array separated by the delimiter.
+ */
+@ExpressionDescription(
+ usage = """
+ _FUNC_(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array
+ using the delimiter and an optional string to replace nulls. If no value is set for
+ nullReplacement, any null value is filtered.""",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(array('hello', 'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ', ',');
+ hello , world
+ """)
--- End diff --
and `since`.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21011
**[Test build #89645 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89645/testReport)** for PR 21011 at commit [`e9d7baa`](https://github.com/apache/spark/commit/e9d7baa09ecee2456d48bf7de71330714dba4c4c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `class Summarizer(object):`
* `class SummaryBuilder(JavaWrapper):`
* `case class Reverse(child: Expression) extends UnaryExpression with ImplicitCastInputTypes `
* `case class ArrayPosition(left: Expression, right: Expression)`
* `case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil `
* `case class Concat(children: Seq[Expression]) extends Expression `
* `abstract class GetMapValueUtil extends BinaryExpression with ImplicitCastInputTypes `
* `case class GetMapValue(child: Expression, key: Expression)`
* `class ArrayDataIndexedSeq[T](arrayData: ArrayData, dataType: DataType) extends IndexedSeq[T] `
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/21011
Thanks! merging to master.
---
[GitHub] spark pull request #21011: [SPARK-23916][SQL] Add array_join function
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21011#discussion_r181371299
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +288,173 @@ case class ArrayContains(left: Expression, right: Expression)
override def prettyName: String = "array_contains"
}
+
+/**
+ * Creates a String containing all the elements of the input array separated by the delimiter.
+ */
+@ExpressionDescription(
+ usage = """
+ _FUNC_(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array
+ using the delimiter and an optional string to replace nulls. If no value is set for
+ nullReplacement, any null value is filtered.""",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(array('hello', 'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ', ',');
+ hello , world
+ """, since = "2.4.0")
+case class ArrayJoin(
+ array: Expression,
+ delimiter: Expression,
+ nullReplacement: Option[Expression]) extends Expression with ExpectsInputTypes {
+
+ def this(array: Expression, delimiter: Expression) = this(array, delimiter, None)
+
+ def this(array: Expression, delimiter: Expression, nullReplacement: Expression) =
+ this(array, delimiter, Some(nullReplacement))
+
+ override def inputTypes: Seq[AbstractDataType] = if (nullReplacement.isDefined) {
+ Seq(ArrayType(StringType), StringType, StringType)
+ } else {
+ Seq(ArrayType(StringType), StringType)
+ }
+
+ override def children: Seq[Expression] = if (nullReplacement.isDefined) {
+ Seq(array, delimiter, nullReplacement.get)
+ } else {
+ Seq(array, delimiter)
+ }
+
+ override def nullable: Boolean = children.exists(_.nullable)
+
+ override def foldable: Boolean = children.forall(_.foldable)
+
+ override def eval(input: InternalRow): Any = {
+ val arrayEval = array.eval(input)
+ if (arrayEval == null) return null
+ val delimiterEval = delimiter.eval(input)
+ if (delimiterEval == null) return null
+ val nullReplacementEval = nullReplacement.map(_.eval(input))
+ if (nullReplacementEval.contains(null)) return null
+
+
+ val buffer = new UTF8StringBuilder()
+ var firstItem = true
+ val nullHandling = nullReplacementEval match {
+ case Some(rep) => (prependDelimiter: Boolean) => {
+ if (!prependDelimiter) {
+ buffer.append(delimiterEval.asInstanceOf[UTF8String])
+ }
+ buffer.append(rep.asInstanceOf[UTF8String])
+ true
+ }
+ case None => (_: Boolean) => false
+ }
+ arrayEval.asInstanceOf[ArrayData].foreach(StringType, (_, item) => {
+ if (item == null) {
+ if (nullHandling(firstItem)) {
+ firstItem = false
+ }
+ } else {
+ if (!firstItem) {
+ buffer.append(delimiterEval.asInstanceOf[UTF8String])
+ }
+ buffer.append(item.asInstanceOf[UTF8String])
+ firstItem = false
+ }
+ })
+ buffer.build()
+ }
+
+ override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val code = nullReplacement match {
+ case Some(replacement) =>
+ val replacementGen = replacement.genCode(ctx)
+ val nullHandling = (buffer: String, delimiter: String, firstItem: String) => {
+ s"""
+ |if (!$firstItem) {
+ | $buffer.append($delimiter);
+ |}
+ |$buffer.append(${replacementGen.value});
+ |$firstItem = false;
+ """.stripMargin
+ }
+ val execCode = if (replacement.nullable) {
+ ctx.nullSafeExec(replacement.nullable, replacementGen.isNull) {
+ genCodeForArrayAndDelimiter(ctx, ev, nullHandling)
+ }
+ } else {
+ genCodeForArrayAndDelimiter(ctx, ev, nullHandling)
+ }
+ s"""
+ |${replacementGen.code}
+ |$execCode
+ """.stripMargin
+ case None => genCodeForArrayAndDelimiter(ctx, ev,
+ (_: String, _: String, _: String) => "// nulls are ignored")
+ }
+ if (nullable) {
+ ev.copy(
+ s"""
+ |boolean ${ev.isNull} = true;
+ |UTF8String ${ev.value} = null;
+ |$code
+ """.stripMargin)
+ } else {
+ ev.copy(s"""
--- End diff --
nit: maybe we need a line break between `copy(` and `s"""`?
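For readers following the diff above: `ArrayJoin.eval` builds the result in a single pass over the array, using a flag to decide whether a delimiter must be prepended before each appended element. A rough Python sketch of that loop (helper name hypothetical, assuming the same null semantics as the quoted Scala):

```python
def join_one_pass(items, delimiter, null_replacement=None):
    # Single-pass join mirroring the eval loop in the diff: `first` tracks
    # whether anything has been emitted yet, so the delimiter is only
    # inserted between elements, never at the start.
    buffer = []
    first = True
    for item in items:
        if item is None:
            if null_replacement is None:
                continue  # no replacement configured: nulls are skipped
            item = null_replacement
        if not first:
            buffer.append(delimiter)
        buffer.append(item)
        first = False
    return "".join(buffer)
```

The single-pass design avoids materializing a filtered intermediate array, which is also why the generated code in `doGenCode` threads the same `firstItem` flag through its null-handling branch.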
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2322/
Test FAILed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2390/
Test PASSed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Merged build finished. Test FAILed.
---
[GitHub] spark pull request #21011: [SPARK-23916][SQL] Add array_join function
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21011#discussion_r180311581
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -413,6 +413,29 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
)
}
+ test("array_join function") {
+ val df = Seq(
+ (Seq[String]("a", "b"), ","),
+ (Seq[String]("a", null, "b"), ","),
+ (Seq[String](), ",")
--- End diff --
Maybe `Seq.empty[String]`
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21011
**[Test build #89534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89534/testReport)** for PR 21011 at commit [`b1597d7`](https://github.com/apache/spark/commit/b1597d791b0302a4541cf545956ecf0f3454f056).
---
[GitHub] spark pull request #21011: [SPARK-23916][SQL] Add array_join function
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21011#discussion_r181371352
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +288,173 @@ case class ArrayContains(left: Expression, right: Expression)
override def prettyName: String = "array_contains"
}
+
+/**
+ * Creates a String containing all the elements of the input array separated by the delimiter.
+ */
+@ExpressionDescription(
+ usage = """
+ _FUNC_(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array
+ using the delimiter and an optional string to replace nulls. If no value is set for
+ nullReplacement, any null value is filtered.""",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(array('hello', 'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ', ',');
+ hello , world
+ """, since = "2.4.0")
+case class ArrayJoin(
+ array: Expression,
+ delimiter: Expression,
+ nullReplacement: Option[Expression]) extends Expression with ExpectsInputTypes {
+
+ def this(array: Expression, delimiter: Expression) = this(array, delimiter, None)
+
+ def this(array: Expression, delimiter: Expression, nullReplacement: Expression) =
+ this(array, delimiter, Some(nullReplacement))
+
+ override def inputTypes: Seq[AbstractDataType] = if (nullReplacement.isDefined) {
+ Seq(ArrayType(StringType), StringType, StringType)
+ } else {
+ Seq(ArrayType(StringType), StringType)
+ }
+
+ override def children: Seq[Expression] = if (nullReplacement.isDefined) {
+ Seq(array, delimiter, nullReplacement.get)
+ } else {
+ Seq(array, delimiter)
+ }
+
+ override def nullable: Boolean = children.exists(_.nullable)
+
+ override def foldable: Boolean = children.forall(_.foldable)
+
+ override def eval(input: InternalRow): Any = {
+ val arrayEval = array.eval(input)
+ if (arrayEval == null) return null
+ val delimiterEval = delimiter.eval(input)
+ if (delimiterEval == null) return null
+ val nullReplacementEval = nullReplacement.map(_.eval(input))
+ if (nullReplacementEval.contains(null)) return null
+
+
+ val buffer = new UTF8StringBuilder()
+ var firstItem = true
+ val nullHandling = nullReplacementEval match {
+ case Some(rep) => (prependDelimiter: Boolean) => {
+ if (!prependDelimiter) {
+ buffer.append(delimiterEval.asInstanceOf[UTF8String])
+ }
+ buffer.append(rep.asInstanceOf[UTF8String])
+ true
+ }
+ case None => (_: Boolean) => false
+ }
+ arrayEval.asInstanceOf[ArrayData].foreach(StringType, (_, item) => {
+ if (item == null) {
+ if (nullHandling(firstItem)) {
+ firstItem = false
+ }
+ } else {
+ if (!firstItem) {
+ buffer.append(delimiterEval.asInstanceOf[UTF8String])
+ }
+ buffer.append(item.asInstanceOf[UTF8String])
+ firstItem = false
+ }
+ })
+ buffer.build()
+ }
+
+ override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val code = nullReplacement match {
+ case Some(replacement) =>
+ val replacementGen = replacement.genCode(ctx)
+ val nullHandling = (buffer: String, delimiter: String, firstItem: String) => {
+ s"""
+ |if (!$firstItem) {
+ | $buffer.append($delimiter);
+ |}
+ |$buffer.append(${replacementGen.value});
+ |$firstItem = false;
+ """.stripMargin
+ }
+ val execCode = if (replacement.nullable) {
+ ctx.nullSafeExec(replacement.nullable, replacementGen.isNull) {
+ genCodeForArrayAndDelimiter(ctx, ev, nullHandling)
+ }
+ } else {
+ genCodeForArrayAndDelimiter(ctx, ev, nullHandling)
+ }
+ s"""
+ |${replacementGen.code}
+ |$execCode
+ """.stripMargin
--- End diff --
nit: indent
---
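The `eval` logic quoted in the review above (append the delimiter before every element except the first, and route nulls through a `nullHandling` closure) can be sketched in plain Scala. This is a hedged, simplified model: it operates on `Seq[String]` instead of Spark's `ArrayData`/`UTF8String`, and the name `arrayJoin` is purely illustrative, not Spark's internal API.

```scala
// Simplified stand-in for ArrayJoin.eval: nulls are either replaced
// (when nullReplacement is given) or skipped entirely (when it is None).
def arrayJoin(
    items: Seq[String],
    delimiter: String,
    nullReplacement: Option[String] = None): String = {
  val sb = new StringBuilder
  var firstItem = true
  items.foreach { item =>
    // A null item contributes the replacement if one is set, else nothing.
    val toAppend = if (item == null) nullReplacement else Some(item)
    toAppend.foreach { v =>
      if (!firstItem) sb.append(delimiter)
      sb.append(v)
      firstItem = false
    }
  }
  sb.toString
}
```

For `Seq("hello", null, "world")` with delimiter `" "` this yields `hello world` when no replacement is given and `hello , world` with replacement `","`, matching the examples in the `@ExpressionDescription`.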
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #21011: [SPARK-23916][SQL] Add array_join function
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21011#discussion_r181371571
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +288,173 @@ case class ArrayContains(left: Expression, right: Expression)
override def prettyName: String = "array_contains"
}
+
+/**
+ * Creates a String containing all the elements of the input array separated by the delimiter.
+ */
+@ExpressionDescription(
+ usage = """
+ _FUNC_(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array
+ using the delimiter and an optional string to replace nulls. If no value is set for
+ nullReplacement, any null value is filtered.""",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(array('hello', 'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ', ',');
+ hello , world
+ """, since = "2.4.0")
+case class ArrayJoin(
+ array: Expression,
+ delimiter: Expression,
+ nullReplacement: Option[Expression]) extends Expression with ExpectsInputTypes {
+
+ def this(array: Expression, delimiter: Expression) = this(array, delimiter, None)
+
+ def this(array: Expression, delimiter: Expression, nullReplacement: Expression) =
+ this(array, delimiter, Some(nullReplacement))
+
+ override def inputTypes: Seq[AbstractDataType] = if (nullReplacement.isDefined) {
+ Seq(ArrayType(StringType), StringType, StringType)
+ } else {
+ Seq(ArrayType(StringType), StringType)
+ }
+
+ override def children: Seq[Expression] = if (nullReplacement.isDefined) {
+ Seq(array, delimiter, nullReplacement.get)
+ } else {
+ Seq(array, delimiter)
+ }
+
+ override def nullable: Boolean = children.exists(_.nullable)
+
+ override def foldable: Boolean = children.forall(_.foldable)
+
+ override def eval(input: InternalRow): Any = {
+ val arrayEval = array.eval(input)
+ if (arrayEval == null) return null
+ val delimiterEval = delimiter.eval(input)
+ if (delimiterEval == null) return null
+ val nullReplacementEval = nullReplacement.map(_.eval(input))
+ if (nullReplacementEval.contains(null)) return null
+
--- End diff --
nit: remove an extra line.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21011
**[Test build #89340 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89340/testReport)** for PR 21011 at commit [`dd9482e`](https://github.com/apache/spark/commit/dd9482e230c9efdab66609639d633a207e47f736).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #21011: [SPARK-23916][SQL] Add array_join function
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21011#discussion_r181381594
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +288,173 @@ case class ArrayContains(left: Expression, right: Expression)
override def prettyName: String = "array_contains"
}
+
+/**
+ * Creates a String containing all the elements of the input array separated by the delimiter.
+ */
+@ExpressionDescription(
+ usage = """
+ _FUNC_(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array
+ using the delimiter and an optional string to replace nulls. If no value is set for
+ nullReplacement, any null value is filtered.""",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(array('hello', 'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ', ',');
+ hello , world
+ """, since = "2.4.0")
+case class ArrayJoin(
+ array: Expression,
+ delimiter: Expression,
+ nullReplacement: Option[Expression]) extends Expression with ExpectsInputTypes {
+
+ def this(array: Expression, delimiter: Expression) = this(array, delimiter, None)
+
+ def this(array: Expression, delimiter: Expression, nullReplacement: Expression) =
+ this(array, delimiter, Some(nullReplacement))
+
+ override def inputTypes: Seq[AbstractDataType] = if (nullReplacement.isDefined) {
+ Seq(ArrayType(StringType), StringType, StringType)
+ } else {
+ Seq(ArrayType(StringType), StringType)
+ }
--- End diff --
oops, you are right... I am not sure where I saw it differently; maybe I just got confused. Sorry, I am fixing it.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89067/
Test FAILed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21011
**[Test build #89342 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89342/testReport)** for PR 21011 at commit [`ad0d4aa`](https://github.com/apache/spark/commit/ad0d4aa5d671b3a99fa1bd30dc833a8b75444f6c).
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89645/
Test PASSed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21011
**[Test build #89340 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89340/testReport)** for PR 21011 at commit [`dd9482e`](https://github.com/apache/spark/commit/dd9482e230c9efdab66609639d633a207e47f736).
---
[GitHub] spark pull request #21011: [SPARK-23916][SQL] Add array_join function
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21011#discussion_r181376517
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +288,173 @@ case class ArrayContains(left: Expression, right: Expression)
override def prettyName: String = "array_contains"
}
+
+/**
+ * Creates a String containing all the elements of the input array separated by the delimiter.
+ */
+@ExpressionDescription(
+ usage = """
+ _FUNC_(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array
+ using the delimiter and an optional string to replace nulls. If no value is set for
+ nullReplacement, any null value is filtered.""",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(array('hello', 'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ', ',');
+ hello , world
+ """, since = "2.4.0")
+case class ArrayJoin(
+ array: Expression,
+ delimiter: Expression,
+ nullReplacement: Option[Expression]) extends Expression with ExpectsInputTypes {
+
+ def this(array: Expression, delimiter: Expression) = this(array, delimiter, None)
+
+ def this(array: Expression, delimiter: Expression, nullReplacement: Expression) =
+ this(array, delimiter, Some(nullReplacement))
+
+ override def inputTypes: Seq[AbstractDataType] = if (nullReplacement.isDefined) {
+ Seq(ArrayType(StringType), StringType, StringType)
+ } else {
+ Seq(ArrayType(StringType), StringType)
+ }
+
+ override def children: Seq[Expression] = if (nullReplacement.isDefined) {
+ Seq(array, delimiter, nullReplacement.get)
+ } else {
+ Seq(array, delimiter)
+ }
--- End diff --
ditto
---
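The `children`/`inputTypes` pattern under discussion — a third child expression that may or may not exist — can be modeled minimally as follows. `Expr`, `Lit`, and `Join3` are hypothetical stand-ins invented for illustration, not Spark's `Expression` hierarchy; the sketch also shows `Seq ++ Option` as a common Scala alternative to the if/else in the diff, since appending an `Option` contributes zero or one element.

```scala
// Minimal stand-ins (hypothetical, not Spark classes).
trait Expr { def nullable: Boolean }
case class Lit(value: String) extends Expr {
  def nullable: Boolean = value == null
}

// An expression with an optional third child, mirroring ArrayJoin's shape.
case class Join3(array: Expr, delimiter: Expr, nullReplacement: Option[Expr]) {
  // Seq ++ Option appends the replacement only when it is present, so
  // nullability is derived from exactly the inputs that actually exist.
  def children: Seq[Expr] = Seq(array, delimiter) ++ nullReplacement
  def nullable: Boolean = children.exists(_.nullable)
}
```

With this shape, a two-argument call produces two children and a three-argument call produces three, which is why `nullable` and `foldable` in the PR can simply fold over `children`.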
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Merged build finished. Test FAILed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/21011
any more comments?
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2317/
Test PASSed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2319/
Test PASSed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21011
**[Test build #89094 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89094/testReport)** for PR 21011 at commit [`e52ff85`](https://github.com/apache/spark/commit/e52ff856d42adc5af2e2b2593c2e63d5c3f3a205).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21011
**[Test build #89106 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89106/testReport)** for PR 21011 at commit [`e52ff85`](https://github.com/apache/spark/commit/e52ff856d42adc5af2e2b2593c2e63d5c3f3a205).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #21011: [SPARK-23916][SQL] Add array_join function
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21011#discussion_r182333814
--- Diff: python/pyspark/sql/functions.py ---
@@ -1846,6 +1846,27 @@ def array_contains(col, value):
return Column(sc._jvm.functions.array_contains(_to_java_column(col), value))
+@ignore_unicode_prefix
+@since(2.4)
+def array_join(col, delimiter, null_replacement=None):
+ """
+ Concatenates the elements of `column` using the `delimiter`. Null values are replaced with
+ `nullReplacement` if set, otherwise they are ignored.
--- End diff --
nit: `null_replacement`?
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89094/
Test FAILed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21011
**[Test build #89342 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89342/testReport)** for PR 21011 at commit [`ad0d4aa`](https://github.com/apache/spark/commit/ad0d4aa5d671b3a99fa1bd30dc833a8b75444f6c).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #21011: [SPARK-23916][SQL] Add array_join function
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21011#discussion_r180317060
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +288,173 @@ case class ArrayContains(left: Expression, right: Expression)
override def prettyName: String = "array_contains"
}
+
+/**
+ * Creates a String containing all the elements of the input array separated by the delimiter.
+ */
+@ExpressionDescription(
+ usage = """
+ _FUNC_(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array
+ using the delimiter and an optional string to replace nulls. If no value is set for
+ nullReplacement, any null value is filtered.""",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(array('hello', 'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ', ',');
+ hello , world
+ """)
--- End diff --
add `since`. see [this discussion](https://github.com/apache/spark/pull/21021#discussion_r180309744).
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89106/
Test PASSed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21011
**[Test build #89351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89351/testReport)** for PR 21011 at commit [`ad0d4aa`](https://github.com/apache/spark/commit/ad0d4aa5d671b3a99fa1bd30dc833a8b75444f6c).
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #21011: [SPARK-23916][SQL] Add array_join function
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21011#discussion_r181371114
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +288,173 @@ case class ArrayContains(left: Expression, right: Expression)
override def prettyName: String = "array_contains"
}
+
+/**
+ * Creates a String containing all the elements of the input array separated by the delimiter.
+ */
+@ExpressionDescription(
+ usage = """
+ _FUNC_(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array
+ using the delimiter and an optional string to replace nulls. If no value is set for
+ nullReplacement, any null value is filtered.""",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(array('hello', 'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ', ',');
+ hello , world
+ """, since = "2.4.0")
+case class ArrayJoin(
+ array: Expression,
+ delimiter: Expression,
+ nullReplacement: Option[Expression]) extends Expression with ExpectsInputTypes {
+
+ def this(array: Expression, delimiter: Expression) = this(array, delimiter, None)
+
+ def this(array: Expression, delimiter: Expression, nullReplacement: Expression) =
+ this(array, delimiter, Some(nullReplacement))
+
+ override def inputTypes: Seq[AbstractDataType] = if (nullReplacement.isDefined) {
+ Seq(ArrayType(StringType), StringType, StringType)
+ } else {
+ Seq(ArrayType(StringType), StringType)
+ }
+
+ override def children: Seq[Expression] = if (nullReplacement.isDefined) {
+ Seq(array, delimiter, nullReplacement.get)
+ } else {
+ Seq(array, delimiter)
+ }
+
+ override def nullable: Boolean = children.exists(_.nullable)
+
+ override def foldable: Boolean = children.forall(_.foldable)
+
+ override def eval(input: InternalRow): Any = {
+ val arrayEval = array.eval(input)
+ if (arrayEval == null) return null
+ val delimiterEval = delimiter.eval(input)
+ if (delimiterEval == null) return null
+ val nullReplacementEval = nullReplacement.map(_.eval(input))
+ if (nullReplacementEval.contains(null)) return null
+
+
+ val buffer = new UTF8StringBuilder()
+ var firstItem = true
+ val nullHandling = nullReplacementEval match {
+ case Some(rep) => (prependDelimiter: Boolean) => {
+ if (!prependDelimiter) {
+ buffer.append(delimiterEval.asInstanceOf[UTF8String])
+ }
+ buffer.append(rep.asInstanceOf[UTF8String])
+ true
+ }
+ case None => (_: Boolean) => false
+ }
+ arrayEval.asInstanceOf[ArrayData].foreach(StringType, (_, item) => {
+ if (item == null) {
+ if (nullHandling(firstItem)) {
+ firstItem = false
+ }
+ } else {
+ if (!firstItem) {
+ buffer.append(delimiterEval.asInstanceOf[UTF8String])
+ }
+ buffer.append(item.asInstanceOf[UTF8String])
+ firstItem = false
+ }
+ })
+ buffer.build()
+ }
+
+ override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val code = nullReplacement match {
+ case Some(replacement) =>
+ val replacementGen = replacement.genCode(ctx)
+ val nullHandling = (buffer: String, delimiter: String, firstItem: String) => {
+ s"""
+ |if (!$firstItem) {
+ | $buffer.append($delimiter);
+ |}
+ |$buffer.append(${replacementGen.value});
+ |$firstItem = false;
+ """.stripMargin
--- End diff --
nit: indent
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/21011
retest this please
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Merged build finished. Test FAILed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21011
**[Test build #89645 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89645/testReport)** for PR 21011 at commit [`e9d7baa`](https://github.com/apache/spark/commit/e9d7baa09ecee2456d48bf7de71330714dba4c4c).
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89351/
Test PASSed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89340/
Test PASSed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21011
**[Test build #89451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89451/testReport)** for PR 21011 at commit [`703c09c`](https://github.com/apache/spark/commit/703c09c4e9da2b96c7a5f445fd5a1d30cdc29c03).
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/21011
retest this please
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21011
**[Test build #89067 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89067/testReport)** for PR 21011 at commit [`1408cfc`](https://github.com/apache/spark/commit/1408cfcd8f0ad2a571d29b57d71128584ea4b4f0).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `case class ArrayJoin(`
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21011
**[Test build #89106 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89106/testReport)** for PR 21011 at commit [`e52ff85`](https://github.com/apache/spark/commit/e52ff856d42adc5af2e2b2593c2e63d5c3f3a205).
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #21011: [SPARK-23916][SQL] Add array_join function
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21011#discussion_r181363454
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +288,173 @@ case class ArrayContains(left: Expression, right: Expression)
override def prettyName: String = "array_contains"
}
+
+/**
+ * Creates a String containing all the elements of the input array separated by the delimiter.
+ */
+@ExpressionDescription(
+ usage = """
+ _FUNC_(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array
+ using the delimiter and an optional string to replace nulls. If no value is set for
+ nullReplacement, any null value is filtered.""",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(array('hello', 'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ');
+ hello world
+ > SELECT _FUNC_(array('hello', null ,'world'), ' ', ',');
+ hello , world
+ """, since = "2.4.0")
+case class ArrayJoin(
+ array: Expression,
+ delimiter: Expression,
+ nullReplacement: Option[Expression]) extends Expression with ExpectsInputTypes {
+
+ def this(array: Expression, delimiter: Expression) = this(array, delimiter, None)
+
+ def this(array: Expression, delimiter: Expression, nullReplacement: Expression) =
+ this(array, delimiter, Some(nullReplacement))
+
+ override def inputTypes: Seq[AbstractDataType] = if (nullReplacement.isDefined) {
+ Seq(ArrayType(StringType), StringType, StringType)
+ } else {
+ Seq(ArrayType(StringType), StringType)
+ }
+
+ override def children: Seq[Expression] = if (nullReplacement.isDefined) {
+ Seq(array, delimiter, nullReplacement.get)
+ } else {
+ Seq(array, delimiter)
+ }
+
+ override def nullable: Boolean = children.exists(_.nullable)
+
+ override def foldable: Boolean = children.forall(_.foldable)
+
+ override def eval(input: InternalRow): Any = {
+ val arrayEval = array.eval(input)
+ if (arrayEval == null) return null
+ val delimiterEval = delimiter.eval(input)
+ if (delimiterEval == null) return null
+ val nullReplacementEval = nullReplacement.map(_.eval(input))
+ if (nullReplacementEval.contains(null)) return null
+
+
+ val buffer = new UTF8StringBuilder()
+ var firstItem = true
+ val nullHandling = nullReplacementEval match {
+ case Some(rep) => (prependDelimiter: Boolean) => {
+ if (!prependDelimiter) {
+ buffer.append(delimiterEval.asInstanceOf[UTF8String])
+ }
+ buffer.append(rep.asInstanceOf[UTF8String])
+ true
+ }
+ case None => (_: Boolean) => false
+ }
+ arrayEval.asInstanceOf[ArrayData].foreach(StringType, (_, item) => {
+ if (item == null) {
+ if (nullHandling(firstItem)) {
+ firstItem = false
+ }
+ } else {
+ if (!firstItem) {
+ buffer.append(delimiterEval.asInstanceOf[UTF8String])
+ }
+ buffer.append(item.asInstanceOf[UTF8String])
+ firstItem = false
+ }
+ })
+ buffer.build()
+ }
+
+ override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+ val code = nullReplacement match {
+ case Some(replacement) =>
+ val replacementGen = replacement.genCode(ctx)
+ val nullHandling = (buffer: String, delimiter: String, firstItem: String) => {
+ s"""
+ |if (!$firstItem) {
+ | $buffer.append($delimiter);
+ |}
+ |$buffer.append(${replacementGen.value});
+ |$firstItem = false;
+ """.stripMargin
+ }
+ val execCode = if (replacement.nullable) {
+ ctx.nullSafeExec(replacement.nullable, replacementGen.isNull) {
+ genCodeForArrayAndDelimiter(ctx, ev, nullHandling)
+ }
+ } else {
+ genCodeForArrayAndDelimiter(ctx, ev, nullHandling)
+ }
+ s"""
+ |${replacementGen.code}
+ |$execCode
+ """.stripMargin
+ case None => genCodeForArrayAndDelimiter(ctx, ev,
+ (_: String, _: String, _: String) => "// nulls are ignored")
+ }
+ if (nullable) {
+ ev.copy(
+ s"""
+ |boolean ${ev.isNull} = true;
+ |UTF8String ${ev.value} = null;
+ |$code
+ """.stripMargin)
+ } else {
+ ev.copy(s"""
+ |boolean ${ev.isNull} = false;
--- End diff --
nit: I guess we can remove this?
---
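For readers following the diff above, the null-handling logic in `eval` can be summarized by a small model of the intended Presto-style semantics. This is an illustrative sketch in Python, not the Catalyst implementation; the function name and signature here are assumptions for demonstration only:

```python
def array_join(arr, delimiter, null_replacement=None):
    """Model of the proposed array_join semantics:
    null elements are dropped unless a replacement string is supplied,
    and a null array or null delimiter yields null (None)."""
    if arr is None or delimiter is None:
        return None
    if null_replacement is None:
        # No replacement given: filter out null elements entirely.
        items = [x for x in arr if x is not None]
    else:
        # Replacement given: substitute it for each null element.
        items = [null_replacement if x is None else x for x in arr]
    return delimiter.join(items)
```

This mirrors the examples in the `@ExpressionDescription` usage string: with no replacement, nulls vanish and no extra delimiter is emitted; with a replacement, the delimiter is placed around the substituted value like any other element.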
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21011
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21011: [SPARK-23916][SQL] Add array_join function
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21011
**[Test build #89067 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89067/testReport)** for PR 21011 at commit [`1408cfc`](https://github.com/apache/spark/commit/1408cfcd8f0ad2a571d29b57d71128584ea4b4f0).
---