Posted to reviews@spark.apache.org by TomaszGaweda <gi...@git.apache.org> on 2018/08/27 20:08:51 UTC
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
GitHub user TomaszGaweda opened a pull request:
https://github.com/apache/spark/pull/22249
[SPARK-16281][SQL][FOLLOW-UP] Add parse_url to functions.scala
## What changes were proposed in this pull request?
Add the `parse_url` function to `functions.scala`. This will allow users to use this function without calling `selectExpr` or `spark.sql`.
## How was this patch tested?
The `testUrl` test was extended to also cover this change.
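For context, `parse_url(url, partToExtract)` returns the requested component of a URL (e.g. `HOST`, `PATH`, `QUERY`, `PROTOCOL`). A minimal plain-Scala sketch of that behavior using `java.net.URI` (this helper is only an illustration of the semantics, not Spark's actual implementation):

```scala
import java.net.URI

// Illustrative helper only: mimics a few of the parts handled by
// Spark's parse_url using java.net.URI. Not the Spark implementation.
def parseUrlPart(url: String, part: String): Option[String] = {
  val uri = new URI(url)
  part match {
    case "PROTOCOL" => Option(uri.getScheme)
    case "HOST"     => Option(uri.getHost)
    case "PATH"     => Option(uri.getPath)
    case "QUERY"    => Option(uri.getQuery)
    case _          => None
  }
}

println(parseUrlPart("https://spark.apache.org/docs?x=1", "HOST"))  // Some(spark.apache.org)
```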
Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/TomaszGaweda/spark parseUrl
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22249.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22249
----
commit 4002cb9c48b608123845b89af9f77e137626f83f
Author: Tomasz Gaweda <to...@...>
Date: 2018-08-27T20:05:50Z
SPARK-16281 Follow-up: parse_url in functions.scala
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Posted by TomaszGaweda <gi...@git.apache.org>.
Github user TomaszGaweda closed the pull request at:
https://github.com/apache/spark/pull/22249
---
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/22249#discussion_r213120096
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2459,6 +2459,26 @@ object functions {
StringTrimLeft(e.expr, Literal(trimString))
}
+ /**
+ * Extracts a part from a URL.
+ *
+ * @group string_funcs
+ * @since 2.4.0
+ */
+ def parse_url(url: Column, partToExtract: String): Column = withExpr {
--- End diff --
@TomaszGaweda This sounds like a good idea: returning a handler for built-in functions. cc @rxin
---
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Posted by TomaszGaweda <gi...@git.apache.org>.
Github user TomaszGaweda commented on a diff in the pull request:
https://github.com/apache/spark/pull/22249#discussion_r213110408
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2459,6 +2459,26 @@ object functions {
StringTrimLeft(e.expr, Literal(trimString))
}
+ /**
+ * Extracts a part from a URL.
+ *
+ * @group string_funcs
+ * @since 2.4.0
+ */
+ def parse_url(url: Column, partToExtract: String): Column = withExpr {
--- End diff --
This PR is created after this StackOverflow question: https://stackoverflow.com/questions/52041342/how-to-parse-url-in-spark-sqlscala/52043771
I'm not sure how often it is used; however, most functions are available in the `functions` object to keep the Scala and SQL interfaces similar. If you think it's not useful, please let me know and I'll just close this PR.
---
[GitHub] spark issue #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to functions...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22249
Can one of the admins verify this patch?
---
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Posted by TomaszGaweda <gi...@git.apache.org>.
Github user TomaszGaweda commented on a diff in the pull request:
https://github.com/apache/spark/pull/22249#discussion_r213232338
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2459,6 +2459,26 @@ object functions {
StringTrimLeft(e.expr, Literal(trimString))
}
+ /**
+ * Extracts a part from a URL.
+ *
+ * @group string_funcs
+ * @since 2.4.0
+ */
+ def parse_url(url: Column, partToExtract: String): Column = withExpr {
--- End diff --
@HyukjinKwon I see now. Yeah, wrapping it in a `Column` will be necessary; at least no string concatenation will be required.
---
[GitHub] spark issue #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to functions...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/22249
I remember that @rxin was against adding many of these functions through the various programmatic APIs: only the most-used ones should be exposed, and I am not sure this is a frequently used one. So I am not sure about this PR.
Anyway, if we decide to add this at all, I think we should add it to the Python API too.
---
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22249#discussion_r213227841
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2459,6 +2459,26 @@ object functions {
StringTrimLeft(e.expr, Literal(trimString))
}
+ /**
+ * Extracts a part from a URL.
+ *
+ * @group string_funcs
+ * @since 2.4.0
+ */
+ def parse_url(url: Column, partToExtract: String): Column = withExpr {
--- End diff --
I mean,
> I would suggest method that return handler for any registered function. So that you can write: SqlFunction something = spark.(...?).getFunction("parse_url")
Can this support strongly typed one?
---
[GitHub] spark issue #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to functions...
Posted by TomaszGaweda <gi...@git.apache.org>.
Github user TomaszGaweda commented on the issue:
https://github.com/apache/spark/pull/22249
@gatorsmile @cloud-fan @HyukjinKwon @mgaido91 Could you please review this PR and start tests?
---
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22249#discussion_r213121794
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2459,6 +2459,26 @@ object functions {
StringTrimLeft(e.expr, Literal(trimString))
}
+ /**
+ * Extracts a part from a URL.
+ *
+ * @group string_funcs
+ * @since 2.4.0
+ */
+ def parse_url(url: Column, partToExtract: String): Column = withExpr {
--- End diff --
I like this idea too
---
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Posted by TomaszGaweda <gi...@git.apache.org>.
Github user TomaszGaweda commented on a diff in the pull request:
https://github.com/apache/spark/pull/22249#discussion_r213213173
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2459,6 +2459,26 @@ object functions {
StringTrimLeft(e.expr, Literal(trimString))
}
+ /**
+ * Extracts a part from a URL.
+ *
+ * @group string_funcs
+ * @since 2.4.0
+ */
+ def parse_url(url: Column, partToExtract: String): Column = withExpr {
--- End diff --
@HyukjinKwon Thanks for the suggestion; however, users are now complaining about the stringly-typed parts of Spark's API, and there are libraries like Frameless from Typelevel to achieve a bit more type safety. `expr` is stringly-typed, while functions in the `functions` object or accessed via `UserDefinedFunction` are a bit more type safe.
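The stringly- vs strongly-typed distinction above can be sketched in plain Scala (these names are illustrative, not Spark API): a string-valued part name fails only at runtime, while an ADT-valued one fails at compile time.

```scala
import java.net.URI

// Stringly-typed: a typo such as "HSOT" compiles fine and silently
// yields None at runtime.
def parseUrlStringly(url: String, part: String): Option[String] =
  part match {
    case "HOST"  => Option(new URI(url).getHost)
    case "QUERY" => Option(new URI(url).getQuery)
    case _       => None
  }

// Strongly typed: the set of valid parts is a sealed ADT, so an
// invalid part is rejected by the compiler.
sealed trait UrlPart
case object Host  extends UrlPart
case object Query extends UrlPart

def parseUrlTyped(url: String, part: UrlPart): Option[String] =
  part match {
    case Host  => Option(new URI(url).getHost)
    case Query => Option(new URI(url).getQuery)
  }
```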
---
[GitHub] spark issue #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to functions...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22249
Can one of the admins verify this patch?
---
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Posted by TomaszGaweda <gi...@git.apache.org>.
Github user TomaszGaweda commented on a diff in the pull request:
https://github.com/apache/spark/pull/22249#discussion_r213126158
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2459,6 +2459,26 @@ object functions {
StringTrimLeft(e.expr, Literal(trimString))
}
+ /**
+ * Extracts a part from a URL.
+ *
+ * @group string_funcs
+ * @since 2.4.0
+ */
+ def parse_url(url: Column, partToExtract: String): Column = withExpr {
--- End diff --
Ok, tomorrow I will create a Jira and start working on it. Thanks for your comments! :)
---
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22249#discussion_r213176616
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2459,6 +2459,26 @@ object functions {
StringTrimLeft(e.expr, Literal(trimString))
}
+ /**
+ * Extracts a part from a URL.
+ *
+ * @group string_funcs
+ * @since 2.4.0
+ */
+ def parse_url(url: Column, partToExtract: String): Column = withExpr {
--- End diff --
Can't we just use `expr` instead?
---
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Posted by TomaszGaweda <gi...@git.apache.org>.
Github user TomaszGaweda commented on a diff in the pull request:
https://github.com/apache/spark/pull/22249#discussion_r213112726
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2459,6 +2459,26 @@ object functions {
StringTrimLeft(e.expr, Literal(trimString))
}
+ /**
+ * Extracts a part from a URL.
+ *
+ * @group string_funcs
+ * @since 2.4.0
+ */
+ def parse_url(url: Column, partToExtract: String): Column = withExpr {
--- End diff --
In the long term, I would suggest a method that returns a handler for any registered function, so that you can write: `SqlFunction something = spark.(...?).getFunction("parse_url")`. Currently `spark.udf.register` returns a handler for a UDF; something similar for getting any kind of registered function may be helpful. However, it's a lot more work, so for now I've just proposed adding this function :)
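A hypothetical sketch of that idea in plain Scala (neither `SqlFunction` nor `getFunction` exists in Spark's API; all names and signatures here are assumptions about the proposed shape):

```scala
// Hypothetical sketch: a registry that hands back a callable handle
// for any registered function, analogous to what spark.udf.register
// returns for UDFs. Not Spark API.
final case class SqlFunction(name: String, impl: Seq[Any] => Any) {
  def apply(args: Any*): Any = impl(args)
}

class FunctionRegistry {
  private var funcs = Map.empty[String, SqlFunction]
  def register(f: SqlFunction): Unit = funcs += (f.name -> f)
  def getFunction(name: String): Option[SqlFunction] = funcs.get(name)
}

val registry = new FunctionRegistry
registry.register(SqlFunction("upper", args => args.head.toString.toUpperCase))
val upperFn = registry.getFunction("upper").get
println(upperFn("spark"))  // SPARK
```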
---
[GitHub] spark issue #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to functions...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22249
Can one of the admins verify this patch?
---
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/22249#discussion_r213109101
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2459,6 +2459,26 @@ object functions {
StringTrimLeft(e.expr, Literal(trimString))
}
+ /**
+ * Extracts a part from a URL.
+ *
+ * @group string_funcs
+ * @since 2.4.0
+ */
+ def parse_url(url: Column, partToExtract: String): Column = withExpr {
--- End diff --
When adding functions into `object functions`, we need to check how useful they are.
---