You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by TomaszGaweda <gi...@git.apache.org> on 2018/08/27 20:08:51 UTC

[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...

GitHub user TomaszGaweda opened a pull request:

    https://github.com/apache/spark/pull/22249

    [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to functions.scala

    ## What changes were proposed in this pull request?
    
    Add `parse_url` function to `functions.scala`. This will allow users to use this functions without calling `selectExpr` or `spark.sql`
    
    ## How was this patch tested?
    
    `testUrl` function was changed to test also this change
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/TomaszGaweda/spark parseUrl

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22249.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22249
    
----
commit 4002cb9c48b608123845b89af9f77e137626f83f
Author: Tomasz Gaweda <to...@...>
Date:   2018-08-27T20:05:50Z

    SPARK-16281 Follow-up: parse_url in functions.scala

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...

Posted by TomaszGaweda <gi...@git.apache.org>.
Github user TomaszGaweda closed the pull request at:

    https://github.com/apache/spark/pull/22249


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22249#discussion_r213120096
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -2459,6 +2459,26 @@ object functions {
         StringTrimLeft(e.expr, Literal(trimString))
       }
     
    +  /**
    +    * Extracts a part from a URL.
    +    *
    +    * @group string_funcs
    +    * @since 2.4.0
    +    */
    +  def parse_url(url: Column, partToExtract: String): Column = withExpr {
    --- End diff --
    
    @TomaszGaweda This sounds a good idea by returning a handler for built-in functions. cc @rxin 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...

Posted by TomaszGaweda <gi...@git.apache.org>.
Github user TomaszGaweda commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22249#discussion_r213110408
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -2459,6 +2459,26 @@ object functions {
         StringTrimLeft(e.expr, Literal(trimString))
       }
     
    +  /**
    +    * Extracts a part from a URL.
    +    *
    +    * @group string_funcs
    +    * @since 2.4.0
    +    */
    +  def parse_url(url: Column, partToExtract: String): Column = withExpr {
    --- End diff --
    
    This PR is created after this StackOverflow question: https://stackoverflow.com/questions/52041342/how-to-parse-url-in-spark-sqlscala/52043771
    
    I'm not sure how often it is used, however most of functions are available in functions object to make Scala and SQL interfaces similar. If you think it's useless - please let me know, I'll just close this PR


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to functions...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22249
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...

Posted by TomaszGaweda <gi...@git.apache.org>.
Github user TomaszGaweda commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22249#discussion_r213232338
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -2459,6 +2459,26 @@ object functions {
         StringTrimLeft(e.expr, Literal(trimString))
       }
     
    +  /**
    +    * Extracts a part from a URL.
    +    *
    +    * @group string_funcs
    +    * @since 2.4.0
    +    */
    +  def parse_url(url: Column, partToExtract: String): Column = withExpr {
    --- End diff --
    
    @HyukjinKwon I see now. Yeah, wrapping in  the `Column` will be necessary, at least no string concatenation will be required


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to functions...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22249
  
    I remember that @rxin was against adding much of these functions here through the various programmatic APIs: only the most used ones should have been exposed and I am not sure this is a frequently used one. So I am not sure about this PR.
    
    Anyway, I think we should add to the Python API too if we decide to add this at all.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22249#discussion_r213227841
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -2459,6 +2459,26 @@ object functions {
         StringTrimLeft(e.expr, Literal(trimString))
       }
     
    +  /**
    +    * Extracts a part from a URL.
    +    *
    +    * @group string_funcs
    +    * @since 2.4.0
    +    */
    +  def parse_url(url: Column, partToExtract: String): Column = withExpr {
    --- End diff --
    
    I mean, 
    
    > I would suggest method that return handler for any registered function. So that you can write: SqlFunction something = spark.(...?).getFunction("parse_url")
    
    Can this support strongly typed one?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to functions...

Posted by TomaszGaweda <gi...@git.apache.org>.
Github user TomaszGaweda commented on the issue:

    https://github.com/apache/spark/pull/22249
  
    @gatorsmile @cloud-fan @HyukjinKwon @mgaido91 Could you please review this PR and start tests? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22249#discussion_r213121794
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -2459,6 +2459,26 @@ object functions {
         StringTrimLeft(e.expr, Literal(trimString))
       }
     
    +  /**
    +    * Extracts a part from a URL.
    +    *
    +    * @group string_funcs
    +    * @since 2.4.0
    +    */
    +  def parse_url(url: Column, partToExtract: String): Column = withExpr {
    --- End diff --
    
    I like this idea too


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...

Posted by TomaszGaweda <gi...@git.apache.org>.
Github user TomaszGaweda commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22249#discussion_r213213173
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -2459,6 +2459,26 @@ object functions {
         StringTrimLeft(e.expr, Literal(trimString))
       }
     
    +  /**
    +    * Extracts a part from a URL.
    +    *
    +    * @group string_funcs
    +    * @since 2.4.0
    +    */
    +  def parse_url(url: Column, partToExtract: String): Column = withExpr {
    --- End diff --
    
    @HyukjinKwon Thanks for the suggestion, however now users are complaining about stringly-typed system in Spark, there are libs like Frameless from Typelevel to archieve a bit more type safety. `expr` is springly-typed, while functions in `functions` object or accessed via `UserDefinedFunction` are a bit more type safe.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to functions...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22249
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...

Posted by TomaszGaweda <gi...@git.apache.org>.
Github user TomaszGaweda commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22249#discussion_r213126158
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -2459,6 +2459,26 @@ object functions {
         StringTrimLeft(e.expr, Literal(trimString))
       }
     
    +  /**
    +    * Extracts a part from a URL.
    +    *
    +    * @group string_funcs
    +    * @since 2.4.0
    +    */
    +  def parse_url(url: Column, partToExtract: String): Column = withExpr {
    --- End diff --
    
    Ok, tomorrow I will create a Jira and start working on it. Thanks for your comments! :)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22249#discussion_r213176616
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -2459,6 +2459,26 @@ object functions {
         StringTrimLeft(e.expr, Literal(trimString))
       }
     
    +  /**
    +    * Extracts a part from a URL.
    +    *
    +    * @group string_funcs
    +    * @since 2.4.0
    +    */
    +  def parse_url(url: Column, partToExtract: String): Column = withExpr {
    --- End diff --
    
    Can't we just use `expr` instead?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...

Posted by TomaszGaweda <gi...@git.apache.org>.
Github user TomaszGaweda commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22249#discussion_r213112726
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -2459,6 +2459,26 @@ object functions {
         StringTrimLeft(e.expr, Literal(trimString))
       }
     
    +  /**
    +    * Extracts a part from a URL.
    +    *
    +    * @group string_funcs
    +    * @since 2.4.0
    +    */
    +  def parse_url(url: Column, partToExtract: String): Column = withExpr {
    --- End diff --
    
    In long term, I would suggest method that return handler for any registered function. So that you can write: SqlFunction something = spark.(...?).getFunction("parse_url"). Now `spark.udf.register` returns a handler for UDF, something similar for getting any kind of registered function may be helpful. However, it's a lot more work, so now I've just proposed to add this function :)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to functions...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22249
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22249#discussion_r213109101
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -2459,6 +2459,26 @@ object functions {
         StringTrimLeft(e.expr, Literal(trimString))
       }
     
    +  /**
    +    * Extracts a part from a URL.
    +    *
    +    * @group string_funcs
    +    * @since 2.4.0
    +    */
    +  def parse_url(url: Column, partToExtract: String): Column = withExpr {
    --- End diff --
    
    When adding the functions into `object functions`, we need to check how useful they are. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org