You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by sarutak <gi...@git.apache.org> on 2016/05/05 22:31:00 UTC

[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

GitHub user sarutak opened a pull request:

    https://github.com/apache/spark/pull/12939

    [SPARK-15165][SQL] Codegen can break because toCommentSafeString is not actually safe

    ## What changes were proposed in this pull request?
    
    toCommentSafeString method replaces "\u" with "\\u" to avoid codegen breaking.
    But if the even number of "\" is put before "u", like "\\u", in the string literal in the query, codegen can break.
    
    Following code occurs compilation error.
    
    ```
    val df = Seq(...).toDF
    df.select("'\\\\\\\\u002A/'").show
    ```
    
    The reason of the compilation error is because "\\\\\\\\u002A/" is translated into "*/" (the end of comment). 
    
    Due to this unsafety, arbitrary code can be injected like as follows.
    
    ```
    val df = Seq(...).toDF
    // Inject "System.exit(1)"
    df.select("'\\\\\\\\u002A/{System.exit(1);}/*'").show
    ```
    
    
    ## How was this patch tested?
    
    Added new test cases.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sarutak/spark SPARK-15165

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12939.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12939
    
----
commit 30ad0819103d35b612c52000bcdcd0b5daa798e1
Author: Kousuke Saruta <sa...@oss.nttdata.co.jp>
Date:   2016-05-05T22:04:08Z

    Made toCommentSafeString method safer

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-218122747
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58228/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12939


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-219904534
  
    @davies @sarutak I'm wondering if we should go with a whitelist approach, i.e. only whitelisting a-z0-9 and () []. It wouldn't sacrifice readability as much, but would be a lot safer. WDYT?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-219669982
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-220093861
  
    Yeah, we could repurpose #12979 for security reason, will review that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-219908934
  
    Just remove the character.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-219669985
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58675/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-219945292
  
    hm that's true. @davies want to review that one?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-219669729
  
    **[Test build #58675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58675/consoleFull)** for PR 12939 at commit [`f2b7adb`](https://github.com/apache/spark/commit/f2b7adb42adc7726d17842a98c1b5d4a1d14067c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-218122540
  
    **[Test build #58228 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58228/consoleFull)** for PR 12939 at commit [`15a23aa`](https://github.com/apache/spark/commit/15a23aab32eb0ba409d1555edec6b71a5ae59494).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-218702610
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-219931861
  
    We can just expand the whitelist and add * + and such into that can't we? My main worry is that security is very difficult to get right, and having a whitelist substantially reduces the chance of corner cases that we didn't expect happening.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-219785529
  
    LGTM,
    Merging this into master and 2.0, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-217313173
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-219650199
  
    **[Test build #58675 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58675/consoleFull)** for PR 12939 at commit [`f2b7adb`](https://github.com/apache/spark/commit/f2b7adb42adc7726d17842a98c1b5d4a1d14067c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12939#discussion_r62572000
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala ---
    @@ -162,7 +162,8 @@ package object util {
       def toCommentSafeString(str: String): String = {
         val len = math.min(str.length, 128)
         val suffix = if (str.length > len) "..." else ""
    -    str.substring(0, len).replace("*/", "\\*\\/").replace("\\u", "\\\\u") + suffix
    +    str.substring(0, len).replace("*/", "\\*\\/")
    +      .replaceAll("(^|[^\\\\])(\\\\(\\\\\\\\)*u)", "$1\\\\$2") + suffix
    --- End diff --
    
    How about we also have a comment at here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-218382799
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by sarutak <gi...@git.apache.org>.
Github user sarutak commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-219938986
  
    `/` is used as the division operator and `*` is used as the multiplication operator so it's good to add those characters but we should `*/` so we need to add `\*(?!/)` and `(?<!*)/` to the whitelist instead of just `/` and `*`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-219786749
  
    @sarutak Could you send another PR for 1.6 branch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-219933348
  
    @davies this is the 2nd security bug with codegen we found already.
    
    @sarutak sgtm.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-218382804
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58333/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-218122745
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by sarutak <gi...@git.apache.org>.
Github user sarutak commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-219932692
  
    O.K. Initially we add some characters to the whitelist and if we need some more characters, we'll consider whether it should be add  or not at any time. How about this idea?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by sarutak <gi...@git.apache.org>.
Github user sarutak commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12939#discussion_r62718307
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala ---
    @@ -162,7 +162,8 @@ package object util {
       def toCommentSafeString(str: String): String = {
         val len = math.min(str.length, 128)
         val suffix = if (str.length > len) "..." else ""
    -    str.substring(0, len).replace("*/", "\\*\\/").replace("\\u", "\\\\u") + suffix
    +    str.substring(0, len).replace("*/", "\\*\\/")
    +      .replaceAll("(^|[^\\\\])(\\\\(\\\\\\\\)*u)", "$1\\\\$2") + suffix
    --- End diff --
    
    Thanks for the suggestion @mhseiden .
    I tried escaping by `escapeJava` and it may fix this issue but I noticed it escapes all of "\", means the number of "\" will be doubled.
    For example, "\\\u0022" will be "\\\\\\u0022" but I expects only "\" just before "u" will be escaped if the number of "\" is odd.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-217313174
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57927/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-218368915
  
    **[Test build #58333 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58333/consoleFull)** for PR 12939 at commit [`7106f23`](https://github.com/apache/spark/commit/7106f234fd4f04ce8e922e643bb343ac635d09d2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by sarutak <gi...@git.apache.org>.
Github user sarutak commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12939#discussion_r63464908
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala ---
    @@ -162,7 +162,18 @@ package object util {
       def toCommentSafeString(str: String): String = {
         val len = math.min(str.length, 128)
         val suffix = if (str.length > len) "..." else ""
    -    str.substring(0, len).replace("*/", "\\*\\/").replace("\\u", "\\\\u") + suffix
    --- End diff --
    
    Thanks for the advice.
    I think "\u" should be escaped too otherwise, the compilation will fail when invalid unicode characters, like `\u002X` or `\u001`, are in literals.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-218702611
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58460/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-217313015
  
    **[Test build #57927 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57927/consoleFull)** for PR 12939 at commit [`30ad081`](https://github.com/apache/spark/commit/30ad0819103d35b612c52000bcdcd0b5daa798e1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by sarutak <gi...@git.apache.org>.
Github user sarutak commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12939#discussion_r62637582
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala ---
    @@ -162,7 +162,8 @@ package object util {
       def toCommentSafeString(str: String): String = {
         val len = math.min(str.length, 128)
         val suffix = if (str.length > len) "..." else ""
    -    str.substring(0, len).replace("*/", "\\*\\/").replace("\\u", "\\\\u") + suffix
    +    str.substring(0, len).replace("*/", "\\*\\/")
    +      .replaceAll("(^|[^\\\\])(\\\\(\\\\\\\\)*u)", "$1\\\\$2") + suffix
    --- End diff --
    
    Yeah, make sense.
    I've added.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-219933074
  
    @rxin If there is any new bug found on this, we could switch to white list, otherwise I'd like to have the current solution.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-219934463
  
    @sarutak That's true. Should `/` also be common used?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-217298905
  
    **[Test build #57927 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57927/consoleFull)** for PR 12939 at commit [`30ad081`](https://github.com/apache/spark/commit/30ad0819103d35b612c52000bcdcd0b5daa798e1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12939#discussion_r63404429
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala ---
    @@ -162,7 +162,18 @@ package object util {
       def toCommentSafeString(str: String): String = {
         val len = math.min(str.length, 128)
         val suffix = if (str.length > len) "..." else ""
    -    str.substring(0, len).replace("*/", "\\*\\/").replace("\\u", "\\\\u") + suffix
    --- End diff --
    
    We only need to make sure that the comment string does not have `*/` in it, one simpler solution could be
    
    ```
    str.substring(0, len).replace("*/", "*\\/").replace("u0022/", "u0022\\/") + suffix
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12939#discussion_r63562738
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala ---
    @@ -162,7 +162,18 @@ package object util {
       def toCommentSafeString(str: String): String = {
         val len = math.min(str.length, 128)
         val suffix = if (str.length > len) "..." else ""
    -    str.substring(0, len).replace("*/", "\\*\\/").replace("\\u", "\\\\u") + suffix
    --- End diff --
    
    Good point, LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by mhseiden <gi...@git.apache.org>.
Github user mhseiden commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12939#discussion_r62688712
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala ---
    @@ -162,7 +162,8 @@ package object util {
       def toCommentSafeString(str: String): String = {
         val len = math.min(str.length, 128)
         val suffix = if (str.length > len) "..." else ""
    -    str.substring(0, len).replace("*/", "\\*\\/").replace("\\u", "\\\\u") + suffix
    +    str.substring(0, len).replace("*/", "\\*\\/")
    +      .replaceAll("(^|[^\\\\])(\\\\(\\\\\\\\)*u)", "$1\\\\$2") + suffix
    --- End diff --
    
    Is the implementation of `org.apache.commons.lang3.StringEscapeUtils.escapeJava(...)` sufficient to cover this case?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by sarutak <gi...@git.apache.org>.
Github user sarutak commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-219944159
  
    I have one concern about the whitelisting approach. Even if each single character is safe, it's difficult to ensure any character sequences which consist of those safe characters are always safe. 
    E.g. `*` and `/` are safe themselves but `*/` is not safe. It's difficult to ensure there are no unsafe combination such.
    
    The place holder approach I mentioned above (#12979) may be safer because the place holder consists of only `{`, `comment_placeholder`, numbers and `}`.
    
    Anyway, I'll try the whitelisting approach. Let's discuss more.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-218382549
  
    **[Test build #58333 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58333/consoleFull)** for PR 12939 at commit [`7106f23`](https://github.com/apache/spark/commit/7106f234fd4f04ce8e922e643bb343ac635d09d2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-219935334
  
    Either way works for me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by sarutak <gi...@git.apache.org>.
Github user sarutak commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-217612432
  
    CC: @rxin , @davies 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12939#discussion_r63406727
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
    @@ -2496,4 +2496,28 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
         }
       }
     
    +  test("check code injection is prevented") {
    +    // `\u002A/` is `*/`
    --- End diff --
    
    Could you add more cases:
    ```
    */
    *\u002F
    \u002A/
    \\u002A/
    \u002A\u002F
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-218702380
  
    **[Test build #58460 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58460/consoleFull)** for PR 12939 at commit [`1140642`](https://github.com/apache/spark/commit/1140642e10273954ee2f7787b92a7c80f5e6ec89).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by sarutak <gi...@git.apache.org>.
Github user sarutak commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-219931480
  
    Lots of punctuation characters like `*`, `+` can be used as an operator in expressions so I'm afraid comments in generated code will be difficult to read if characters are removed based on the whitelist.
    
    On the other hand, I noticed my another PR (#12979) can keep the readability of the comment  and the safety.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-218683370
  
    **[Test build #58460 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58460/consoleFull)** for PR 12939 at commit [`1140642`](https://github.com/apache/spark/commit/1140642e10273954ee2f7787b92a7c80f5e6ec89).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12939#issuecomment-218104347
  
    **[Test build #58228 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58228/consoleFull)** for PR 12939 at commit [`15a23aa`](https://github.com/apache/spark/commit/15a23aab32eb0ba409d1555edec6b71a5ae59494).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org