You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by xuejianbest <gi...@git.apache.org> on 2018/08/09 03:31:59 UTC

[GitHub] spark pull request #22048: Fix the show method to display the wide character...

GitHub user xuejianbest opened a pull request:

    https://github.com/apache/spark/pull/22048

    Fix the show method to display the wide character alignment problem

    before:
    +---+---------------------------+-------------+
    |id |中国                         |s2           |
    +---+---------------------------+-------------+
    |1  |ab                         |[a]          |
    |2  |null                       |[中国, abc]    |
    |3  |ab1                        |[hello world]|
    |4  |か行 きゃ(kya) きゅ(kyu) きょ(kyo) |[“中国]        |
    |5  |中国(你好)a                    |[“中(国), 312] |
    |6  |中国山(东)服务区                  |[“中(国)]      |
    |7  |中国山东服务区                    |[中(国)]       |
    |8  |                           |[中国]         |
    +---+---------------------------+-------------+
    
    after:
    +---+--------------------------+----------------+
    | id|                      中国|              s2|
    +---+--------------------------+----------------+
    |  1|                        ab|             [a]|
    |  2|                      null|     [中国, abc]|
    |  3|                       ab1|   [hello world]|
    |  4|か行 きゃ(kya) きゅ(kyu...|         [“中国]|
    |  5|             中国(你好)a|[“中(国), 312]|
    |  6|          中国山(东)服务区|      [“中(国)]|
    |  7|            中国山东服务区|        [中(国)]|
    |  8|                          |          [中国]|
    +---+--------------------------+----------------+
    
    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuejianbest/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22048.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22048
    
----
commit 1b9b2e73b8637b8da15b790a06b08e8810fc75d9
Author: xuejianbest <38...@...>
Date:   2018-08-09T03:11:04Z

    Fix the show method to display the wide character alignment problem
    
    before:
    +---+---------------------------+-------------+
    |id |中国                         |s2           |
    +---+---------------------------+-------------+
    |1  |ab                         |[a]          |
    |2  |null                       |[中国, abc]    |
    |3  |ab1                        |[hello world]|
    |4  |か行 きゃ(kya) きゅ(kyu) きょ(kyo) |[“中国]        |
    |5  |中国(你好)a                    |[“中(国), 312] |
    |6  |中国山(东)服务区                  |[“中(国)]      |
    |7  |中国山东服务区                    |[中(国)]       |
    |8  |                           |[中国]         |
    +---+---------------------------+-------------+
    
    after:
    +---+--------------------------+----------------+
    | id|                      中国|              s2|
    +---+--------------------------+----------------+
    |  1|                        ab|             [a]|
    |  2|                      null|     [中国, abc]|
    |  3|                       ab1|   [hello world]|
    |  4|か行 きゃ(kya) きゅ(kyu...|         [“中国]|
    |  5|             中国(你好)a|[“中(国), 312]|
    |  6|          中国山(东)服务区|      [“中(国)]|
    |  7|            中国山东服务区|        [中(国)]|
    |  8|                          |          [中国]|
    +---+--------------------------+----------------+

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    LGTM, then let me ask @gatorsmile


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    OK, it isn't just me. We'll see if it's a broader problem.
    
    I manually cherry-picked to branch-2.4


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: Fix the show method to display the wide character alignm...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    @xuejianbest Could you please create a JIRA entry and add test cases to this PR?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    **[Test build #4305 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4305/testReport)** for PR 22048 at commit [`3649de5`](https://github.com/apache/spark/commit/3649de50235cd19cfea2c3d88d1ccfb18ea8893a).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213537424
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -294,23 +294,29 @@ class Dataset[T] private[sql](
         // We set a minimum column width at '3'
         val minimumColWidth = 3
     
    +    // Regular expression matching full width characters
    +    val fullWidthRegex = """[\u1100-\u115F\u2E80-\uA4CF\uAC00-\uD7A3\uF900-\uFAFF\uFE10-\uFE19\uFE30-\uFE6F\uFF00-\uFF60\uFFE0-\uFFE6]""".r
    +    // The number of half width of a string
    +    def stringHalfWidth = (str: String) => {
    +      str.length + fullWidthRegex.findAllIn(str).size
    +    }
    --- End diff --
    
    better to move this method into `util.Utils` or something as a helper function?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by xuejianbest <gi...@git.apache.org>.
Github user xuejianbest commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    Please help me @srowen 
    I don't know where I need to make changes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by xuejianbest <gi...@git.apache.org>.
Github user xuejianbest commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    I looked at all the 0x0000-0xFFFF characters (unicode) and showed them under Xshell, then found all the full width characters. Get the latest regular expression.
    A new version has been submitted, and the variable name has been changed and comments added, please see below.
    @srowen @kiszk 
    
    A new version is submitted, and the variable name is changed and an annotation is added. See below.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: Fix the show method to display the wide character alignm...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    Also, can you add tests in `DatasetSuite`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by xuejianbest <gi...@git.apache.org>.
Github user xuejianbest commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213608404
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -294,23 +294,29 @@ class Dataset[T] private[sql](
         // We set a minimum column width at '3'
         val minimumColWidth = 3
     
    +    // Regular expression matching full width characters
    +    val fullWidthRegex = """[\u1100-\u115F\u2E80-\uA4CF\uAC00-\uD7A3\uF900-\uFAFF\uFE10-\uFE19\uFE30-\uFE6F\uFF00-\uFF60\uFFE0-\uFFE6]""".r
    +    // The number of half width of a string
    +    def stringHalfWidth = (str: String) => {
    +      str.length + fullWidthRegex.findAllIn(str).size
    +    }
    --- End diff --
    
    Moved it into `util.Utils`. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r214626069
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2794,6 +2794,30 @@ private[spark] object Utils extends Logging {
           }
         }
       }
    +
    +  /**
    +   * Regular expression matching full width characters
    +   */
    +  private val fullWidthRegex = ("""[""" +
    +    // scalastyle:off nonascii
    +    """\u1100-\u115F""" +
    +    """\u2E80-\uA4CF""" +
    +    """\uAC00-\uD7A3""" +
    +    """\uF900-\uFAFF""" +
    +    """\uFE10-\uFE19""" +
    +    """\uFE30-\uFE6F""" +
    +    """\uFF00-\uFF60""" +
    +    """\uFFE0-\uFFE6""" +
    --- End diff --
    
    > I looked at all the 0x0000-0xFFFF characters (unicode) and showed them under Xshell, then found all the full width characters. Get the regular expression.
    
    Can you describe them there and put a references to a public unicode document?
    See the comment in `UTF8String`;
    https://github.com/apache/spark/blob/master/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java#L65
    
    > It takes 49 milliseconds to complete matching all 1000 strings.
    
    How about some additional overheads when calling `showString` as compared to `showString ` w/o this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213384981
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -294,23 +294,25 @@ class Dataset[T] private[sql](
         // We set a minimum column width at '3'
         val minimumColWidth = 3
     
    +    //Regular expression matching full width characters
    +    val fullWidthRegex = """[\u1100-\u115F\u2E80-\uA4CF\uAC00-\uD7A3\uF900-\uFAFF\uFE10-\uFE19\uFE30-\uFE6F\uFF00-\uFF60\uFFE0-\uFFE6]""".r
    --- End diff --
    
    I wonder if detecting this using a regex for each character is slow? probably not meaningfully enough to care about, but I wonder.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213699101
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2794,6 +2794,27 @@ private[spark] object Utils extends Logging {
           }
         }
       }
    +
    +  /**
    +   * Regular expression matching full width characters
    +   */
    +  private lazy val fullWidthRegex = ("""[""" +
    --- End diff --
    
    Doesn't need to be lazy


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    Ping @xuejianbest 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by xuejianbest <gi...@git.apache.org>.
Github user xuejianbest commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213888028
  
    --- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala ---
    @@ -1184,6 +1184,25 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
         assert(Utils.getSimpleName(classOf[MalformedClassObject.MalformedClass]) ===
           "UtilsSuite$MalformedClassObject$MalformedClass")
       }
    +
    +  test("stringHalfWidth") {
    +    assert(Utils.stringHalfWidth(null) == 0)
    +    assert(Utils.stringHalfWidth("") == 0)
    +    assert(Utils.stringHalfWidth("ab c") == 4)
    +    assert(Utils.stringHalfWidth("1098") == 4)
    +    assert(Utils.stringHalfWidth("mø") == 2)
    +    assert(Utils.stringHalfWidth("γύρ") == 3)
    --- End diff --
    
    I found similar code in the `org.apache.commons.lang3.StringUtils` class.
    So the same test was added.
    In StringUtils 6934 line is comment:
    `* StringUtils.isNumeric("\u0967\u0968\u0969")  = true`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    Thank you for creating a JIRA entry and for putting the result. The test case is not available yet.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    @dongjoon-hyun could I ask a favor; can you try merging this? the merge script should say it's merged and offer to back-port to 2.4. However I keep getting errors about it not being mergeable, although all upstream branches and my clone all are up to date.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: Fix the show method to display the wide character alignm...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    I created [a PR](https://github.com/apache/spark/pull/16086) to solve the same problem.
    Can this PR handle [East Asian Width](http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt) correctly?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by xuejianbest <gi...@git.apache.org>.
Github user xuejianbest commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    I see. A new commit has been done.
    Thinks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    Have a look at the test results, which show the style problem it flags:
    https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4305/console
    
    ```
    [error] /home/jenkins/workspace/NewSparkPullRequestBuilder@2/core/src/main/scala/org/apache/spark/util/Utils.scala:2803:4: nonascii.message
    [error] /home/jenkins/workspace/NewSparkPullRequestBuilder@2/core/src/main/scala/org/apache/spark/util/Utils.scala:2804:4: nonascii.message
    [error] /home/jenkins/workspace/NewSparkPullRequestBuilder@2/core/src/main/scala/org/apache/spark/util/Utils.scala:2805:4: nonascii.message
    [error] /home/jenkins/workspace/NewSparkPullRequestBuilder@2/core/src/main/scala/org/apache/spark/util/Utils.scala:2806:4: nonascii.message
    [error] /home/jenkins/workspace/NewSparkPullRequestBuilder@2/core/src/main/scala/org/apache/spark/util/Utils.scala:2807:4: nonascii.message
    [error] /home/jenkins/workspace/NewSparkPullRequestBuilder@2/core/src/main/scala/org/apache/spark/util/Utils.scala:2808:4: nonascii.message
    [error] /home/jenkins/workspace/NewSparkPullRequestBuilder@2/core/src/main/scala/org/apache/spark/util/Utils.scala:2809:4: nonascii.message
    [error] /home/jenkins/workspace/NewSparkPullRequestBuilder@2/core/src/main/scala/org/apache/spark/util/Utils.scala:2810:4: nonascii.message
    ```
    
    So it doesn't like non-ASCII chars after all! I wonder how they exist in the other parts of the source. I checked and other parts just disable the check for sections of code where it's appropriate to use unicode chars.
    
    Just wrap these parts of the code in:
    
    ```
    // scalastyle:off nonascii
    ...
    // scalastyle:on nonascii
    ```
    
    Sorry, didn't know this was needed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213869119
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2794,6 +2794,27 @@ private[spark] object Utils extends Logging {
           }
         }
       }
    +
    +  /**
    +   * Regular expression matching full width characters
    +   */
    +  private lazy val fullWidthRegex = ("""[""" +
    +    """\u1100-\u115F""" +
    +    """\u2E80-\uA4CF""" +
    +    """\uAC00-\uD7A3""" +
    +    """\uF900-\uFAFF""" +
    +    """\uFE10-\uFE19""" +
    +    """\uFE30-\uFE6F""" +
    +    """\uFF00-\uFF60""" +
    +    """\uFFE0-\uFFE6""" +
    +    """]""").r
    +  /**
    +   * Return the number of half width of a string
    +   * A full width character occupies two half widths
    +   */
    --- End diff --
    
    How about this?
    ```
      /**
       * Return the number of half widths in a given string. Note that a full width character
       * occupies two half widths.
       */
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213537463
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -294,23 +294,29 @@ class Dataset[T] private[sql](
         // We set a minimum column width at '3'
         val minimumColWidth = 3
     
    +    // Regular expression matching full width characters
    +    val fullWidthRegex = """[\u1100-\u115F\u2E80-\uA4CF\uAC00-\uD7A3\uF900-\uFAFF\uFE10-\uFE19\uFE30-\uFE6F\uFF00-\uFF60\uFFE0-\uFFE6]""".r
    +    // The number of half width of a string
    +    def stringHalfWidth = (str: String) => {
    +      str.length + fullWidthRegex.findAllIn(str).size
    +    }
    --- End diff --
    
    better to add tests for the helper function, too.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    Just add a new commit with a little info about what this does and how you generated it. The commits are squashed anyway. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213699025
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2794,6 +2794,27 @@ private[spark] object Utils extends Logging {
           }
         }
       }
    +
    +  /**
    +   * Regular expression matching full width characters
    +   */
    +  private lazy val fullWidthRegex = ("""[""" +
    +    """\u1100-\u115F""" +
    +    """\u2E80-\uA4CF""" +
    +    """\uAC00-\uD7A3""" +
    +    """\uF900-\uFAFF""" +
    +    """\uFE10-\uFE19""" +
    +    """\uFE30-\uFE6F""" +
    +    """\uFF00-\uFF60""" +
    +    """\uFFE0-\uFFE6""" +
    +    """]""").r
    +  /**
    +   * Return the number of half width of a string
    +   * A full width character occupies two half widths
    +   */
    +  def stringHalfWidth(str: String): Int = {
    +    if(str == null) 0 else str.length + fullWidthRegex.findAllIn(str).size
    --- End diff --
    
    Nit: space before if


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by xuejianbest <gi...@git.apache.org>.
Github user xuejianbest commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213525417
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -294,23 +294,25 @@ class Dataset[T] private[sql](
         // We set a minimum column width at '3'
         val minimumColWidth = 3
     
    +    //Regular expression matching full width characters
    --- End diff --
    
    OK, thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by xuejianbest <gi...@git.apache.org>.
Github user xuejianbest commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r214778257
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2794,6 +2794,30 @@ private[spark] object Utils extends Logging {
           }
         }
       }
    +
    +  /**
    +   * Regular expression matching full width characters
    +   */
    +  private val fullWidthRegex = ("""[""" +
    +    // scalastyle:off nonascii
    +    """\u1100-\u115F""" +
    +    """\u2E80-\uA4CF""" +
    +    """\uAC00-\uD7A3""" +
    +    """\uF900-\uFAFF""" +
    +    """\uFE10-\uFE19""" +
    +    """\uFE30-\uFE6F""" +
    +    """\uFF00-\uFF60""" +
    +    """\uFFE0-\uFFE6""" +
    --- End diff --
    
    > Can you describe them there and put a references to a public unicode document?
    
    This is a regular expression match using unicode, regardless of the specific encoding.
    For example, the following string is encoded using gbk instead of utf8, and the match still works:
    `
        val bytes = Array[Byte](0xd6.toByte, 0xd0.toByte, 0xB9.toByte, 0xFA.toByte)
        val s1 = new String(bytes, "gbk")
        
        println(s1) //中国
        
        val fullWidthRegex = ("""[""" +
        // scalastyle:off nonascii
        """\u1100-\u115F""" +
        """\u2E80-\uA4CF""" +
        """\uAC00-\uD7A3""" +
        """\uF900-\uFAFF""" +
        """\uFE10-\uFE19""" +
        """\uFE30-\uFE6F""" +
        """\uFF00-\uFF60""" +
        """\uFFE0-\uFFE6""" +
        // scalastyle:on nonascii
        """]""").r
        
        println(fullWidthRegex.findAllIn(s1).size) //2
    `
    This regular expression is obtained experimentally under a specific font.
    I don't understand what you are going to do.
    
    
    > How about some additional overheads when calling showString as compared to showString w/o this patch?
    
    I tested a Dataset consisting of 100 rows, each row has two columns, one column is the index (0-99), and the other column is a random string of length 100 characters, and then the showString display is called separately.
    The original showString method (w/o this patch) took about 42ms, and the improved time took about 46ms, and the performance was about 10% worse.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r211105897
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -294,23 +294,24 @@ class Dataset[T] private[sql](
         // We set a minimum column width at '3'
         val minimumColWidth = 3
     
    +    val regex = """[^\x00-\u2e39]""".r
    --- End diff --
    
    This could use a comment and slightly better name for the variable? I also wonder if a regex is a little bit slow for scanning every character.
    
    However it's not clear this definition is accurate enough. According to things like https://stackoverflow.com/questions/13505075/analyzing-full-width-or-half-width-character-in-java we're really looking for the concept of "fullwidth" East Asian characters. The answer there provides a somewhat more precise definition, though others say you need something like icu4j for a proper definition. 
    
    Maybe at least adopt the alternative proposed in the SO answer? It's not necessary to be perfect here, as it's a cosmetic issue.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    @xuejianbest thank you for your update. Would it be possible to commit test cases, too?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by xuejianbest <gi...@git.apache.org>.
Github user xuejianbest commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r214614936
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2794,6 +2794,30 @@ private[spark] object Utils extends Logging {
           }
         }
       }
    +
    +  /**
    +   * Regular expression matching full width characters
    +   */
    +  private val fullWidthRegex = ("""[""" +
    +    // scalastyle:off nonascii
    +    """\u1100-\u115F""" +
    +    """\u2E80-\uA4CF""" +
    +    """\uAC00-\uD7A3""" +
    +    """\uF900-\uFAFF""" +
    +    """\uFE10-\uFE19""" +
    +    """\uFE30-\uFE6F""" +
    +    """\uFF00-\uFF60""" +
    +    """\uFFE0-\uFFE6""" +
    --- End diff --
    
    - How to get this Regex list? Any reference? It sounds like this should be a general problem
    
    I looked at all the 0x0000-0xFFFF characters (unicode) and showed them under Xshell, then found all the full width characters. Get the regular expression.
    
    
    - What is the performance impact?
    
    I generated 1000 strings, each consisting of 1000 characters with a random unicode of 0x0000-0xFFFF. (a total of 1 million characters.)
    Then use this regular expression to find the full width character of these strings.
    I tested 100 rounds and then averaged.
    It takes 49 milliseconds to complete matching all 1000 strings.
    
    @gatorsmile 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by xuejianbest <gi...@git.apache.org>.
Github user xuejianbest commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213525240
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -294,23 +294,25 @@ class Dataset[T] private[sql](
         // We set a minimum column width at '3'
         val minimumColWidth = 3
     
    +    //Regular expression matching full width characters
    +    val fullWidthRegex = """[\u1100-\u115F\u2E80-\uA4CF\uAC00-\uD7A3\uF900-\uFAFF\uFE10-\uFE19\uFE30-\uFE6F\uFF00-\uFF60\uFFE0-\uFFE6]""".r
    --- End diff --
    
    I generated 1000 strings, each consisting of 1000 characters with a random unicode of 0x0000-0xFFFF. (a total of 1 million characters.)
    Then use this regular expression to find the full width character of these strings. 
    I tested 100 rounds and then averaged.
    It takes 49 milliseconds to complete matching all 1000 strings.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    **[Test build #4305 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4305/testReport)** for PR 22048 at commit [`3649de5`](https://github.com/apache/spark/commit/3649de50235cd19cfea2c3d88d1ccfb18ea8893a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    @srowen . Sure. I'll try to backport this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r212557191
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -294,23 +294,24 @@ class Dataset[T] private[sql](
         // We set a minimum column width at '3'
         val minimumColWidth = 3
     
    +    val regex = """[^\x00-\u2e39]""".r
    --- End diff --
    
    I agree with @srowen's comment. To make it clear, could you please add tests for more characters?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213699899
  
    --- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala ---
    @@ -1184,6 +1184,25 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
         assert(Utils.getSimpleName(classOf[MalformedClassObject.MalformedClass]) ===
           "UtilsSuite$MalformedClassObject$MalformedClass")
       }
    +
    +  test("stringHalfWidth") {
    +    assert(Utils.stringHalfWidth(null) == 0)
    +    assert(Utils.stringHalfWidth("") == 0)
    +    assert(Utils.stringHalfWidth("ab c") == 4)
    +    assert(Utils.stringHalfWidth("1098") == 4)
    +    assert(Utils.stringHalfWidth("mø") == 2)
    +    assert(Utils.stringHalfWidth("γύρ") == 3)
    --- End diff --
    
    I think it is OK to type Unicode chars into source files because git and Scala ought to be set up to encode and read the source as UTF8. I have avoided it in the past and used Unicode escapes (but put actual string in comment) to be safe. I don't feel strongly. I think we have other instances of Unicode in the source. Just a comment. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by xuejianbest <gi...@git.apache.org>.
Github user xuejianbest commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    I see, Thinks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by xuejianbest <gi...@git.apache.org>.
Github user xuejianbest commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    I have moved the method of calculating the half width of a string into util.Utils object.
    @maropu @srowen 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    **[Test build #4329 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4329/testReport)** for PR 22048 at commit [`45ac272`](https://github.com/apache/spark/commit/45ac272ca667f3330f7b550b463b23c284d9eadf).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: Fix the show method to display the wide character alignm...

Posted by xuejianbest <gi...@git.apache.org>.
Github user xuejianbest commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    After testing, it is found that regular expressions are changed to the following.
    `val regex = """[^\x00-\u2e39]""".r`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by xuejianbest <gi...@git.apache.org>.
Github user xuejianbest commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213537923
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -294,23 +294,25 @@ class Dataset[T] private[sql](
         // We set a minimum column width at '3'
         val minimumColWidth = 3
     
    +    //Regular expression matching full width characters
    +    val fullWidthRegex = """[\u1100-\u115F\u2E80-\uA4CF\uAC00-\uD7A3\uF900-\uFAFF\uFE10-\uFE19\uFE30-\uFE6F\uFF00-\uFF60\uFFE0-\uFFE6]""".r
         if (!vertical) {
           // Initialise the width of each column to a minimum value
           val colWidths = Array.fill(numCols)(minimumColWidth)
     
           // Compute the width of each column
           for (row <- rows) {
             for ((cell, i) <- row.zipWithIndex) {
    -          colWidths(i) = math.max(colWidths(i), cell.length)
    +          colWidths(i) = math.max(colWidths(i), cell.length + fullWidthRegex.findAllIn(cell).size)
    --- End diff --
    
    I committed a new version. See if this is appropriate please ?
    @srowen


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: Fix the show method to display the wide character alignm...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by xuejianbest <gi...@git.apache.org>.
Github user xuejianbest commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r214867601
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2794,6 +2794,30 @@ private[spark] object Utils extends Logging {
           }
         }
       }
    +
    +  /**
    +   * Regular expression matching full width characters
    +   */
    +  private val fullWidthRegex = ("""[""" +
    +    // scalastyle:off nonascii
    +    """\u1100-\u115F""" +
    +    """\u2E80-\uA4CF""" +
    +    """\uAC00-\uD7A3""" +
    +    """\uF900-\uFAFF""" +
    +    """\uFE10-\uFE19""" +
    +    """\uFE30-\uFE6F""" +
    +    """\uFF00-\uFF60""" +
    +    """\uFFE0-\uFFE6""" +
    --- End diff --
    
    Do I need to merge the above commited into one commit, 
    Or add another new commit?
    Or change the last commit comments ?
    @srowen 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213385143
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -294,23 +294,25 @@ class Dataset[T] private[sql](
         // We set a minimum column width at '3'
         val minimumColWidth = 3
     
    +    //Regular expression matching full width characters
    +    val fullWidthRegex = """[\u1100-\u115F\u2E80-\uA4CF\uAC00-\uD7A3\uF900-\uFAFF\uFE10-\uFE19\uFE30-\uFE6F\uFF00-\uFF60\uFFE0-\uFFE6]""".r
         if (!vertical) {
           // Initialise the width of each column to a minimum value
           val colWidths = Array.fill(numCols)(minimumColWidth)
     
           // Compute the width of each column
           for (row <- rows) {
             for ((cell, i) <- row.zipWithIndex) {
    -          colWidths(i) = math.max(colWidths(i), cell.length)
    +          colWidths(i) = math.max(colWidths(i), cell.length + fullWidthRegex.findAllIn(cell).size)
    --- End diff --
    
    What about a utility method for the `x.length + fullWidthRegex.findAllIn(x).size` expression that recurs here?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22048


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    +1 for adding `scalastyle:off nonascii` in that part.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    Merging to master and 2.4, though for some reason I can't back-port with the merge script right now. Will try later.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213880735
  
    --- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala ---
    @@ -1184,6 +1184,25 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
         assert(Utils.getSimpleName(classOf[MalformedClassObject.MalformedClass]) ===
           "UtilsSuite$MalformedClassObject$MalformedClass")
       }
    +
    +  test("stringHalfWidth") {
    +    assert(Utils.stringHalfWidth(null) == 0)
    +    assert(Utils.stringHalfWidth("") == 0)
    +    assert(Utils.stringHalfWidth("ab c") == 4)
    +    assert(Utils.stringHalfWidth("1098") == 4)
    +    assert(Utils.stringHalfWidth("mø") == 2)
    +    assert(Utils.stringHalfWidth("γύρ") == 3)
    --- End diff --
    
    Don't change it. There are Unicode characters elsewhere in the source, and it's clearer this way.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by xuejianbest <gi...@git.apache.org>.
Github user xuejianbest commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213883324
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2794,6 +2794,27 @@ private[spark] object Utils extends Logging {
           }
         }
       }
    +
    +  /**
    +   * Regular expression matching full width characters
    +   */
    +  private lazy val fullWidthRegex = ("""[""" +
    +    """\u1100-\u115F""" +
    +    """\u2E80-\uA4CF""" +
    +    """\uAC00-\uD7A3""" +
    +    """\uF900-\uFAFF""" +
    +    """\uFE10-\uFE19""" +
    +    """\uFE30-\uFE6F""" +
    +    """\uFF00-\uFF60""" +
    +    """\uFFE0-\uFFE6""" +
    +    """]""").r
    +  /**
    +   * Return the number of half width of a string
    +   * A full width character occupies two half widths
    +   */
    --- End diff --
    
    OK, thinks


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: Fix the show method to display the wide character alignm...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    Since this change does not look minor, could you please create a JIRA entry?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    Ur, it seems that I'm facing the same situation. In this case, should we just cherry-pick to `branch-2.4`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by xuejianbest <gi...@git.apache.org>.
Github user xuejianbest commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    Below is the display effect I tested on Xshell5.
    [https://issues.apache.org/jira/secure/attachment/12935462/show.bmp](url)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213537514
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -294,23 +294,29 @@ class Dataset[T] private[sql](
         // We set a minimum column width at '3'
         val minimumColWidth = 3
     
    +    // Regular expression matching full width characters
    +    val fullWidthRegex = """[\u1100-\u115F\u2E80-\uA4CF\uAC00-\uD7A3\uF900-\uFAFF\uFE10-\uFE19\uFE30-\uFE6F\uFF00-\uFF60\uFFE0-\uFFE6]""".r
    --- End diff --
    
    This line goes over the limit, 100.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213868352
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2794,6 +2794,27 @@ private[spark] object Utils extends Logging {
           }
         }
       }
    +
    +  /**
    +   * Regular expression matching full width characters
    +   */
    +  private lazy val fullWidthRegex = ("""[""" +
    +    """\u1100-\u115F""" +
    +    """\u2E80-\uA4CF""" +
    +    """\uAC00-\uD7A3""" +
    +    """\uF900-\uFAFF""" +
    +    """\uFE10-\uFE19""" +
    +    """\uFE30-\uFE6F""" +
    +    """\uFF00-\uFF60""" +
    +    """\uFFE0-\uFFE6""" +
    +    """]""").r
    --- End diff --
    
    super nit: add blank line


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213525456
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -294,23 +294,25 @@ class Dataset[T] private[sql](
         // We set a minimum column width at '3'
         val minimumColWidth = 3
     
    +    //Regular expression matching full width characters
    +    val fullWidthRegex = """[\u1100-\u115F\u2E80-\uA4CF\uAC00-\uD7A3\uF900-\uFAFF\uFE10-\uFE19\uFE30-\uFE6F\uFF00-\uFF60\uFFE0-\uFFE6]""".r
    --- End diff --
    
    OK, not worth optimizing now.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    style errors, we cannot include non-ascii characters in files;
    ```
    Scalastyle checks failed at following occurrences:
    [error] /home/jenkins/workspace/NewSparkPullRequestBuilder@2/core/src/main/scala/org/apache/spark/util/Utils.scala:2803:4: nonascii.message
    [error] /home/jenkins/workspace/NewSparkPullRequestBuilder@2/core/src/main/scala/org/apache/spark/util/Utils.scala:2804:4: nonascii.message
    [error] /home/jenkins/workspace/NewSparkPullRequestBuilder@2/core/src/main/scala/org/apache/spark/util/Utils.scala:2805:4: nonascii.message
    [error] /home/jenkins/workspace/NewSparkPullRequestBuilder@2/core/src/main/scala/org/apache/spark/util/Utils.scala:2806:4: nonascii.message
    [error] /home/jenkins/workspace/NewSparkPullRequestBuilder@2/core/src/main/scala/org/apache/spark/util/Utils.scala:2807:4: nonascii.message
    [error] /home/jenkins/workspace/NewSparkPullRequestBuilder@2/core/src/main/scala/org/apache/spark/util/Utils.scala:2808:4: nonascii.message
    [error] /home/jenkins/workspace/NewSparkPullRequestBuilder@2/core/src/main/scala/org/apache/spark/util/Utils.scala:2809:4: nonascii.message
    [error] /home/jenkins/workspace/NewSparkPullRequestBuilder@2/core/src/main/scala/org/apache/spark/util/Utils.scala:2810:4: nonascii.message
    [error] Total time: 16 s, completed Aug 30, 2018 6:48:19 AM
    [error] running /home/jenkins/workspace/NewSparkPullRequestBuilder@2/dev/lint-scala ; received return code 1
    Attempting to post to Github...
     > Post successful.
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: Fix the show method to display the wide character alignm...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213884655
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -301,16 +301,16 @@ class Dataset[T] private[sql](
           // Compute the width of each column
           for (row <- rows) {
             for ((cell, i) <- row.zipWithIndex) {
    -          colWidths(i) = math.max(colWidths(i), cell.length)
    +          colWidths(i) = math.max(colWidths(i), Utils.stringHalfWidth(cell))
             }
           }
     
           val paddedRows = rows.map { row =>
             row.zipWithIndex.map { case (cell, i) =>
               if (truncate > 0) {
    -            StringUtils.leftPad(cell, colWidths(i))
    --- End diff --
    
    OK that's fine.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by xuejianbest <gi...@git.apache.org>.
Github user xuejianbest commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213882803
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -301,16 +301,16 @@ class Dataset[T] private[sql](
           // Compute the width of each column
           for (row <- rows) {
             for ((cell, i) <- row.zipWithIndex) {
    -          colWidths(i) = math.max(colWidths(i), cell.length)
    +          colWidths(i) = math.max(colWidths(i), Utils.stringHalfWidth(cell))
             }
           }
     
           val paddedRows = rows.map { row =>
             row.zipWithIndex.map { case (cell, i) =>
               if (truncate > 0) {
    -            StringUtils.leftPad(cell, colWidths(i))
    --- End diff --
    
    This method fills the string to the specified length.
    But what you need to do here is to increase the string by the specified length by padding.
    If you want to use this function, code programming:
    `StringUtils.leftPad(cell, colWidths(i) - Utils.stringHalfWidth(cell) + cell.length)`
    I think it is more difficult to understand.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by xuejianbest <gi...@git.apache.org>.
Github user xuejianbest commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    ![df.show](https://issues.apache.org/jira/secure/attachment/12935462/show.bmp)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213700228
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -301,16 +301,16 @@ class Dataset[T] private[sql](
           // Compute the width of each column
           for (row <- rows) {
             for ((cell, i) <- row.zipWithIndex) {
    -          colWidths(i) = math.max(colWidths(i), cell.length)
    +          colWidths(i) = math.max(colWidths(i), Utils.stringHalfWidth(cell))
             }
           }
     
           val paddedRows = rows.map { row =>
             row.zipWithIndex.map { case (cell, i) =>
               if (truncate > 0) {
    -            StringUtils.leftPad(cell, colWidths(i))
    --- End diff --
    
    Oh why not use this method?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r214862749
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2794,6 +2794,30 @@ private[spark] object Utils extends Logging {
           }
         }
       }
    +
    +  /**
    +   * Regular expression matching full width characters
    +   */
    +  private val fullWidthRegex = ("""[""" +
    +    // scalastyle:off nonascii
    +    """\u1100-\u115F""" +
    +    """\u2E80-\uA4CF""" +
    +    """\uAC00-\uD7A3""" +
    +    """\uF900-\uFAFF""" +
    +    """\uFE10-\uFE19""" +
    +    """\uFE30-\uFE6F""" +
    +    """\uFF00-\uFF60""" +
    +    """\uFFE0-\uFFE6""" +
    --- End diff --
    
    I think this is fine. Just copy a summary of your comments here into the comments in the code. Yes this has nothing to do with UTF8 encoding directly. You are matching UCS2 really, 16bit char values. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by xuejianbest <gi...@git.apache.org>.
Github user xuejianbest commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    Why did it fail?
    What else should I do?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213384744
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -294,23 +294,25 @@ class Dataset[T] private[sql](
         // We set a minimum column width at '3'
         val minimumColWidth = 3
     
    +    //Regular expression matching full width characters
    --- End diff --
    
    Very small nit that might fail scalastyle -- space after comment slashes


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r214490659
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2794,6 +2794,30 @@ private[spark] object Utils extends Logging {
           }
         }
       }
    +
    +  /**
    +   * Regular expression matching full width characters
    +   */
    +  private val fullWidthRegex = ("""[""" +
    +    // scalastyle:off nonascii
    +    """\u1100-\u115F""" +
    +    """\u2E80-\uA4CF""" +
    +    """\uAC00-\uD7A3""" +
    +    """\uF900-\uFAFF""" +
    +    """\uFE10-\uFE19""" +
    +    """\uFE30-\uFE6F""" +
    +    """\uFF00-\uFF60""" +
    +    """\uFFE0-\uFFE6""" +
    --- End diff --
    
    A general question. 
    - How to get this Regex list? Any reference?  It sounds like this should be a general problem
    - What is the performance impact?
    
    Can you answer them and post them in the PR description?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    **[Test build #4329 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4329/testReport)** for PR 22048 at commit [`45ac272`](https://github.com/apache/spark/commit/45ac272ca667f3330f7b550b463b23c284d9eadf).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22048
  
    Hm! so the warning triggers on unicode escape sequences? That seems like a bug in the style checker. Oh well, yes keep them. I'll retest ...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Posted by xuejianbest <gi...@git.apache.org>.
Github user xuejianbest commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r213879659
  
    --- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala ---
    @@ -1184,6 +1184,25 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
         assert(Utils.getSimpleName(classOf[MalformedClassObject.MalformedClass]) ===
           "UtilsSuite$MalformedClassObject$MalformedClass")
       }
    +
    +  test("stringHalfWidth") {
    +    assert(Utils.stringHalfWidth(null) == 0)
    +    assert(Utils.stringHalfWidth("") == 0)
    +    assert(Utils.stringHalfWidth("ab c") == 4)
    +    assert(Utils.stringHalfWidth("1098") == 4)
    +    assert(Utils.stringHalfWidth("mø") == 2)
    +    assert(Utils.stringHalfWidth("γύρ") == 3)
    --- End diff --
    
    Do you mean that the string here needs to be converted to a unicode escape form?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org