You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by srowen <gi...@git.apache.org> on 2018/08/19 15:03:38 UTC

[GitHub] spark pull request #22048: [SPARK-25108][SQL] Fix the show method to display...

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22048#discussion_r211105897
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -294,23 +294,24 @@ class Dataset[T] private[sql](
         // We set a minimum column width at '3'
         val minimumColWidth = 3
     
    +    val regex = """[^\x00-\u2e39]""".r
    --- End diff --
    
    This could use a comment and slightly better name for the variable? I also wonder if a regex is a little bit slow for scanning every character.
    
    However it's not clear this definition is accurate enough. According to things like https://stackoverflow.com/questions/13505075/analyzing-full-width-or-half-width-character-in-java we're really looking for the concept of "fullwidth" East Asian characters. The answer there provides a somewhat more precise definition, though others say you need something like icu4j for a proper definition. 
    
    Maybe at least adopt the alternative proposed in the SO answer? It's not necessary to be perfect here, as it's a cosmetic issue.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org