
Posted to reviews@spark.apache.org by vinodkc <gi...@git.apache.org> on 2018/08/21 15:19:24 UTC

[GitHub] spark pull request #22171: [SPARK-25177][SQL] When dataframe decimal type co...

GitHub user vinodkc opened a pull request:

    https://github.com/apache/spark/pull/22171

    [SPARK-25177][SQL] When dataframe decimal type column having scale higher than 6, 0 values are shown in scientific notation

    ## What changes were proposed in this pull request?
    If the scale of a decimal type is > 6, a 0 value is shown in scientific notation. As a result, saving the dataframe output to an external database fails because of the scientific notation on "0" values.
    In java.math.BigDecimal, if the scale is > 6, 0 is shown in scientific notation.
    
    In PostgreSQL, a 0 decimal value is shown in non-scientific notation (plain string). This PR makes the Spark SQL result consistent with PostgreSQL.
    ## How was this patch tested?
    Added 2 unit tests 
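
The `java.math.BigDecimal` behaviour that the description relies on can be reproduced outside Spark. The following is a minimal sketch that assumes only the JDK; the class name `ZeroDecimalDemo` and its helper methods are hypothetical and are not taken from the patch. `toString()` switches to scientific notation when the adjusted exponent is below -6, which for a zero value means a scale above 6, while `toPlainString()` never emits an exponent.

```java
import java.math.BigDecimal;

public class ZeroDecimalDemo {
    // Default rendering: BigDecimal.toString() may use scientific notation.
    static String defaultString(int scale) {
        return BigDecimal.ZERO.setScale(scale).toString();
    }

    // Plain rendering: BigDecimal.toPlainString() never emits an exponent.
    static String plainString(int scale) {
        return BigDecimal.ZERO.setScale(scale).toPlainString();
    }

    public static void main(String[] args) {
        // Scale 6 is the last scale at which zero is still rendered plainly by default.
        for (int scale : new int[] {6, 7, 8}) {
            System.out.println("scale " + scale + ": "
                + defaultString(scale) + " vs " + plainString(scale));
        }
    }
}
```

With scales 6, 7 and 8, the default rendering gives `0.000000`, `0E-7` and `0E-8`, whereas the plain rendering keeps all the zeros.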
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vinodkc/spark br_fix_precision_zero

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22171.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22171
    
----
commit 1ebeae518f44439af7ceff2ce5fb80caf44f1d45
Author: Vinod KC <vi...@...>
Date:   2018-08-21T15:10:47Z

    Fix precision issue with zero when decimal type scale > 6

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22171: [SPARK-25177][SQL] When dataframe decimal type column ha...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22171
  
    **[Test build #95151 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95151/testReport)** for PR 22171 at commit [`b5644d7`](https://github.com/apache/spark/commit/b5644d70951e29e9175c4ab9aede41b3143cad7f).


---


[GitHub] spark issue #22171: [SPARK-25177][SQL] When dataframe decimal type column ha...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22171
  
    I don't have a strong opinion about the display; scientific notation is fine with me.
    
    I'm curious about "... but also in dataset write operations. External databases like netezza fails to save the result ..."
    
    How can this happen? When Spark writes decimal out, the external systems will get decimal values, not string values.
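
The distinction drawn here between a decimal value and its string rendering can be illustrated with a small sketch. It assumes only the JDK, the class name `DecimalRoundTrip` is hypothetical, and it says nothing about how Spark or any particular database driver actually transmits values. It only shows that the exponent form is a textual rendering: parsing either rendering gives back the same value with the same scale.

```java
import java.math.BigDecimal;

public class DecimalRoundTrip {
    // Parse a rendered decimal back into a BigDecimal.
    static BigDecimal parse(String text) {
        return new BigDecimal(text);
    }

    public static void main(String[] args) {
        BigDecimal zero = BigDecimal.ZERO.setScale(8);
        BigDecimal fromScientific = parse(zero.toString());  // parses the exponent form
        BigDecimal fromPlain = parse(zero.toPlainString());  // parses the plain form
        // equals() compares both value and scale, so this checks a lossless round trip.
        System.out.println(fromScientific.equals(zero) && fromPlain.equals(zero));
    }
}
```

A consumer that receives the value as text and does not accept exponent notation would still reject the first form, which is the situation the pull request description is concerned with.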


---


[GitHub] spark issue #22171: [SPARK-25177][SQL] When dataframe decimal type column ha...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22171
  
    Is there a standard for how CSV should store decimal values?


---


[GitHub] spark issue #22171: [SPARK-25177][SQL] When dataframe decimal type column ha...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22171
  
    Hm, actually I thought this makes sense tho.


---


[GitHub] spark issue #22171: [SPARK-25177][SQL] When dataframe decimal type column ha...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22171
  
    **[Test build #95156 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95156/testReport)** for PR 22171 at commit [`d1ef674`](https://github.com/apache/spark/commit/d1ef67467f77ae85fac880185a29cc2ba74d31fd).


---


[GitHub] spark issue #22171: [SPARK-25177][SQL] When dataframe decimal type column ha...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22171
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2826/
    Test PASSed.


---


[GitHub] spark issue #22171: [SPARK-25177][SQL] When dataframe decimal type column ha...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/22171
  
    Scientific notation is more efficient for saving the values in CSV. If there are many zero values of a high-scale decimal type, non-scientific notation can cost storage space and loading time.
    
    I'm not sure whether there is a standard for this, but a rough search suggests that it is common to save decimal values in scientific notation in CSV.
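
The size argument can be made concrete with a rough sketch that counts the characters in each rendering of zero. It assumes only the JDK, the class name `ZeroRenderingSize` and its methods are hypothetical, and it measures string length only, not actual CSV file size or load time.

```java
import java.math.BigDecimal;

public class ZeroRenderingSize {
    // Characters needed to write zero at the given scale with toString().
    static int defaultLength(int scale) {
        return BigDecimal.ZERO.setScale(scale).toString().length();
    }

    // Characters needed to write zero at the given scale with toPlainString().
    static int plainLength(int scale) {
        return BigDecimal.ZERO.setScale(scale).toPlainString().length();
    }

    public static void main(String[] args) {
        for (int scale : new int[] {8, 18}) {
            System.out.println("scale " + scale + ": default=" + defaultLength(scale)
                + " chars, plain=" + plainLength(scale) + " chars");
        }
    }
}
```

For zero, the plain form grows by one character per unit of scale (10 characters at scale 8, 20 at scale 18), while the exponent form stays at 4 or 5 characters.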


---


[GitHub] spark issue #22171: [SPARK-25177][SQL] When dataframe decimal type column ha...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22171
  
    @vinodkc, I think you can mark up the code bit via 
    
    ``````
    ```
    spark.sql("create table test (a decimal(10,7), b decimal(10,6), c decimal(10,8))")
    ...
    +---------+--------+----------+
    ```
    ``````


---


[GitHub] spark issue #22171: [SPARK-25177][SQL] When dataframe decimal type column ha...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22171
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2819/
    Test PASSed.


---


[GitHub] spark issue #22171: [SPARK-25177][SQL] When dataframe decimal type column ha...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22171
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95028/
    Test FAILed.


---


[GitHub] spark issue #22171: [SPARK-25177][SQL] When dataframe decimal type column ha...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22171
  
    Merged build finished. Test PASSed.


---
