Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2018/07/30 08:16:00 UTC

[jira] [Commented] (SPARK-24946) PySpark - Allow np.Arrays and pd.Series in df.approxQuantile

    [ https://issues.apache.org/jira/browse/SPARK-24946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561614#comment-16561614 ] 

Hyukjin Kwon commented on SPARK-24946:
--------------------------------------

The workaround is pretty easy, though: wrap the NumPy array in a list before calling the API.

Yeah, we should probably make the argument checking less strict, but first we should check whether Py4J accepts all iterables as arguments.

As far as I know it does; however, this still needs a thorough investigation, because if we allow this one case, we should probably consider allowing all the other cases too.
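The workaround mentioned above can be sketched as follows. This is an illustrative sketch, not code from the ticket: the conversion step is real, while the `approxQuantile` call at the end assumes an existing SparkSession `spark` and a DataFrame `df` with a numeric column named "value".

```python
import numpy as np

# Probabilities computed with numpy (e.g. 1% steps for a cumulative plot),
# as described in the ticket.
probs = np.arange(0.0, 1.0, 0.01)

# approxQuantile's argument check currently requires a list or tuple,
# so convert the array first. .tolist() also turns numpy scalars into
# plain Python floats.
probs_list = probs.tolist()

assert isinstance(probs_list, list)
assert all(isinstance(p, float) for p in probs_list)

# Hypothetical usage (requires a running SparkSession `spark` and a
# DataFrame `df` with a numeric column "value"):
# quantiles = df.approxQuantile("value", probs_list, 0.01)
```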


> PySpark - Allow np.Arrays and pd.Series in df.approxQuantile
> ------------------------------------------------------------
>
>                 Key: SPARK-24946
>                 URL: https://issues.apache.org/jira/browse/SPARK-24946
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark
>    Affects Versions: 2.3.1
>            Reporter: Paul Westenthanner
>            Priority: Minor
>              Labels: DataFrame, beginner, pyspark
>
> As a Python user it is convenient to pass a numpy array or pandas Series as the _probabilities_ parameter of `approxQuantile(col, probabilities, relativeError)`.
>  
> Especially for creating cumulative plots (say in 1% steps) it is handy to call `approxQuantile(col, np.arange(0, 1.0, 0.01), relativeError)`.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org