Posted to issues@spark.apache.org by "Chris Fregly (JIRA)" <ji...@apache.org> on 2015/05/01 00:00:07 UTC

[jira] [Commented] (SPARK-7178) Improve DataFrame documentation and code samples

    [ https://issues.apache.org/jira/browse/SPARK-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522351#comment-14522351 ] 

Chris Fregly commented on SPARK-7178:
-------------------------------------

fillna() (df.na.fill() in Scala) is also commonly used to replace nulls:

https://forums.databricks.com/questions/790/how-do-i-replace-nulls-with-0s-in-a-dataframe.html

> Improve DataFrame documentation and code samples
> ------------------------------------------------
>
>                 Key: SPARK-7178
>                 URL: https://issues.apache.org/jira/browse/SPARK-7178
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.3.1
>            Reporter: Chris Fregly
>              Labels: dataframe
>
> AND and OR are not straightforward when using the new DataFrame API.
> The current convention, familiar to Pandas users, is to use the bitwise operators & and | instead of AND and OR. When using these, however, you need to wrap each expression in parentheses, because in Python & and | bind more tightly than comparison operators such as == and <.
> Also, working with StructTypes is a bit confusing. The following link: https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema (Python tab) implies that you can work with tuples directly when creating a DataFrame.
> However, the following code errors out unless we explicitly use Rows:
> {code}
> from pyspark.sql import Row
> from pyspark.sql.types import *
> # The schema is encoded in a string.
> schemaString = "a"
> fields = [StructField(field_name, MapType(StringType(),IntegerType())) for field_name in schemaString.split()]
> schema = StructType(fields)
> df = sqlContext.createDataFrame([Row(a={'b': 1})], schema)
> {code}
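
The parenthesization point in the description above can be checked without a Spark installation. The sketch below is only an illustration: it uses Python's standard ast module to show how the parser groups a filter expression, and the names df.a, df.b and the helper describe() are hypothetical, not part of any Spark API.

```python
import ast


def describe(expr):
    """Return (outermost node type, operator type names) for a Python expression."""
    node = ast.parse(expr, mode="eval").body
    if isinstance(node, ast.Compare):
        # A chained comparison: one operator per comparison link.
        ops = [type(op).__name__ for op in node.ops]
    elif isinstance(node, ast.BinOp):
        # A single binary operation such as & or |.
        ops = [type(node.op).__name__]
    else:
        ops = []
    return type(node).__name__, ops


# Hypothetical column names; the strings are only parsed, never evaluated.
unparenthesized = "df.a > 1 & df.b < 2"
parenthesized = "(df.a > 1) & (df.b < 2)"

# Without parentheses, & binds first, so Python sees the chained
# comparison  df.a > (1 & df.b) < 2  rather than a conjunction.
print(describe(unparenthesized))  # ('Compare', ['Gt', 'Lt'])

# With parentheses, the outermost operation is the intended bitwise AND.
print(describe(parenthesized))    # ('BinOp', ['BitAnd'])
```

The same grouping applies to |, which is why each comparison in a DataFrame filter needs its own parentheses.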


