Posted to issues@spark.apache.org by "Chris Fregly (JIRA)" <ji...@apache.org> on 2015/05/01 00:00:07 UTC
[jira] [Commented] (SPARK-7178) Improve DataFrame documentation and code samples
[ https://issues.apache.org/jira/browse/SPARK-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522351#comment-14522351 ]
Chris Fregly commented on SPARK-7178:
-------------------------------------
fillna() is also commonly used:
https://forums.databricks.com/questions/790/how-do-i-replace-nulls-with-0s-in-a-dataframe.html
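For reference, a minimal sketch of the null-to-0 replacement asked about in that thread. PySpark's df.fillna(0) has the same shape as the pandas call; pandas is used here only so the snippet is self-contained, and the column name "a" is made up:

```python
import pandas as pd

# A column containing a null; fillna(0) replaces every null with 0.
df = pd.DataFrame({"a": [1.0, None, 3.0]})
filled = df.fillna(0)
print(filled["a"].tolist())  # [1.0, 0.0, 3.0]
```

The PySpark equivalent, df.fillna(0), likewise returns a new DataFrame rather than modifying the original.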
> Improve DataFrame documentation and code samples
> ------------------------------------------------
>
> Key: SPARK-7178
> URL: https://issues.apache.org/jira/browse/SPARK-7178
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.3.1
> Reporter: Chris Fregly
> Labels: dataframe
>
> AND and OR are not straightforward when using the new DataFrame API.
> The current convention, familiar to Pandas users, is to use the bitwise & and | operators instead of AND and OR. When using these, however, you need to wrap each comparison in parentheses, because the bitwise operators bind more tightly than the comparison operators.
> Also, working with StructTypes is a bit confusing. The following link: https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema (Python tab) implies that you can work with tuples directly when creating a DataFrame.
> However, the following code errors out unless we explicitly use Rows:
> {code}
> from pyspark.sql import Row
> from pyspark.sql.types import *
> # The schema is encoded in a string.
> schemaString = "a"
> fields = [StructField(field_name, MapType(StringType(), IntegerType())) for field_name in schemaString.split()]
> schema = StructType(fields)
> df = sqlContext.createDataFrame([Row(a={'b': 1})], schema)
> {code}
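The operator-precedence pitfall described above can be demonstrated with Python's own parser, no Spark required. This is a sketch in which the names `a` and `b` stand in for DataFrame columns:

```python
import ast

# Without parentheses, & binds tighter than > and <, so the expression
# is parsed as the chained comparison a > (1 & b) < 2, not as
# (a > 1) combined with (b < 2).
bad = ast.parse("a > 1 & b < 2", mode="eval").body
assert isinstance(bad, ast.Compare)               # one chained comparison
assert isinstance(bad.comparators[0], ast.BinOp)  # 1 & b fused together

# With parentheses, the top-level node is the & of two comparisons,
# which is the shape DataFrame column expressions need.
good = ast.parse("(a > 1) & (b < 2)", mode="eval").body
assert isinstance(good, ast.BinOp) and isinstance(good.op, ast.BitAnd)
```

This is why DataFrame filters must be written as df.filter((df.a > 1) & (df.b < 2)) rather than df.filter(df.a > 1 & df.b < 2).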
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)