Posted to issues@spark.apache.org by "Davies Liu (JIRA)" <ji...@apache.org> on 2014/08/01 19:21:39 UTC
[jira] [Created] (SPARK-2789) Apply names to RDD to becoming SchemaRDD
Davies Liu created SPARK-2789:
---------------------------------
Summary: Apply names to RDD to becoming SchemaRDD
Key: SPARK-2789
URL: https://issues.apache.org/jira/browse/SPARK-2789
Project: Spark
Issue Type: New Feature
Reporter: Davies Liu
To simplify applying a schema, we could add an API called applyNames(), which would infer the types in the RDD, create a schema with the given names, and apply that schema to the RDD to produce a SchemaRDD. The names could be provided as a single string, separated by spaces.
For example:
rdd = sc.parallelize([("Alice", 10)])
srdd = sqlCtx.applyNames(rdd, "name age")
Users would not need to create a case class or StructType to get the full power of Spark SQL.
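The idea above can be sketched in plain Python, without Spark. This is only an illustration of the proposed behaviour: the helper name apply_names, the choice to infer types from the first row only, and the (schema, records) return value are all assumptions, not an existing Spark API.

```python
def apply_names(rows, names):
    """Pair space-separated field names with types inferred from the first row.

    Hypothetical stand-in for the proposed applyNames(); it works on a plain
    list of tuples rather than an RDD.
    """
    fields = names.split()
    if not rows:
        raise ValueError("cannot infer types from an empty dataset")
    first = rows[0]
    if len(fields) != len(first):
        raise ValueError("expected %d names, got %d" % (len(first), len(fields)))
    # The "schema" here is just (name, Python type name) pairs.
    schema = [(name, type(value).__name__) for name, value in zip(fields, first)]
    # Attach the names to every row so values can be looked up by field name.
    records = [dict(zip(fields, row)) for row in rows]
    return schema, records


schema, records = apply_names([("Alice", 10)], "name age")
```

A real implementation would presumably sample more than one row and map Python types to Spark SQL types; both are left out here.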
The string representation of the schema could also support nested structures (MapType, ArrayType and StructType), for example:
"name age address(city zip) likes[title stars] props{[value type]}"
It would be equivalent to the following schema (types not shown):
root
|--name
|--age
|--address
|--|--city
|--|--zip
|--likes
|--|--element
|--|--|--title
|--|--|--stars
|--props
|--|--key
|--|--value
|--|--|--element
|--|--|--|--value
|--|--|--|--type
All field names are separated by spaces. The structure of a field (if it is a nested type) follows the name without a space, and must start with "(" (StructType), "[" (ArrayType) or "{" (MapType).
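One possible way to parse this string format is a small recursive-descent parser, sketched below in plain Python. The function names, the tuple-based output, and the rule that the content of "[...]" or "{...}" is either a nested type (when it starts with a bracket) or a field list describing a struct are assumptions made for this sketch; the proposal does not spell out a grammar.

```python
OPENERS = {"(": ")", "[": "]", "{": "}"}
KINDS = {"(": "struct", "[": "array", "{": "map"}


def parse_schema(text):
    """Parse e.g. "name address(city zip)" into a list of (name, type) pairs.

    type is None for a leaf field (its type would be inferred from the data),
    ("struct", fields), ("array", element_type) or ("map", value_type).
    """
    fields, pos = _parse_fields(text, 0)
    if pos != len(text):
        raise ValueError("unexpected %r at position %d" % (text[pos], pos))
    return fields


def _parse_fields(text, pos):
    # Read space-separated fields until a closing bracket or the end of input.
    fields = []
    while pos < len(text) and text[pos] not in ")]}":
        if text[pos] == " ":
            pos += 1
            continue
        start = pos
        while pos < len(text) and text[pos] not in " ()[]{}":
            pos += 1
        name = text[start:pos]
        if not name:
            raise ValueError("expected a field name at position %d" % pos)
        ftype = None
        # A nested type must follow the name directly, with no space.
        if pos < len(text) and text[pos] in OPENERS:
            ftype, pos = _parse_type(text, pos)
        fields.append((name, ftype))
    return fields, pos


def _parse_type(text, pos):
    opener = text[pos]
    closer = OPENERS[opener]
    pos += 1
    if opener == "(":
        inner, pos = _parse_fields(text, pos)
    elif pos < len(text) and text[pos] in OPENERS:
        # e.g. "{[value type]}": the map value is itself a nested type.
        inner, pos = _parse_type(text, pos)
    else:
        # e.g. "[title stars]": the element is a struct with these fields.
        fields, pos = _parse_fields(text, pos)
        inner = ("struct", fields) if fields else None
    if pos >= len(text) or text[pos] != closer:
        raise ValueError("expected %r at position %d" % (closer, pos))
    return (KINDS[opener], inner), pos + 1


tree = parse_schema("name age address(city zip) likes[title stars] props{[value type]}")
```

For the example string, tree has five top-level fields: name and age are leaves, address is a struct, likes is an array of structs, and props is a map whose value is an array of structs.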