Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/05/28 05:46:24 UTC

[GitHub] [spark] dongjoon-hyun edited a comment on issue #24724: User friendly dataset, dataframe generation for csv datasources without explicit StructType definitions.

dongjoon-hyun edited a comment on issue #24724: User friendly dataset, dataframe generation for csv datasources without explicit StructType definitions.
URL: https://github.com/apache/spark/pull/24724#issuecomment-496373474
 
 
   First of all, the following are the most frequent use cases.
   1. HEADER and INFERSCHEMA
   ```scala
   scala> case class Person(name: String, age: Long)
   scala> spark.read.option("header", true).option("inferSchema", true).csv("/tmp/csv").as[Person]
   res0: org.apache.spark.sql.Dataset[Person] = [name: string, age: int]
   ```
   
   2. USER-DEFINED SCHEMA or Hive MetaStore
   ```scala
   scala> case class Person(name: String, age: Long)
   scala> spark.read.schema("name string, age long").csv("/tmp/csv").as[Person]
   res0: org.apache.spark.sql.Dataset[Person] = [name: string, age: bigint]
   ```
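   
   The Hive Metastore path mentioned in case 2 is not shown above. A minimal sketch, assuming a hypothetical metastore table `people` already created over the CSV files with schema `(name string, age bigint)`, a spark-shell session with Hive support enabled, and the same `Person` case class, could look like this:
   ```scala
   // Sketch only: `people` is a hypothetical metastore table (name string, age bigint)
   // backed by the CSV files; the schema comes from the metastore, so no StructType is needed.
   scala> case class Person(name: String, age: Long)
   scala> spark.table("people").as[Person]
   res0: org.apache.spark.sql.Dataset[Person] = [name: string, age: bigint]
   ```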
   
   I believe the above two existing approaches are more natural than the proposed change.
   
   Anyway, cc @HyukjinKwon and @MaxGekk 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org