Posted to user@spark.apache.org by Hafiz Mujadid <ha...@gmail.com> on 2015/07/01 20:03:12 UTC

making dataframe for different types using spark-csv

Hi experts!


I am using spark-csv to load CSV data into a DataFrame. By default it makes
the type of each column String. Is there some way to get a DataFrame with the
actual types, like Int, Double, etc.?


Thanks



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/making-dataframe-for-different-types-using-spark-csv-tp23570.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: making dataframe for different types using spark-csv

Posted by Hafiz Mujadid <ha...@gmail.com>.
Thanks

On Thu, Jul 2, 2015 at 5:40 PM, Kohler, Curt E (ELS-STL) <
C.kohler@elsevier.com> wrote:

>  You should be able to do something like this (assuming an input file
> formatted as:  String, IntVal, LongVal)
>
>
>  import org.apache.spark.sql.types._
>
>  val recSchema = StructType(List(StructField("strVal", StringType, false),
>                                  StructField("intVal", IntegerType, false),
>                                  StructField("longVal", LongType, false)))
>
>  val filePath = "some path to your dataset"
>
>  val df1 =  sqlContext.load("com.databricks.spark.csv", recSchema,
> Map("path" -> filePath , "header" -> "false", "delimiter" -> ",", "mode" ->
> "FAILFAST"))
>
>   From: Hafiz Mujadid <ha...@gmail.com>
> Date: Wednesday, July 1, 2015 at 10:59 PM
> To: Mohammed Guller <mo...@glassbeam.com>
> Cc: Krishna Sankar <ks...@gmail.com>, "user@spark.apache.org" <
> user@spark.apache.org>
>
> Subject: Re: making dataframe for different types using spark-csv
>
>   hi Mohammed Guller!
>
>  How can I specify schema in load method?
>
>
>
> On Thu, Jul 2, 2015 at 6:43 AM, Mohammed Guller <mo...@glassbeam.com>
> wrote:
>
>>  Another option is to provide the schema to the load method. One variant
>> of sqlContext.load takes a schema as an input parameter. You can define
>> the schema programmatically as shown here:
>>
>>
>>
>>
>> https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema
>>
>>
>>
>> Mohammed
>>
>>
>>
>> *From:* Krishna Sankar [mailto:ksankar42@gmail.com]
>> *Sent:* Wednesday, July 1, 2015 3:09 PM
>> *To:* Hafiz Mujadid
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: making dataframe for different types using spark-csv
>>
>>
>>
>> ·  use .cast("...").alias('...') after the DataFrame is read.
>>
>> ·  sql.functions.udf for any domain-specific conversions.
>>
>> Cheers
>>
>> <k/>
>>
>>
>>
>> On Wed, Jul 1, 2015 at 11:03 AM, Hafiz Mujadid <ha...@gmail.com>
>> wrote:
>>
>> Hi experts!
>>
>>
>> I am using spark-csv to load CSV data into a DataFrame. By default it makes
>> the type of each column String. Is there some way to get a DataFrame with
>> the actual types, like Int, Double, etc.?
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>>
>
>
>
>  --
> Regards: HAFIZ MUJADID
>



-- 
Regards: HAFIZ MUJADID

Re: making dataframe for different types using spark-csv

Posted by "Kohler, Curt E (ELS-STL)" <C....@elsevier.com>.
You should be able to do something like this (assuming an input file formatted as:  String, IntVal, LongVal)


import org.apache.spark.sql.types._

val recSchema = StructType(List(StructField("strVal", StringType, false),
                                StructField("intVal", IntegerType, false),
                                StructField("longVal", LongType, false)))

val filePath = "some path to your dataset"

val df1 = sqlContext.load("com.databricks.spark.csv", recSchema, Map("path" -> filePath, "header" -> "false", "delimiter" -> ",", "mode" -> "FAILFAST"))
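
If you would rather not hand-write the schema, spark-csv can also infer column types by scanning the data. A minimal sketch, assuming a spark-csv version that supports the `inferSchema` option and Spark 1.4's DataFrameReader API:

```scala
// Infer column types automatically (costs an extra pass over the file).
val dfInferred = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "false")
  .option("delimiter", ",")
  .option("inferSchema", "true")
  .load(filePath)

// Columns come back as IntegerType, DoubleType, etc. where possible.
dfInferred.printSchema()
```

Note that inference is data-driven, so a single malformed value can widen a column back to StringType; an explicit schema plus "mode" -> "FAILFAST" is stricter.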

From: Hafiz Mujadid <ha...@gmail.com>
Date: Wednesday, July 1, 2015 at 10:59 PM
To: Mohammed Guller <mo...@glassbeam.com>
Cc: Krishna Sankar <ks...@gmail.com>, "user@spark.apache.org" <us...@spark.apache.org>
Subject: Re: making dataframe for different types using spark-csv

hi Mohammed Guller!

How can I specify schema in load method?



On Thu, Jul 2, 2015 at 6:43 AM, Mohammed Guller <mo...@glassbeam.com> wrote:
Another option is to provide the schema to the load method. One variant of sqlContext.load takes a schema as an input parameter. You can define the schema programmatically as shown here:

https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema

Mohammed

From: Krishna Sankar [mailto:ksankar42@gmail.com]
Sent: Wednesday, July 1, 2015 3:09 PM
To: Hafiz Mujadid
Cc: user@spark.apache.org
Subject: Re: making dataframe for different types using spark-csv

·  use .cast("...").alias('...') after the DataFrame is read.
·  sql.functions.udf for any domain-specific conversions.
Cheers
<k/>

On Wed, Jul 1, 2015 at 11:03 AM, Hafiz Mujadid <ha...@gmail.com> wrote:
Hi experts!


I am using spark-csv to load CSV data into a DataFrame. By default it makes
the type of each column String. Is there some way to get a DataFrame with the
actual types, like Int, Double, etc.?


Thanks







--
Regards: HAFIZ MUJADID

Re: making dataframe for different types using spark-csv

Posted by Hafiz Mujadid <ha...@gmail.com>.
hi Mohammed Guller!

How can I specify schema in load method?



On Thu, Jul 2, 2015 at 6:43 AM, Mohammed Guller <mo...@glassbeam.com>
wrote:

>  Another option is to provide the schema to the load method. One variant
> of sqlContext.load takes a schema as an input parameter. You can define
> the schema programmatically as shown here:
>
>
>
>
> https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema
>
>
>
> Mohammed
>
>
>
> *From:* Krishna Sankar [mailto:ksankar42@gmail.com]
> *Sent:* Wednesday, July 1, 2015 3:09 PM
> *To:* Hafiz Mujadid
> *Cc:* user@spark.apache.org
> *Subject:* Re: making dataframe for different types using spark-csv
>
>
>
> ·  use .cast("...").alias('...') after the DataFrame is read.
>
> ·  sql.functions.udf for any domain-specific conversions.
>
> Cheers
>
> <k/>
>
>
>
> On Wed, Jul 1, 2015 at 11:03 AM, Hafiz Mujadid <ha...@gmail.com>
> wrote:
>
> Hi experts!
>
>
> I am using spark-csv to load CSV data into a DataFrame. By default it makes
> the type of each column String. Is there some way to get a DataFrame with
> the actual types, like Int, Double, etc.?
>
>
> Thanks
>
>
>
>
>
>



-- 
Regards: HAFIZ MUJADID

RE: making dataframe for different types using spark-csv

Posted by Mohammed Guller <mo...@glassbeam.com>.
Another option is to provide the schema to the load method. One variant of sqlContext.load takes a schema as an input parameter. You can define the schema programmatically as shown here:

https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema

Mohammed
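
The linked section of the guide amounts to something like the following (a minimal sketch; the column names and file path are made up for illustration):

```scala
import org.apache.spark.sql.types._

// Define the schema programmatically: one StructField per CSV column.
val schema = StructType(List(
  StructField("name", StringType, true),
  StructField("age", IntegerType, true),
  StructField("salary", DoubleType, true)))

// Pass it to the load variant that accepts a schema.
val df = sqlContext.load("com.databricks.spark.csv", schema,
  Map("path" -> "people.csv", "header" -> "false"))
```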

From: Krishna Sankar [mailto:ksankar42@gmail.com]
Sent: Wednesday, July 1, 2015 3:09 PM
To: Hafiz Mujadid
Cc: user@spark.apache.org
Subject: Re: making dataframe for different types using spark-csv

·  use .cast("...").alias('...') after the DataFrame is read.
·  sql.functions.udf for any domain-specific conversions.
Cheers
<k/>

On Wed, Jul 1, 2015 at 11:03 AM, Hafiz Mujadid <ha...@gmail.com> wrote:
Hi experts!


I am using spark-csv to load CSV data into a DataFrame. By default it makes
the type of each column String. Is there some way to get a DataFrame with the
actual types, like Int, Double, etc.?


Thanks





Re: making dataframe for different types using spark-csv

Posted by Krishna Sankar <ks...@gmail.com>.
   - use .cast("...").alias('...') after the DataFrame is read.
   - sql.functions.udf for any domain-specific conversions.

Cheers
<k/>
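
A minimal sketch of both suggestions, assuming a DataFrame `df` already loaded by spark-csv with all-string columns named `id`, `price`, and `active` (the names are illustrative):

```scala
import org.apache.spark.sql.functions.udf

// Cast string columns to the types you actually want, keeping the names.
val typed = df.select(
  df("id").cast("int").alias("id"),
  df("price").cast("double").alias("price"),
  df("active"))

// For conversions cast() can't express, use a UDF.
val yesNo = udf((s: String) => s.trim.equalsIgnoreCase("yes"))
val result = typed.withColumn("active", yesNo(typed("active")))
```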

On Wed, Jul 1, 2015 at 11:03 AM, Hafiz Mujadid <ha...@gmail.com>
wrote:

> Hi experts!
>
>
> I am using spark-csv to load CSV data into a DataFrame. By default it makes
> the type of each column String. Is there some way to get a DataFrame with
> the actual types, like Int, Double, etc.?
>
>
> Thanks
>
>
>
>
>