Posted to user@spark.apache.org by kant kodali <ka...@gmail.com> on 2018/01/26 00:22:21 UTC

how to create a DataType Object using the String representation in Java using Spark 2.2.0?

Hi All,

I have a data type, "IntegerType", represented as a String, and now I want to
create a DataType object from it. I couldn't find how to do that in the
DataType or DataTypes API.

Thanks!

Re: how to create a DataType Object using the String representation in Java using Spark 2.2.0?

Posted by Rick Moritz <ra...@gmail.com>.
Hi,

We solved this the ugly way, when parsing external column definitions:

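import org.apache.spark.sql.types._

// Maps a type name from an external column definition to a Spark DataType.
private def columnTypeToFieldType(columnType: String): DataType = {
  columnType match {
    case "IntegerType" => IntegerType
    case "StringType" => StringType
    case "DateType" => DateType
    case "FloatType" => FloatType
    // A bare "DecimalType" carries no precision/scale, so fall back to the default.
    case "DecimalType" => DecimalType.SYSTEM_DEFAULT
    // Spark itself spells the name "TimestampType"; our definitions used "TimeStampType".
    case "TimestampType" | "TimeStampType" => TimestampType
    case "BooleanType" => BooleanType
    case _ => throw new IllegalArgumentException(
      s"ColumnType $columnType is not known, " +
        s"please add it in the ${this.getClass.getName} class!")
  }
}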

There may be a prettier solution than this, but there are limitations,
especially with DecimalType, where even reflection via Class.forName
(i.e. Class.forName(s"org.apache.spark.sql.types.$columnType")) is not
trivial. Furthermore, getting a companion object for a class name is a bit
uglier than getting just the class, see
https://stackoverflow.com/questions/11020746/get-companion-object-instance-with-new-scala-reflection-api
Since the number of types can be expected to stay roughly constant, the only
overhead of the ugly solution is Scala's pattern matching. In our case, a
simple method that might rarely fail, but then does so in a mostly
understandable way, won out over the effort of engineering something more
general.

N.B.: mapping isn't complete -- complex types weren't in our scope.
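
For what it's worth, here is a sketch of a slightly less manual variant of the
lookup above. It assumes that the singleton types print as their own name
(e.g. IntegerType.toString gives "IntegerType") and that decimals print as
DecimalType(precision,scale), which is the form dataframe.dtypes() reports.
The names TypeNames and parse are made up for illustration, and the list is
still incomplete (no array, map or struct types).

```scala
import org.apache.spark.sql.types._

// Hypothetical helper, not part of Spark: resolves the type names that
// dtypes reports back to DataType objects.
object TypeNames {
  // Singleton types; assumed to print as their own name via toString.
  private val atomic: Seq[DataType] = Seq(
    BooleanType, ByteType, ShortType, IntegerType, LongType,
    FloatType, DoubleType, StringType, BinaryType, DateType, TimestampType)

  // Name -> type lookup built from the types themselves,
  // so each name is derived rather than typed by hand.
  private val byName: Map[String, DataType] =
    atomic.map(t => t.toString -> t).toMap

  // Matches the parameterised form, e.g. "DecimalType(10,2)".
  private val Decimal = """DecimalType\((\d+),\s*(\d+)\)""".r

  // Returns None for unknown names instead of throwing.
  def parse(name: String): Option[DataType] = name match {
    case Decimal(precision, scale) => Some(DecimalType(precision.toInt, scale.toInt))
    case other => byName.get(other)
  }
}
```

Returning an Option leaves it to the caller to decide whether an unknown name
is an error; wrapping the result in getOrElse(throw ...) gives the same
behaviour as the match above.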


Re: how to create a DataType Object using the String representation in Java using Spark 2.2.0?

Posted by Kurt Fehlhauer <kf...@gmail.com>.
Can you share your code and a sample of your data? Without seeing it, I
can't give a definitive answer, but I can offer some hints. If you have a
column of strings, you should be able to create a new column cast to
Integer. This can be accomplished in two ways:

df.withColumn("newColumn", df("currentColumn").cast(IntegerType))

or

val result = df.selectExpr("cast(currentColumn as int) as newColumn")


Without seeing your JSON, I really can't offer more assistance.



Re: how to create a DataType Object using the String representation in Java using Spark 2.2.0?

Posted by kant kodali <ka...@gmail.com>.
It seems like it's hard to construct a DataType given its String literal
representation.

dataframe.dtypes() returns column names and their corresponding types. For
example, say I have an integer column named "sum": calling
dataframe.dtypes() would return "sum" and "IntegerType". But this string
representation "IntegerType" doesn't seem to be very useful, because I
cannot do DataType.fromJson("IntegerType"); this throws an error. So I am
not quite sure how to construct a DataType given its String representation.
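
A sketch of two ways around the string problem described above, under a
couple of assumptions. First, the schema already holds the DataType objects,
so going through dtypes() strings may not be needed at all. Second,
DataType.fromJson expects the string produced by a type's json method (for
IntegerType that is "integer" including the double quotes), not the
"IntegerType" name that dtypes() reports. The object SchemaTypes and its
members are made up for illustration, and the hand-built schema stands in
for dataframe.schema. From Java the same members should be reachable as
schema().fields(), dataType(), json() and DataType.fromJson(...).

```scala
import org.apache.spark.sql.types._

// Hypothetical example object, not part of Spark.
object SchemaTypes {
  // Stand-in for dataframe.schema, so the sketch needs no SparkSession.
  val schema: StructType = StructType(Seq(
    StructField("sum", IntegerType),
    StructField("name", StringType)))

  // The DataType objects themselves, keyed by column name;
  // no parsing of type-name strings involved.
  val typesByColumn: Map[String, DataType] =
    schema.fields.map(f => f.name -> f.dataType).toMap

  // If a string form is needed, serialise with json and read it back
  // with fromJson; the two are designed to be inverses.
  def roundTrip(dt: DataType): DataType = DataType.fromJson(dt.json)
}
```

So if the type has to be stored or sent around as text, keeping the json
form rather than the dtypes() name avoids the need for a hand-written
mapping.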
