Posted to user@spark.apache.org by kant kodali <ka...@gmail.com> on 2018/01/29 06:52:53 UTC

How and when the types of the result set are figured out in Spark?

Hi All,

I would like to know how and when the types of the result set are figured
out in Spark. For example, say I have the following DataFrame.

*inputdf*

col1  | col2 | col3
-------------------
  1   |   2  | 5
  2   |   3  | 6

Now say I do something like below (pseudo SQL):

resultdf = select col2/2 from inputdf

resultdf.writeStream().format("es").start()

The first document in ES will be {"col2": 1} and the second will be
{"col2": 1.5}, so I would think ES would throw a type mismatch error here
if dynamic mapping is disabled on the ES server.

My real question, from Spark's perspective, is: when are the types of
resultdf figured out? Is it before writing to ES (or any sink in
general), or only after the first document is written?

Thanks!