Posted to user@spark.apache.org by "guenterh.lists" <gu...@bluewin.ch> on 2022/11/28 16:16:58 UTC
Implement custom datasource (writer) for Spark3
Dear list,
I'm trying to implement a custom datasource writer for Spark 3 to
serialize Datasets to external storage (in my case RDF triples).
After reading various resources (books, articles, the internet) I learned
that the implementation changed from Spark 1 to DataSources V2 in
Spark 2, and changed again in Spark 3.
I was able to implement a DataFrame writer and got a first result
for a Dataset with a simple case class.
With a more complex (nested) case class, however, I get an exception:
Caused by: org.apache.spark.sql.AnalysisException: unresolved
operator 'AppendData RelationV2[name#13, vorname#14, age#15] class
RDFTable, true;;
'AppendData RelationV2[name#13, vorname#14, age#15] class RDFTable, true
+- LocalRelation [name#3, vorname#4, age#5]
where age is a simple nested case class.
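For reference, the data shape in play might look like the following sketch (a hypothetical reconstruction; "Age" stands in for whatever the actual nested type is, only the field names name/vorname/age are taken from the plan above):

```scala
// Hypothetical reconstruction of the failing setup: a flat case class
// writes fine, but nesting a second case class triggers the
// "unresolved operator 'AppendData" AnalysisException.
case class Age(years: Int, months: Int)                     // nested type (assumed)
case class Person(name: String, vorname: String, age: Age)  // top-level row type
```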
My question
Because the documentation on this topic is very sparse, can you
steer me in the right direction:
- how to use the refactored interfaces in Spark 3
- what is possible with the current interfaces. For example, createWriter in
DataWriterFactory now returns only DataWriter<InternalRow>, whereas
createDataWriter in the v2 DataWriterFactory returned DataWriter<T> - which
makes it harder to implement more complex datatypes
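To make the second point concrete, here is a minimal sketch of the Spark 3 connector writer chain (org.apache.spark.sql.connector.write), showing how a DataWriter[InternalRow] can still reach a nested struct even though the typed DataWriter<T> is gone. Class names (RdfDataWriter etc.), field ordinals, and the two-field Age struct are assumptions, not working code from the question:

```scala
// Sketch only: the Spark 3 write path hands every row to the writer as
// an InternalRow; nested case classes arrive as nested structs that are
// read back by ordinal via getStruct.
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.write.{DataWriter, DataWriterFactory, WriterCommitMessage}
import org.apache.spark.sql.types.StructType

class RdfDataWriterFactory(schema: StructType) extends DataWriterFactory {
  // In Spark 3 this can only return DataWriter[InternalRow]
  override def createWriter(partitionId: Int, taskId: Long): DataWriter[InternalRow] =
    new RdfDataWriter(schema)
}

class RdfDataWriter(schema: StructType) extends DataWriter[InternalRow] {
  override def write(row: InternalRow): Unit = {
    // Flat columns are read by ordinal and Catalyst type ...
    val name    = row.getUTF8String(0).toString
    val vorname = row.getUTF8String(1).toString
    // ... and a nested case class arrives as a struct: fetch it with
    // getStruct(ordinal, numFields) and index into it the same way
    // (assuming here that Age has two Int fields).
    val ageStruct = row.getStruct(2, 2)
    val years     = ageStruct.getInt(0)
    // serialize (name, vorname, years, ...) to RDF triples here
  }
  override def commit(): WriterCommitMessage = new WriterCommitMessage {}
  override def abort(): Unit = ()
  override def close(): Unit = ()
}
```

One thing worth checking in this direction: the unresolved 'AppendData operator above is the analyzer refusing the write, which may mean the schema reported by the Table implementation does not match the nested StructType of the incoming Dataset.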
Thanks for any hints
Günter
--
Günter Hipler
University library Leipzig
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org