Posted to user@spark.apache.org by "guenterh.lists" <gu...@bluewin.ch> on 2022/11/28 16:16:58 UTC

Implement custom datasource (writer) for Spark3

Dear list,

I'm trying to implement my own custom datasource writer for Spark 3 to 
serialize Datasets to external storage (in my case RDF triples).

After reading various resources (books, articles, the internet) I learned 
that the writer API changed from Spark 1 to Data Source V2 in Spark 2 and 
was changed again in Spark 3.

I was able to implement a DataFrame writer and got a first working result 
for a Dataset built from a simple case class.
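
For reference, this is roughly the write path I have so far (a minimal 
sketch; the class names RDFTableProvider, RDFWriteBuilder and so on are 
my own, only RDFTable shows up in the error below, and I'm assuming the 
pre-3.2 buildForBatch() entry point):

import java.util
import scala.collection.JavaConverters._

import org.apache.spark.sql.connector.catalog.{SupportsWrite, Table, TableCapability, TableProvider}
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.connector.write._
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

class RDFTableProvider extends TableProvider {
  // let Spark pass the schema of the Dataset being written into getTable
  override def supportsExternalMetadata(): Boolean = true
  override def inferSchema(options: CaseInsensitiveStringMap): StructType = new StructType()
  override def getTable(schema: StructType, partitioning: Array[Transform],
                        properties: util.Map[String, String]): Table = new RDFTable(schema)
}

class RDFTable(tableSchema: StructType) extends Table with SupportsWrite {
  override def name(): String = "RDFTable"
  override def schema(): StructType = tableSchema
  override def capabilities(): util.Set[TableCapability] =
    Set(TableCapability.BATCH_WRITE).asJava
  override def newWriteBuilder(info: LogicalWriteInfo): WriteBuilder = new WriteBuilder {
    override def buildForBatch(): BatchWrite = new RDFBatchWrite(info.schema())
  }
}

class RDFBatchWrite(schema: StructType) extends BatchWrite {
  override def createBatchWriterFactory(info: PhysicalWriteInfo): DataWriterFactory =
    new RDFWriterFactory(schema)
  override def commit(messages: Array[WriterCommitMessage]): Unit = ()
  override def abort(messages: Array[WriterCommitMessage]): Unit = ()
}

class RDFWriterFactory(schema: StructType) extends DataWriterFactory {
  // in Spark 3 only a DataWriter[InternalRow] can be returned here
  override def createWriter(partitionId: Int, taskId: Long): DataWriter[org.apache.spark.sql.catalyst.InternalRow] =
    new RDFDataWriter(schema)
}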

I'm getting an exception with a more complex (nested) case class.

   Caused by: org.apache.spark.sql.AnalysisException: unresolved 
operator 'AppendData RelationV2[name#13, vorname#14, age#15] class 
RDFTable, true;;
'AppendData RelationV2[name#13, vorname#14, age#15] class RDFTable, true
+- LocalRelation [name#3, vorname#4, age#5]

where age is a simple nested case class.
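
For illustration, the data looks roughly like this (the inner fields of 
Age are made up here, only name, vorname and age appear in the plan 
above, and the format name is a placeholder for my provider class):

import org.apache.spark.sql.SparkSession

case class Age(years: Int, months: Int)
case class Person(name: String, vorname: String, age: Age)

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val ds = Seq(Person("Hipler", "Guenter", Age(60, 0))).toDS()

ds.write
  .format("org.example.rdf.RDFTableProvider") // placeholder for my provider class
  .mode("append")                             // append is what produces the AppendData node
  .save()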

My questions
Because the documentation on this topic is very sparse, could you steer 
me in the right direction:
- how to use the refactored interfaces in Spark 3
- what is possible with the current interfaces, e.g. createWriter in 
DataWriterFactory returns only a DataWriter<InternalRow>, whereas 
createDataWriter in the Spark 2 DataWriterFactory returned a 
DataWriter<T>, which makes it difficult to handle more complex datatypes 
(see the sketch below)
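
To make the difference concrete, this is the part that feels awkward: the 
writer only receives InternalRow, so the nested case class has to be 
unpacked by hand (a sketch; the ordinals assume the schema name, vorname, 
age from above, and the inner field of age is hypothetical):

// Spark 2 (org.apache.spark.sql.sources.v2.writer): a typed writer was possible
//   DataWriter<T> createDataWriter(int partitionId, long taskId, long epochId)
// Spark 3 (org.apache.spark.sql.connector.write): only InternalRow
//   DataWriter<InternalRow> createWriter(int partitionId, long taskId)

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.write.{DataWriter, WriterCommitMessage}
import org.apache.spark.sql.types.StructType

class RDFDataWriter(schema: StructType) extends DataWriter[InternalRow] {

  override def write(row: InternalRow): Unit = {
    // flat columns: ordinals follow the schema (name, vorname, age)
    val name    = row.getUTF8String(0).toString
    val vorname = row.getUTF8String(1).toString

    // the nested case class arrives as a nested InternalRow (struct);
    // the number of inner fields has to be passed explicitly
    val ageType = schema("age").dataType.asInstanceOf[StructType]
    val age     = row.getStruct(2, ageType.size)
    val years   = age.getInt(0) // hypothetical inner field

    // ... turn (name, vorname, years) into RDF triples and emit them ...
  }

  override def commit(): WriterCommitMessage = new WriterCommitMessage {}
  override def abort(): Unit = ()
  override def close(): Unit = ()
}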

Thanks for any hints

Günter


-- 
Günter Hipler
University library Leipzig


---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org