Posted to user@spark.apache.org by Jim Carroll <ji...@gmail.com> on 2014/09/04 15:18:34 UTC

Using Spark to add data to an existing Parquet file without a schema

Hello all,

I've been trying to figure out how to add data to an existing Parquet file
without having a schema. Spark has allowed me to load JSON and save it as a
Parquet file, but I was wondering if anyone knows how to ADD/INSERT more
data.

I tried using SQL INSERT and that doesn't work. All of the examples assume a
schema exists in the form of a serialization IDL and generated classes.

I looked into the code and considered direct use of InsertIntoParquetTable,
or a copy of it, but I was hoping someone had already solved the problem.

Any guidance would be greatly appreciated.

Thanks
Jim

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Using-Spark-to-add-data-to-an-existing-Parquet-file-without-a-schema-tp13450.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Using Spark to add data to an existing Parquet file without a schema

Posted by Jim Carroll <ji...@gmail.com>.
Okay,

Obviously I don't care about adding more files to the system, so is there a
way to point to an existing Parquet file (directory) and seed the numbering
of the individual "part-r-***.parquet" files (the "partition + offset"
value) while preventing name collisions with the existing part files?

I mean, I can hack it by copying files into the same Parquet directory and
managing the file names externally, but this seems like a workaround. Is
that how others are doing it?

Jim

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Using-Spark-to-add-data-to-an-existing-Parquet-file-without-a-schema-tp13450p13499.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
