Posted to mapreduce-user@hadoop.apache.org by siva kumar <si...@gmail.com> on 2016/02/10 13:34:07 UTC

Spark streaming

Hi,
       I'm pulling some Twitter data and trying to save it into a
persistent table. This is the code I have written:

case class Tweet(createdAt: Long, text: String)

twt.map(status =>
  Tweet(status.getCreatedAt().getTime() / 1000, status.getText())
).foreachRDD(rdd =>
  rdd.toDF().saveAsTable("stream", SaveMode.Append)
)
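For comparison, here is a minimal sketch of the same write using the DataFrameWriter API introduced in Spark 1.4, which replaces the deprecated saveAsTable(table, mode) overload. It assumes `twt` is the Twitter DStream from above and that a `sqlContext` (e.g. a HiveContext) is already in scope; those names are taken from the snippet, everything else is illustrative.

```scala
// Sketch only: assumes Spark 1.4+, an existing `sqlContext`
// (HiveContext for persistent tables), and the `twt` DStream above.
import org.apache.spark.sql.SaveMode

case class Tweet(createdAt: Long, text: String)

twt.map(status =>
  Tweet(status.getCreatedAt().getTime() / 1000, status.getText())
).foreachRDD { rdd =>
  import sqlContext.implicits._  // provides rdd.toDF()
  rdd.toDF()
    .write                       // DataFrameWriter entry point
    .mode(SaveMode.Append)       // append each micro-batch to the table
    .saveAsTable("stream")       // persistent table in the Hive metastore
}
```

Each micro-batch is appended as a new set of files under the table's warehouse directory, so empty or failed batches can leave behind partial files like the one in the error below.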
When I go to spark-sql and check, I can see that the table was created.
But when I try to retrieve the data, I get the error below:


java.lang.RuntimeException:
file:/user/hive/warehouse/stream/_temporary/0/_temporary/attempt_201602101609_0383_r_000014_0/part-r-00664.parquet
is not a Parquet file (too small)

Is this the correct way to store streaming data into a persistent table?

Any help would be appreciated.
Thanks in advance,
Siva.