Posted to user@spark.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2016/09/08 17:27:26 UTC
Posting selected rows of Spark streaming data to Hive table
Hi,
Within Spark Streaming I have identified the data that I want to persist to
a Hive table. The table has already been created.
These are the column values extracted:
for (line <- pricesRDD.collect)
{
  // Split the CSV payload once rather than once per column
  val fields = line._2.split(',')
  val index = fields(0).toInt
  val timestamp = fields(1)
  val security = fields(2)
  val price = fields(3).toFloat
  if (price > 90.0)
  {
    // post them to table as a row
    // Like this (values interpolated into the statement, strings quoted)
    val sqltext = s"""
    INSERT INTO TABLE test.prices
    SELECT
      $index
    , '$timestamp'
    , '$security'
    , $price
    """
    HiveContext.sql(sqltext)
  }
}
The issue is that the table is relational, so I need to create a DF from the
values, possibly by way of an array, register it as a tempTable and do an
INSERT/SELECT from that tempTable into the Hive table.
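The Array to DF to tempTable route described above could be sketched roughly as follows. This is only a sketch under assumptions: it uses Spark 1.6-style APIs (HiveContext, registerTempTable), the case class Price and the object PostPricesViaTempTable are hypothetical names, the temp table name tmp is arbitrary, and the column order and types of test.prices are assumed to match the case class.

```scala
import org.apache.spark.sql.hive.HiveContext
import scala.util.Try

// Hypothetical row type; fields are assumed to mirror test.prices in order
case class Price(index: Int, timestamp: String, security: String, price: Float)

object PostPricesViaTempTable {
  // Parse one CSV payload "index,timestamp,security,price"; None if malformed
  def parse(csv: String): Option[Price] = {
    val f = csv.split(',')
    if (f.length < 4) None
    else Try(Price(f(0).trim.toInt, f(1), f(2), f(3).trim.toFloat)).toOption
  }

  // lines is the collected Array of (key, csvLine) pairs from the stream
  def post(hiveContext: HiveContext, lines: Array[(String, String)]): Unit = {
    import hiveContext.implicits._
    // Keep only the rows worth persisting
    val rows = lines.flatMap(l => parse(l._2)).filter(_.price > 90.0)
    if (rows.nonEmpty) {
      // Array -> RDD -> DF -> tempTable -> INSERT/SELECT
      val df = hiveContext.sparkContext.parallelize(rows).toDF()
      df.registerTempTable("tmp")
      // SELECT * relies on the DF columns being in the same order as the table
      hiveContext.sql("INSERT INTO TABLE test.prices SELECT * FROM tmp")
    }
  }
}
```

With this shape, one call such as PostPricesViaTempTable.post(HiveContext, pricesRDD.collect) issues a single INSERT for all qualifying rows rather than one statement per row.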
In Spark Streaming time is of the essence, so to start:
1. Is there any way to do this through an Array, a DF and a tempTable,
just to see that it works?
2. Is there another, faster way to do it?
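On the second question, one possibly faster variant is to avoid collect altogether and append each micro-batch as a DataFrame from inside foreachRDD. The sketch below is an illustration and has not been measured: it assumes Spark 1.6-style APIs, that the stream is a DStream of (key, csvLine) pairs, and that test.prices has the same column order as the hypothetical case class PriceRow. The object name PostPricesPerBatch is also hypothetical.

```scala
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.streaming.dstream.DStream
import scala.util.Try

// Hypothetical row type; fields are assumed to mirror test.prices in order
case class PriceRow(index: Int, timestamp: String, security: String, price: Float)

object PostPricesPerBatch {
  // Parse one CSV payload "index,timestamp,security,price"; None if malformed
  def parse(csv: String): Option[PriceRow] = {
    val f = csv.split(',')
    if (f.length < 4) None
    else Try(PriceRow(f(0).trim.toInt, f(1), f(2), f(3).trim.toFloat)).toOption
  }

  // Register the per-batch write on the stream; nothing is pulled to the driver
  def attach(hiveContext: HiveContext, prices: DStream[(String, String)]): Unit = {
    import hiveContext.implicits._
    prices.foreachRDD { rdd =>
      // Parsing and filtering run on the executors
      val df = rdd.flatMap(kv => parse(kv._2)).filter(_.price > 90.0).toDF()
      // insertInto matches columns by position, not by name
      df.write.mode(SaveMode.Append).insertInto("test.prices")
    }
  }
}
```

The design choice here is to let the executors parse and filter, so the driver only schedules one append per batch instead of looping over collected rows.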
Thanks
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.