Posted to user@spark.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2016/09/08 17:27:26 UTC

Posting selected rows of Spark streaming data to Hive table

Hi,

Within Spark Streaming I have identified the data that I want to persist to
a Hive table. The table has already been created.

These are the values extracted for the columns:

         for (line <- pricesRDD.collect)
         {
           // Split each record once; fields are index, timestamp, security, price
           val cols = line._2.split(',')
           val index = cols(0).toInt
           val timestamp = cols(1)
           val security = cols(2)
           val price = cols(3).toFloat
           if (price > 90.0)
           {
             // post them to the table as a row
             // Like this (index, timestamp, security and price are the values above)
             val sqltext = """
             INSERT INTO TABLE test.prices
             SELECT
                   index
                 , timestamp
                 , security
                 , price
             """
             HiveContext.sql(sqltext)
           }
         }

The issue is that the table is relational, so I need to create a DF from the
values (possibly via an array), register it as a tempTable, and do an
INSERT/SELECT from that tempTable into the Hive table.

In Spark Streaming time is of the essence, so to start:

   1. Is there any way to do this through an Array, a DF and a tempTable,
   just to see that it works?
   2. Is there another, faster way to do it?
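
To make the DF and tempTable route concrete, here is a minimal sketch in the
Spark 1.6 style used above. It is not a tested answer. It assumes pricesRDD
holds (key, value) pairs (as line._2 suggests) and that the columns of
test.prices are index, timestamp, security, price in that order. The names
Price, PostPrices, parsePrice, postToHive and tmp_prices are made up for
illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.hive.HiveContext

// Hypothetical row type; field order is assumed to match test.prices.
case class Price(index: Int, timestamp: String, security: String, price: Float)

object PostPrices {
  // Parse one "index,timestamp,security,price" record.
  // Short or non-numeric records yield None instead of throwing.
  def parsePrice(record: String): Option[Price] = {
    val cols = record.split(',')
    if (cols.length < 4) None
    else scala.util.Try(
      Price(cols(0).trim.toInt, cols(1), cols(2), cols(3).trim.toFloat)
    ).toOption
  }

  // Assumption: pricesRDD is an RDD of (key, value) pairs.
  def postToHive(hiveContext: HiveContext, pricesRDD: RDD[(String, String)]): Unit = {
    import hiveContext.implicits._

    // Parse and filter on the executors rather than collecting to the driver.
    val selected = pricesRDD
      .flatMap { case (_, value) => parsePrice(value) }
      .filter(_.price > 90.0f)

    // Register the selected rows as a temporary table.
    selected.toDF().registerTempTable("tmp_prices")

    // Backticks guard against index/timestamp being treated as keywords.
    hiveContext.sql(
      """INSERT INTO TABLE test.prices
        |SELECT `index`, `timestamp`, `security`, `price`
        |FROM tmp_prices""".stripMargin)
  }
}
```

In a streaming job this could be called once per batch, for example from
foreachRDD with the existing HiveContext instance. A possible alternative to
the tempTable and INSERT/SELECT step is
selected.toDF().write.mode("append").insertInto("test.prices"), which writes
the DataFrame directly; whether it is actually faster would need to be
measured.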

Thanks







Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.