Posted to user@spark.apache.org by anudeep <an...@gmail.com> on 2018/04/17 02:41:18 UTC

pyspark execution

Hi All,

I have a python file which I am executing directly with spark-submit
command.

Inside the python file, I have SQL written using HiveContext. I created a
generic variable for the database name inside the SQL.
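
Roughly, the script looks like this (a trimmed sketch; the table name
my_table is just a placeholder):

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext()
hive_context = HiveContext(sc)

# db_name needs to come from outside the script at spark-submit time,
# much like --hivevar supplies values to the hive CLI.
db_name = "???"
hive_context.sql("SELECT * FROM {0}.my_table".format(db_name))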

The problem is: how can I pass the value for this variable dynamically at
submit time, just as the --hivevar parameter does in Hive?

Thanks!
Anudeep

Re: pyspark execution

Posted by hemant singh <he...@gmail.com>.
If the file contains only SQL, then you can use a function like the one
below, which shells out to spark-sql and forwards the values with --hivevar:

import subprocess

def run_sql(sql_file_path, db_name, location):
    # spark-sql accepts --hivevar key=value; the SQL file can then
    # refer to the values as ${DB_NAME} and ${LOCATION}.
    subprocess.call(["spark-sql", "-S", "--hivevar", "DB_NAME=" + db_name,
                     "--hivevar", "LOCATION=" + location, "-f", sql_file_path])

If the file has other pieces as well, such as Spark code and not only SQL,
write a parse function that replaces the placeholders (the database name
and so on) in your SQL and then executes the resulting statement; a sketch
follows below.
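
A minimal sketch of that approach, assuming Spark 2.x's SparkSession with
Hive support (the file and variable names are placeholders); Python's
string.Template conveniently uses the same ${NAME} syntax as --hivevar:

from string import Template
from pyspark.sql import SparkSession

def run_sql_file(spark, sql_file_path, variables):
    # Substitute ${DB_NAME}-style placeholders, then run each statement.
    # The split on ";" is naive but fine for simple files.
    with open(sql_file_path) as f:
        sql_text = Template(f.read()).substitute(variables)
    for statement in sql_text.split(";"):
        if statement.strip():
            spark.sql(statement)

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
run_sql_file(spark, "query.sql",
             {"DB_NAME": "analytics_db", "LOCATION": "/warehouse/analytics_db"})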

Maintaining your SQL in a separate file also decouples the code from the
SQL, which makes things easier from a maintenance perspective.
