Posted to user@spark.apache.org by infa elance <in...@gmail.com> on 2019/07/11 17:19:55 UTC

Spark Newbie question

This is a stand-alone Spark cluster. My understanding is that Spark is an
execution engine, not a storage layer.
Spark processes data in memory, but when someone refers to a Spark table
created through Spark SQL (DataFrame/RDD), what exactly are they referring to?

Could it be a Hive table? If yes, is it the same Hive metastore that Spark uses?
Is it a table in memory? If yes, how can an external app

Spark version with Hadoop: spark-2.0.2-bin-hadoop2.7

Thanks and appreciate your help!!
Ajay.

Re: Spark Newbie question

Posted by infa elance <in...@gmail.com>.
Thanks, Jerry, for the clarification.

Ajay.



Re: Spark Newbie question

Posted by Jerry Vinokurov <gr...@gmail.com>.
Hi Ajay,

When a Spark SQL statement references a table, that table has to be
"registered" first. Usually this is done by reading in a DataFrame and then
calling createOrReplaceTempView (or one of a few related methods) on that
DataFrame, with the argument being the name under which you'd like to
register the table. You can then use the table in SQL statements. As far as
I know, you cannot directly refer to an external data store without reading
it in first.

Jerry


-- 
http://www.google.com/profiles/grapesmoker

Re: Spark Newbie question

Posted by infa elance <in...@gmail.com>.
Sorry, I guess I hit the send button too soon....

This question is regarding a Spark stand-alone cluster. My understanding is
that Spark is an execution engine, not a storage layer.
Spark processes data in memory, but when someone refers to a Spark table
created through Spark SQL (DataFrame/RDD), what exactly are they referring to?

Could it be a Hive table? If yes, is it the same Hive metastore that Spark uses?
Is it a table in memory? If yes, how can an external app access this
in-memory table? If JDBC, what driver?

On a Databricks cluster -- could they be referring to a Spark table created
through Spark SQL (DataFrame/RDD) as a Hive or Delta Lake table?

Spark version with Hadoop: spark-2.0.2-bin-hadoop2.7

Thanks and appreciate your help!!
Ajay.


