Posted to user@spark.apache.org by Arpan Bhandari <ar...@gmail.com> on 2021/01/29 07:28:10 UTC

Spark SQL query

Hi,

Is there a way to trace back a Spark SQL query after it has already run? That
is, a query has already been submitted by someone, and I need to trace back
which query was actually submitted.

I'd appreciate any help on this.



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

Re: Spark SQL query

Posted by Arpan Bhandari <ar...@gmail.com>.

Sachit,

It seems I will have to do some analysis of the plan to work out the query.
I appreciate all your help on this.


Thanks,
Arpan Bhandari

Re: Spark SQL query

Posted by Sachit Murarka <co...@gmail.com>.

It won't show that per application as such.
You can try to correlate it with the explain plan output, using some filters
or attributes.

Alternatively, if you do not have too many queries in the history, just take
those queries, get their plans, and match them with the plan shown in the UI.

I know that's a tedious task, but I don't think there is any other way.
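
A minimal Scala sketch of that matching step, assuming you have the candidate
SQL strings (for example from the shell history) and the plan text copied from
the SQL tab. PlanMatcher, normalize and matching are hypothetical names. The
function that turns a query into plan text is passed in, so the comparison
logic itself needs no Spark dependency. Expression IDs such as a#12 differ
between sessions, so they are replaced with a fixed token before comparing.

```scala
import scala.util.Try

object PlanMatcher {
  // Attribute IDs like "a#12" or "id#7L" change from session to session,
  // so they are replaced with "#N" before two plans are compared.
  private val exprId = "#\\d+L?".r

  // Normalize plan text: mask expression IDs, trim each line, drop blank lines.
  def normalize(plan: String): String =
    plan.split("\\r?\\n")
      .map(line => exprId.replaceAllIn(line, "#N").trim)
      .filter(_.nonEmpty)
      .mkString("\n")

  // Return the candidate queries whose normalized plan equals the target plan.
  // Candidates whose plan cannot be produced (planOf throws) are skipped.
  def matching(candidates: Seq[String], targetPlan: String)(planOf: String => String): Seq[String] = {
    val target = normalize(targetPlan)
    candidates.filter(q => Try(normalize(planOf(q))).toOption.contains(target))
  }
}
```

In spark-shell, planOf could be q => spark.sql(q).queryExecution.executedPlan.toString.
Building the DataFrame does not run a SELECT, but DDL and INSERT statements
execute immediately, so only feed it read-only queries. Exact equality is
strict: the UI details view may show the full parsed, analyzed, optimized and
physical plans, so comparing only the physical-plan section is likely to work
better.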

Thanks
Sachit

In reply to Arpan Bhandari <ar...@gmail.com>, Mon, 1 Feb 2021, 22:32.

Re: Spark SQL query

Posted by Arpan Bhandari <ar...@gmail.com>.

Sachit,

That shows all the queries that got executed, but how would each one be
mapped to the specific application ID it was associated with?

Thanks,
Arpan Bhandari

Re: Spark SQL query

Posted by Sachit Murarka <co...@gmail.com>.

Hi Arpan,

When you type :history in spark-shell, does it still not show the query?

Thanks
Sachit

In reply to Arpan Bhandari <ar...@gmail.com>, Mon, 1 Feb 2021, 21:13.

Re: Spark SQL query

Posted by Arpan Bhandari <ar...@gmail.com>.

Hey Sachit,

It shows the query plan, but it is difficult to work out the actual query
from it.


Thanks,
Arpan Bhandari

Re: Spark SQL query

Posted by Sachit Murarka <co...@gmail.com>.

Hi Arpan,

Launch spark-shell and type ":history" in the shell; you will see the
queries that were executed.

In the Spark UI, under the SQL tab, you can see the query plan when you click
the details button (though it won't show you the complete query). But by
looking at the plan you can work out your query.

Hope this helps!

Kind Regards,
Sachit Murarka

In reply to Arpan Bhandari <ar...@gmail.com>, Fri, Jan 29, 2021 at 9:33 PM.

Re: Spark SQL query

Posted by Arpan Bhandari <ar...@gmail.com>.

Hi Sachit,

Yes, it was executed using spark-shell, and the history server is already
enabled. I have already checked the SQL tab, but it does not show the query.
My Spark version is 2.4.5.

Thanks,
Arpan Bhandari

Re: Spark SQL query

Posted by Sachit Murarka <co...@gmail.com>.

Hi Arpan,

Was it executed using spark-shell?
If yes, type :history

Do you have the history server enabled?
If yes, open the application in the history server UI and go to its SQL tab.

Thanks
Sachit

In reply to Arpan Bhandari <ar...@gmail.com>, Fri, 29 Jan 2021, 19:19.

Re: Spark SQL query

Posted by Mich Talebzadeh <mi...@gmail.com>.

One thing you can do is open a separate thread for this feature
request:

"Having functionality in Spark to allow queries to be gathered and analyzed"

and see how the forum responds to it.

HTH

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

In reply to Arpan Bhandari <ar...@gmail.com>, Wed, 3 Feb 2021 at 11:17.

Re: Spark SQL query

Posted by Arpan Bhandari <ar...@gmail.com>.

Yes, Mich.

Being able to map each executed Spark SQL query to its application ID on
YARN would greatly help in analyzing and debugging queries for potential
problems.


Thanks,
Arpan Bhandari

Re: Spark SQL query

Posted by Mich Talebzadeh <mi...@gmail.com>.

I gather what you are after is a code sniffer for Spark: something with a
GUI that shows the code applications run against Spark.

I don't think Spark has this type of plug-in, although it would be
potentially useful. Some RDBMSs provide this, usually keeping the captured
statements in some form of persistent storage or database. I have not come
across it in Spark.
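
Short of such a plug-in, one user-side workaround is to record each query as
it is submitted. A minimal sketch follows; QueryJournal, logged and the
journal file are hypothetical, and it only helps if people submit SQL through
this wrapper, so it cannot recover queries that have already run. Each entry
holds the application ID, a timestamp and the SQL text on one tab-separated
line, which gives the query-to-application-ID mapping discussed in this
thread.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardOpenOption}
import java.time.Instant

object QueryJournal {
  // Escape backslashes, tabs and line breaks so each entry stays on one line.
  private def oneLine(sql: String): String =
    sql.replace("\\", "\\\\").replace("\t", "\\t").replace("\r", "\\r").replace("\n", "\\n")

  // Append "appId<TAB>timestamp<TAB>sql" to the journal file, then run the
  // query with the caller-supplied function and return its result.
  def logged[T](journal: Path, appId: String, sqlText: String)(run: String => T): T = {
    val entry = s"$appId\t${Instant.now()}\t${oneLine(sqlText)}\n"
    Files.write(journal, entry.getBytes(StandardCharsets.UTF_8),
      StandardOpenOption.CREATE, StandardOpenOption.APPEND)
    run(sqlText)
  }
}
```

In spark-shell this might be called as
QueryJournal.logged(java.nio.file.Paths.get(sys.props("user.home"), "sql_journal.tsv"), spark.sparkContext.applicationId, q) { s => spark.sparkContext.setJobDescription(s); spark.sql(s) }.
On YARN, applicationId should be the same ID the ResourceManager shows, and
setJobDescription should also make the text visible as the job description in
the Jobs tab. The entry is written when the DataFrame is created, not when an
action actually runs it.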

HTH

In reply to Arpan Bhandari <ar...@gmail.com>, Wed, 3 Feb 2021 at 05:10.

Re: Spark SQL query

Posted by Arpan Bhandari <ar...@gmail.com>.

Mich,

The directory is already there and event logs are being generated. I have
checked them: they contain the query plan but not the actual query.


Thanks,
Arpan Bhandari

Re: Spark SQL query

Posted by Mich Talebzadeh <mi...@gmail.com>.

Create a directory in HDFS:

hdfs dfs -mkdir /spark_event_logs

Modify $SPARK_HOME/conf/spark-defaults.conf and add these two lines:

spark.eventLog.enabled=true
# do not use quotes below
spark.eventLog.dir=hdfs://rhes75:9000/spark_event_logs

Then run a job and check the directory:

hdfs dfs -ls /spark_event_logs

-rw-rw----   3 hduser supergroup   33795834 2021-02-02 19:48 /spark_event_logs/yarn-1612295234284

That file should have all the information you need.

Make sure the directory hdfs://<NAME_NODE>:9000/spark_event_logs is
writable by the user running Spark.
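
To see what an event log actually holds, a small Scala sketch can scan one
uncompressed event log file, for example after copying it locally with
hdfs dfs -get. It assumes Spark's usual layout of one JSON event per line,
with the application ID in the "App ID" field of the
SparkListenerApplicationStart event and the plan text in the "description" and
"physicalPlanDescription" fields of each SQL execution start event.
EventLogScan and SqlExecution are hypothetical names, and the hand-rolled
field extraction is a shortcut to avoid a JSON library; a real JSON parser
would be more robust.

```scala
object EventLogScan {
  // One SQL execution found in an event log, tagged with its application ID.
  case class SqlExecution(appId: String, description: String, plan: String)

  // Extract the string value of "key":"..." from one JSON line, undoing
  // JSON string escapes. Returns None if the key is absent or unterminated.
  private def field(line: String, key: String): Option[String] = {
    val marker = "\"" + key + "\":\""
    val start = line.indexOf(marker)
    if (start < 0) None
    else {
      val sb = new StringBuilder
      var i = start + marker.length
      var closed = false
      while (!closed && i < line.length) {
        line.charAt(i) match {
          case '"' => closed = true
          case '\\' if i + 1 < line.length =>
            line.charAt(i + 1) match {
              case 'n' => sb.append('\n')
              case 't' => sb.append('\t')
              case 'r' => sb.append('\r')
              case 'b' => sb.append('\b')
              case 'f' => sb.append('\f')
              case 'u' if i + 5 < line.length =>
                // four hex digits follow the 'u'
                sb.append(Integer.parseInt(line.substring(i + 2, i + 6), 16).toChar)
                i += 4
              case other => sb.append(other) // quote, backslash or slash
            }
            i += 1
          case c => sb.append(c)
        }
        i += 1
      }
      if (closed) Some(sb.toString) else None
    }
  }

  // Walk the event log lines, remembering the application ID and collecting
  // every SQL execution start event with its description and plan text.
  def scan(lines: Iterator[String]): Seq[SqlExecution] = {
    var appId = "unknown"
    val out = Seq.newBuilder[SqlExecution]
    lines.foreach { line =>
      field(line, "Event") match {
        case Some("SparkListenerApplicationStart") =>
          appId = field(line, "App ID").getOrElse(appId)
        case Some(event) if event.endsWith("SparkListenerSQLExecutionStart") =>
          out += SqlExecution(appId,
            field(line, "description").getOrElse(""),
            field(line, "physicalPlanDescription").getOrElse(""))
        case _ => ()
      }
    }
    out.result()
  }
}
```

For example, EventLogScan.scan(scala.io.Source.fromFile("/tmp/eventlog").getLines())
where /tmp/eventlog is a hypothetical local copy. As far as I know, in Spark
2.4 the description is a call site (something like "show at <console>:24")
rather than the SQL text, so this recovers the plans per application, not the
original query.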


HTH

In reply to Arpan Bhandari <ar...@gmail.com>, Tue, 2 Feb 2021 at 15:59.

Re: Spark SQL query

Posted by Arpan Bhandari <ar...@gmail.com>.

Yes, I can see the jobs on port 8088 and also on the Spark history server
URL. The history server shows the plan details on the SQL tab but does not
give the query.


Thanks,
Arpan Bhandari

Re: Spark SQL query

Posted by Mich Talebzadeh <mi...@gmail.com>.

OK.

On the host starting the job, on port 8088 (the YARN ResourceManager UI), do
you have access to all applications as shown in the attached file? If you
look at the history, can you see the jobs?

Also try the History link next to "Tracking URL:".

HTH

In reply to Arpan Bhandari <ar...@gmail.com>, Tue, 2 Feb 2021 at 14:47.

Re: Spark SQL query

Posted by Arpan Bhandari <ar...@gmail.com>.

Hi Mich,

I do see the .scala_history file, but it contains all the queries executed
up until now. If I have to map a specific query to an application ID in
YARN, it does not give that correlation, so this method alone won't suffice.

Thanks,
Arpan Bhandari

Re: Spark SQL query

Posted by Mich Talebzadeh <mi...@gmail.com>.

Hi Arpan,

I believe shells such as spark-shell and the Scala REPL create a hidden
history file.

Go to your home directory:

cd

# see list of all hidden files

ls -a | egrep '^\.'

If you are using Scala, do you see a .scala_history file?

.scala_history

HTH

In reply to Arpan Bhandari <ar...@gmail.com>, Tue, 2 Feb 2021 at 10:16.

Re: Spark SQL query

Posted by Arpan Bhandari <ar...@gmail.com>.

Hi Mich,

I repeated the steps as suggested, but there is still no such file created
in the home directory. Do we need to enable some property so that it
creates one?


Thanks,
Arpan Bhandari

Re: Spark SQL query

Posted by Mich Talebzadeh <mi...@gmail.com>.

Hi Arpan,

Log in as any user that has execute rights for Spark, type spark-shell, run
some simple commands, then exit. Go to the home directory of that user and
look for this hidden file:

${HOME}/.spark_history

It will be there.

HTH,

In reply to Arpan Bhandari <ar...@gmail.com>, Mon, 1 Feb 2021 at 15:44.

Re: Spark SQL query

Posted by Arpan Bhandari <ar...@gmail.com>.

Hey Mich,

Thanks for the suggestions, but I don't see any such file created on the
edge node.


Thanks,
Arpan Bhandari

Re: Spark SQL query

Posted by Mich Talebzadeh <mi...@gmail.com>.

Hi Arpan,

I presume you are interested in what the client was doing.

If you have access to the edge node (where the Spark code is submitted), look
for the following file:

${HOME}/.spark_history

For example:

-rw-r--r--. 1 hduser hadoop 111997 Jun  2  2018 .spark_history

Just use shell tools (cat, grep, etc.) to have a look.

Or put it somewhere in HDFS:

# run from the home directory; Spark cannot read a hidden file, so drop the leading dot
hdfs dfs -put .spark_history /misc/spark_history

# and read it as a text file through an RDD in spark-shell

scala> val historyRDD = spark.sparkContext.textFile("/misc/spark_history")
historyRDD: org.apache.spark.rdd.RDD[String] = /misc/spark_history MapPartitionsRDD[11] at textFile at <console>:23

# print it out

scala> historyRDD.collect().foreach(println)
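
Because the history file is plain text, the SQL strings can also be pulled
out of it without Spark. A minimal sketch, assuming one history entry per
line and queries written as a literal inside .sql("...") or .sql("""...""")
on a single line; SqlFromHistory, extract and fromFile are hypothetical
names. Queries built from variables, string interpolation or several lines
will be missed, and the file alone still does not say which application ID
ran each query.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.collection.JavaConverters._

object SqlFromHistory {
  // .sql("""...""") : triple-quoted literal on one line
  private val tripleQuoted = "\\.sql\\(\"\"\"(.*?)\"\"\"".r
  // .sql("...") : plain literal without embedded double quotes
  private val quoted = "\\.sql\\(\"([^\"]*)\"".r

  // Return the SQL literals found in the given history lines, in order.
  def extract(lines: Seq[String]): Seq[String] =
    lines.flatMap { line =>
      val triple = tripleQuoted.findAllMatchIn(line).map(_.group(1)).toList
      if (triple.nonEmpty) triple
      else quoted.findAllMatchIn(line).map(_.group(1)).toList
    }

  // Read a history file such as ~/.scala_history (assumed to be UTF-8 text).
  def fromFile(path: Path): Seq[String] =
    extract(Files.readAllLines(path, StandardCharsets.UTF_8).asScala.toList)
}
```

The triple-quoted pattern is tried first, because the plain pattern would
otherwise match the empty string between the first two quotes of """.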


HTH

In reply to Arpan Bhandari <ar...@gmail.com>, Fri, 29 Jan 2021 at 13:49.