Posted to issues@spark.apache.org by "Slava (JIRA)" <ji...@apache.org> on 2016/07/05 10:36:10 UTC

[jira] [Updated] (SPARK-16378) HiveContext doesn't release resources

     [ https://issues.apache.org/jira/browse/SPARK-16378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Slava updated SPARK-16378:
--------------------------
    Description: 
I am running this simple code:
HiveContext hiveContext = new HiveContext(new JavaSparkContext(conf));
hiveContext.sparkContext().stop();
Each HiveContext creation creates (and keeps open) 100+ .dat files under the local metastore directory in /tmp/spark-*.
They can be counted with "ls -l | grep dat | wc -l" and listed with "ls -l | grep dat" in the /proc/PID/fd directory:

lrwx------ 1 dropwizard dropwizard 64 Jul  4 21:39 891 -> /tmp/spark-3625050e-6d18-421f-89ae-9859e9edfb9f/metastore/seg0/c650.dat
lrwx------ 1 dropwizard dropwizard 64 Jul  4 21:39 893 -> /tmp/spark-3625050e-6d18-421f-89ae-9859e9edfb9f/metastore/seg0/c670.dat
lrwx------ 1 dropwizard dropwizard 64 Jul  4 21:39 895 -> /tmp/spark-3625050e-6d18-421f-89ae-9859e9edfb9f/metastore/seg0/c690.dat
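
The same count can also be taken from inside the JVM. A minimal, illustrative sketch (Linux-only, since it lists /proc/self/fd; the FdCount class name is made up for this example):

import java.io.File;

// Illustrative helper: counts the current JVM's open file descriptors by
// listing /proc/self/fd, so it only works on Linux.
public final class FdCount {
    public static long countOpenFds() {
        File[] entries = new File("/proc/self/fd").listFiles();
        return entries == null ? -1 : entries.length;
    }

    public static void main(String[] args) {
        System.out.println("open fds: " + countOpenFds());
    }
}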

In my application I use a short-lived context: I create and stop it repeatedly.
It seems that stopping the SparkContext doesn't stop the HiveContext, so these files (and, apparently, other resources) aren't released or deleted. HiveContext itself doesn't have a stop method.
Thus, the next time I create a context, it creates another 100+ files. Eventually I run out of the maximum allowed open file descriptors and get a "Too many open files" error, which ultimately crashes the server.
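
For reference, a minimal sketch of the loop that reproduces the descriptor growth (assumes Spark 1.6 with Hive support on the classpath; the local[*] master and the FdCount helper from the sketch above are assumptions made for this example):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

public final class HiveContextFdLeakRepro {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("SPARK-16378-repro");
        for (int i = 0; i < 10; i++) {
            HiveContext hiveContext = new HiveContext(new JavaSparkContext(conf));
            // Stopping the SparkContext does not close the embedded metastore,
            // so the .dat descriptors stay open and the count grows each iteration.
            hiveContext.sparkContext().stop();
            System.out.println("iteration " + i + ": open fds = " + FdCount.countOpenFds());
        }
    }
}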

> HiveContext doesn't release resources
> -------------------------------------
>
>                 Key: SPARK-16378
>                 URL: https://issues.apache.org/jira/browse/SPARK-16378
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API, SQL
>    Affects Versions: 1.6.0
>         Environment: Linux Ubuntu
>            Reporter: Slava
>            Priority: Critical
>


