Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/12/18 20:28:46 UTC

[jira] [Commented] (SPARK-12427) spark builds filling up jenkins' disk

    [ https://issues.apache.org/jira/browse/SPARK-12427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064568#comment-15064568 ] 

Sean Owen commented on SPARK-12427:
-----------------------------------

I doubt we really need build history for more than a week or two. Does reducing retention to 2 weeks free up enough space to keep us out of trouble for a while?

If the next major release is 2.0, and it drops support for most old Hadoop variants, then at least we'd have no more separate pre-/post-YARN builds.
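
For reference, the retention setting lives in each job's config.xml (it shows up in the UI as "Discard old builds" on the job's configure page). A minimal sketch of a 2-week window, assuming the stock LogRotator; -1 leaves a limit unset:

    <logRotator class="hudson.tasks.LogRotator">
      <daysToKeep>14</daysToKeep>
      <numToKeep>-1</numToKeep>
      <artifactDaysToKeep>-1</artifactDaysToKeep>
      <artifactNumToKeep>-1</artifactNumToKeep>
    </logRotator>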

> spark builds filling up jenkins' disk
> -------------------------------------
>
>                 Key: SPARK-12427
>                 URL: https://issues.apache.org/jira/browse/SPARK-12427
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>            Reporter: shane knapp
>            Priority: Critical
>              Labels: build, jenkins
>         Attachments: graph.png, jenkins_disk_usage.txt
>
>
> problem summary:
> a few spark builds are filling up the jenkins master's disk with millions of little log files as build artifacts.  
> we have a raid10 array set up with 5.4T of storage and are currently using 4.0T, 99.9% of which is spark unit test and junit logs.
> the worst offenders, with more than 100G of disk usage per job, are:
> 193G    ./Spark-1.6-Maven-with-YARN
> 194G    ./Spark-1.5-Maven-with-YARN
> 205G    ./Spark-1.6-Maven-pre-YARN
> 216G    ./Spark-1.5-Maven-pre-YARN
> 387G    ./Spark-Master-Maven-with-YARN
> 420G    ./Spark-Master-Maven-pre-YARN
> 520G    ./Spark-1.6-SBT
> 733G    ./Spark-1.5-SBT
> 812G    ./Spark-Master-SBT
> i have attached a full report w/all builds listed as well.
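> (fwiw, a report like the one above can be regenerated with a one-liner; the jobs path below is the stock jenkins default and may differ on our master:
>
>     # per-job disk usage on the jenkins master, sorted smallest to largest
>     du -sh /var/lib/jenkins/jobs/* | sort -h
>
> )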
> each of these builds is keeping its build history for 90 days.
> keep in mind that for each new matrix build, we're looking at another 200-500G apiece for the SBT/pre-YARN/with-YARN jobs.
> a straw-man, back-of-napkin estimate for spark 1.7 is 2T of additional disk usage.
> on the hardware config side, we can move from raid10 to raid5 and get ~3T additional storage.  if we ditch raid altogether and put in bigger disks, we can get a total of 16-20T of storage on master.  another option is an NFS mount to a deep storage server.  all of these options will require significant downtime.
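> (rough arithmetic behind the raid numbers: raid10 mirrors everything, so 5.4T usable implies ~10.8T raw.  raid5 gives up only one disk to parity, so usable space on the same spindles is roughly raw * (N-1)/N; assuming, say, an 8-disk array, that's 10.8T * 7/8 = ~9.4T usable, or ~3-4T more than today.  the disk count here is a guess, so treat this as a sanity check, not a quote.)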
> questions:
> * can we lower the number of days that we keep build information?
> * there are other options in jenkins that we can set as well: max # of builds to keep, max # of days to keep artifacts, max # of builds to keep w/artifacts (a cli sketch for patching these in bulk follows after this list)
> * can we make the junit and unit test logs smaller?  (probably not)
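> (all of those retention knobs live in the logRotator block of each job's config.xml, so they can be patched in bulk.  a rough, untested sketch using the jenkins cli; the jar path, server url, and job list are placeholders:
>
>     # rewrite the build-discarder settings on each offending job:
>     # get-job emits the job's config.xml, update-job reads the new one from stdin
>     for job in Spark-Master-SBT Spark-1.5-SBT Spark-1.6-SBT; do
>       java -jar jenkins-cli.jar -s http://localhost:8080/ get-job "$job" \
>         | sed -E 's|<daysToKeep>-?[0-9]+</daysToKeep>|<daysToKeep>14</daysToKeep>|' \
>         | java -jar jenkins-cli.jar -s http://localhost:8080/ update-job "$job"
>     done
>
> )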


