Posted to dev@spark.apache.org by Nicholas Chammas <ni...@gmail.com> on 2014/12/15 10:33:41 UTC

Archiving XML test reports for analysis

Every time we run a test cycle on our Jenkins cluster, we generate hundreds
of XML reports covering all the tests we have (e.g.
`streaming/target/test-reports/org.apache.spark.streaming.util.WriteAheadLogSuite.xml`).

These reports contain interesting information about whether tests succeeded
or failed, and how long they took to complete. There is also detailed
information about the environment they ran in.
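Pulling the outcome and timing out of each report is mostly a matter of walking its `<testcase>` elements. Below is a minimal sketch. It assumes the reports follow the common JUnit XML layout: a `<testsuite>` root with a `<properties>` block, plus `<testcase>` elements that carry `classname`, `name` and `time` attributes and may have a `<failure>`, `<error>` or `<skipped>` child. That layout should be checked against a real report such as the one above. `parse_report`, `parse_properties` and `CaseResult` are hypothetical names.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class CaseResult:
    suite: str      # value of the testcase's "classname" attribute
    name: str       # individual test name
    seconds: float  # duration taken from the "time" attribute (0.0 if absent)
    status: str     # "passed", "failed", "error" or "skipped"


def parse_report(xml_text: str | bytes) -> list[CaseResult]:
    """Extract per-test outcome and duration from one JUnit-style report."""
    root = ET.fromstring(xml_text)
    results = []
    # iter() also finds testcases nested under a <testsuites> wrapper.
    for case in root.iter("testcase"):
        if case.find("failure") is not None:
            status = "failed"
        elif case.find("error") is not None:
            status = "error"
        elif case.find("skipped") is not None:
            status = "skipped"
        else:
            status = "passed"
        results.append(CaseResult(
            suite=case.get("classname", ""),
            name=case.get("name", ""),
            seconds=float(case.get("time") or 0),
            status=status,
        ))
    return results


def parse_properties(xml_text: str | bytes) -> dict[str, str]:
    """Collect <property name=... value=...> pairs (the environment details)."""
    root = ET.fromstring(xml_text)
    return {p.get("name", ""): p.get("value", "") for p in root.iter("property")}
```

For a file on disk, `parse_report(open(path, "rb").read())` works, since `ET.fromstring` accepts bytes as well as text.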

It might be valuable to have a window into all these reports across all
Jenkins builds and across all time, and use that to track basic statistics
about our tests. That could give us basic insight into which tests are flaky
or slow, and perhaps drive other improvements to our testing infrastructure
that we can't see just yet.
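Once each report has been reduced to rows of (test id, status, seconds), spotting flaky or slow tests across builds is a small aggregation. The sketch below uses two rules, both assumptions chosen for illustration. A test counts as flaky if it both passed and failed somewhere in the rows it is given. It counts as slow if its mean duration exceeds a threshold, with an arbitrary 60-second default. `summarize` is a hypothetical name. A failure on a pull request build may just mean the PR broke that test, so it would probably make sense to feed this only master builds, or repeated runs of the same commit.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


def summarize(runs: list[tuple[str, str, float]],
              slow_seconds: float = 60.0) -> dict[str, list[str]]:
    """Flag flaky and slow tests from (test_id, status, seconds) rows."""
    statuses: dict[str, set[str]] = defaultdict(set)
    durations: dict[str, list[float]] = defaultdict(list)
    for test_id, status, seconds in runs:
        statuses[test_id].add(status)
        durations[test_id].append(seconds)
    # Flaky (by this sketch's rule): at least one pass and one failure/error.
    flaky = sorted(t for t, s in statuses.items()
                   if "passed" in s and s & {"failed", "error"})
    # Slow (by this sketch's rule): mean duration above the threshold.
    slow = sorted(t for t, d in durations.items() if mean(d) > slow_seconds)
    return {"flaky": flaky, "slow": slow}
```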

Do people think that would be valuable? Do we already have something like
this?

I'm thinking for starters it might be cool if we automatically uploaded all
the XML test reports from the Master and the Pull Request builders to an S3
bucket and just opened it up for the dev community to analyze.
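The upload step could be as simple as copying each report into the bucket under a per-build prefix. Below is a minimal sketch. It assumes boto3 is installed and AWS credentials are configured. The bucket name, the `<job>/<build number>/<path>` key layout and the helper names are all hypothetical. The only library call is boto3's `upload_file(Filename, Bucket, Key)`.

```python
from __future__ import annotations

import os


def s3_key(job: str, build_number: int, report_path: str) -> str:
    """Hypothetical key layout: <job>/<build number>/<workspace-relative path>."""
    # Normalise "./a/b" to "a/b" and always use "/" as the S3 key separator.
    rel = os.path.normpath(report_path).replace(os.sep, "/").lstrip("/")
    return f"{job}/{build_number}/{rel}"


def upload_reports(bucket: str, job: str, build_number: int,
                   report_paths: list[str]) -> list[str]:
    """Upload each report (paths relative to the workspace); return the keys."""
    import boto3  # imported here so s3_key() is usable without boto3 installed

    s3 = boto3.client("s3")
    keys = []
    for path in report_paths:
        key = s3_key(job, build_number, path)
        s3.upload_file(path, bucket, key)
        keys.append(key)
    return keys
```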

Nick

Re: Archiving XML test reports for analysis

Posted by shane knapp <sk...@berkeley.edu>.
i have no problem w/storing all of the logs.  :)

i also have no problem w/donated S3 buckets.  :)

On Mon, Dec 15, 2014 at 2:39 PM, Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

Re: Archiving XML test reports for analysis

Posted by Nicholas Chammas <ni...@gmail.com>.
How about all of them <https://amplab.cs.berkeley.edu/jenkins/view/Spark/>?
Roughly how much data per day would it be if we uploaded all the logs for all
these builds?
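One way to answer the sizing question for the XML reports is to measure what a finished build leaves in its workspace and scale that by the number of builds per day. Below is a rough sketch. It assumes the reports sit under `<module>/target/test-reports/`, as in the example path at the top of the thread, and that builds produce similar amounts of XML. `estimate_daily_bytes` is a hypothetical helper, and `builds_per_day` would have to come from the Jenkins build history. Each file is gzipped separately, which likely overstates somewhat what a single tar.gz per build would take.

```python
from __future__ import annotations

import glob
import gzip
import os


def estimate_daily_bytes(workspace: str, builds_per_day: int) -> tuple[int, int]:
    """Return (raw, gzipped) bytes per day of XML reports, scaled from one build."""
    pattern = os.path.join(workspace, "**", "target", "test-reports", "*.xml")
    raw = compressed = 0
    for path in glob.glob(pattern, recursive=True):
        with open(path, "rb") as f:
            data = f.read()
        raw += len(data)
        # Per-file gzip: a rough upper bound on the compressed size.
        compressed += len(gzip.compress(data))
    return raw * builds_per_day, compressed * builds_per_day
```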

Also, would Databricks be willing to offer up an S3 bucket for this purpose?

Nick

On Mon Dec 15 2014 at 11:48:44 AM shane knapp <sk...@berkeley.edu> wrote:

> right now, the following logs are archived on to the master:
>
>   local log_files=$(
>     find .\
>       -name "unit-tests.log" -o\
>       -path "./sql/hive/target/HiveCompatibilitySuite.failed" -o\
>       -path "./sql/hive/target/HiveCompatibilitySuite.hiveFailed" -o\
>       -path "./sql/hive/target/HiveCompatibilitySuite.wrong"
>   )
>
> regarding dumping stuff to S3 -- thankfully, since we're not looking at a
> lot of disk usage, i don't see a problem w/this.  we could tar/zip up the
> XML for each build and just dump it there.
>
> what builds are we thinking about?  spark pull request builder?  what
> others?
>
> On Mon, Dec 15, 2014 at 1:33 AM, Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>>
>> Every time we run a test cycle on our Jenkins cluster, we generate
>> hundreds
>> of XML reports covering all the tests we have (e.g.
>>
>> `streaming/target/test-reports/org.apache.spark.streaming.util.WriteAheadLogSuite.xml`).
>>
>> These reports contain interesting information about whether tests
>> succeeded
>> or failed, and how long they took to complete. There is also detailed
>> information about the environment they ran in.
>>
>> It might be valuable to have a window into all these reports across all
>> Jenkins builds and across all time, and use that to track basic statistics
>> about our tests. That could give us basic insight into what tests are
>> flaky
>> or slow, and perhaps drive other improvements to our testing
>> infrastructure
>> that we can't see just yet.
>>
>> Do people think that would be valuable? Do we already have something like
>> this?
>>
>> I'm thinking for starters it might be cool if we automatically uploaded
>> all
>> the XML test reports from the Master and the Pull Request builders to an
>> S3
>> bucket and just opened it up for the dev community to analyze.
>>
>> Nick
>>
>

Re: Archiving XML test reports for analysis

Posted by shane knapp <sk...@berkeley.edu>.
right now, the following logs are archived onto the master:

  local log_files=$(
    find . \
      -name "unit-tests.log" -o \
      -path "./sql/hive/target/HiveCompatibilitySuite.failed" -o \
      -path "./sql/hive/target/HiveCompatibilitySuite.hiveFailed" -o \
      -path "./sql/hive/target/HiveCompatibilitySuite.wrong"
  )

regarding dumping stuff to S3 -- thankfully, since we're not looking at a
lot of disk usage, i don't see a problem w/this.  we could tar/zip up the
XML for each build and just dump it there.
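Tarring up the XML for each build, as suggested here, keeps it to one object per build in the bucket. Below is a minimal sketch using Python's standard `tarfile` module. The glob pattern assumes the `<module>/target/test-reports/*.xml` layout seen in the thread, and `archive_reports` is a hypothetical name.

```python
from __future__ import annotations

import glob
import os
import tarfile


def archive_reports(workspace: str, out_path: str) -> list[str]:
    """Bundle one build's XML test reports into a .tar.gz; return member names."""
    pattern = os.path.join(workspace, "**", "target", "test-reports", "*.xml")
    paths = sorted(glob.glob(pattern, recursive=True))
    names = []
    with tarfile.open(out_path, "w:gz") as tar:
        for path in paths:
            # Store paths relative to the workspace so the archive is portable.
            arcname = os.path.relpath(path, workspace)
            tar.add(path, arcname=arcname)
            names.append(arcname)
    return names
```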

what builds are we thinking about?  spark pull request builder?  what
others?

On Mon, Dec 15, 2014 at 1:33 AM, Nicholas Chammas <
nicholas.chammas@gmail.com> wrote: