Posted to dev@spark.apache.org by Nicholas Chammas <ni...@gmail.com> on 2014/12/14 08:31:50 UTC

Spark JIRA Report

What do y’all think of a report like this emailed out to the dev list on a
monthly basis?

The goal would be to increase visibility into our open issues and encourage
developers to tend to our issue tracker more frequently.

Nick

There are 1,236 unresolved issues
<https://issues.apache.org/jira/issues/?jql=project+%3D+SPARK+AND+resolution+%3D+Unresolved+ORDER+BY+updated+DESC>
in the Spark project on JIRA.

Recently Updated Issues
<https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC>

Type | Key | Priority | Summary | Last Updated
Bug | SPARK-4841 <https://issues.apache.org/jira/browse/SPARK-4841> | Major | Batch serializer bug in PySpark’s RDD.zip | Dec 14, 2014
Question | SPARK-4810 <https://issues.apache.org/jira/browse/SPARK-4810> | Major | Failed to run collect | Dec 14, 2014
Bug | SPARK-785 <https://issues.apache.org/jira/browse/SPARK-785> | Major | ClosureCleaner not invoked on most PairRDDFunctions | Dec 14, 2014
New Feature | SPARK-3405 <https://issues.apache.org/jira/browse/SPARK-3405> | Minor | EC2 cluster creation on VPC | Dec 13, 2014
Improvement | SPARK-1555 <https://issues.apache.org/jira/browse/SPARK-1555> | Minor | enable ec2/spark_ec2.py to stop/delete cluster non-interactively | Dec 13, 2014

Stale Issues
<https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20AND%20updated%20%3C%3D%20-90d%20ORDER%20BY%20updated%20ASC>

Type | Key | Priority | Summary | Last Updated
Bug | SPARK-560 <https://issues.apache.org/jira/browse/SPARK-560> | None | Specialize RDDs / iterators | Oct 22, 2012
New Feature | SPARK-540 <https://issues.apache.org/jira/browse/SPARK-540> | None | Add API to customize in-memory representation of RDDs | Oct 22, 2012
Improvement | SPARK-573 <https://issues.apache.org/jira/browse/SPARK-573> | None | Clarify semantics of the parallelized closures | Oct 22, 2012
New Feature | SPARK-609 <https://issues.apache.org/jira/browse/SPARK-609> | Minor | Add instructions for enabling Akka debug logging | Nov 06, 2012
New Feature | SPARK-636 <https://issues.apache.org/jira/browse/SPARK-636> | Major | Add mechanism to run system management/configuration tasks on all workers | Dec 17, 2012

Most Watched Issues
<https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20watchers%20DESC>

Type | Key | Priority | Summary | Watchers
New Feature | SPARK-3561 <https://issues.apache.org/jira/browse/SPARK-3561> | Major | Allow for pluggable execution contexts in Spark | 75
New Feature | SPARK-2365 <https://issues.apache.org/jira/browse/SPARK-2365> | Major | Add IndexedRDD, an efficient updatable key-value store | 33
Improvement | SPARK-2044 <https://issues.apache.org/jira/browse/SPARK-2044> | Major | Pluggable interface for shuffles | 30
New Feature | SPARK-1405 <https://issues.apache.org/jira/browse/SPARK-1405> | Critical | parallel Latent Dirichlet Allocation (LDA) atop of spark in MLlib | 26
New Feature | SPARK-1406 <https://issues.apache.org/jira/browse/SPARK-1406> | Major | PMML model evaluation support via MLib | 21

Most Voted Issues
<https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20votes%20DESC>

Type | Key | Priority | Summary | Votes
Bug | SPARK-2541 <https://issues.apache.org/jira/browse/SPARK-2541> | Major | Standalone mode can’t access secure HDFS anymore | 12
New Feature | SPARK-2365 <https://issues.apache.org/jira/browse/SPARK-2365> | Major | Add IndexedRDD, an efficient updatable key-value store | 9
Improvement | SPARK-3533 <https://issues.apache.org/jira/browse/SPARK-3533> | Major | Add saveAsTextFileByKey() method to RDDs | 8
Bug | SPARK-2883 <https://issues.apache.org/jira/browse/SPARK-2883> | Blocker | Spark Support for ORCFile format | 6
New Feature | SPARK-1442 <https://issues.apache.org/jira/browse/SPARK-1442> | Major | Add Window function support | 6
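
Each section link in the report is a JIRA search with the JQL percent-encoded into the `jql` query parameter. A minimal sketch of how such a link could be generated with Python's standard library follows. The helper name `jira_search_url` is hypothetical; the base URL and the query text are taken from the "Stale Issues" link in the report.

```python
from urllib.parse import quote

# Base search URL as it appears in the report's links.
JIRA_SEARCH = "https://issues.apache.org/jira/issues/"


def jira_search_url(jql: str) -> str:
    """Return a JIRA search link with the JQL percent-encoded (spaces as %20)."""
    return f"{JIRA_SEARCH}?jql={quote(jql, safe='')}"


# JQL behind the "Stale Issues" link: unresolved and not updated in 90+ days.
STALE_JQL = (
    "project = SPARK AND resolution = Unresolved "
    "AND updated <= -90d ORDER BY updated ASC"
)

stale_url = jira_search_url(STALE_JQL)
print(stale_url)
```

Changing the `-90d` term or the `ORDER BY` clause would give the other report sections in the same way.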

Re: Fwd: Spark JIRA Report

Posted by Josh Rosen <ro...@gmail.com>.

Slightly off-topic, but for helping to clear the PR review backlog, I have a proposal to add some “PR lifecycle” tools to spark-prs.appspot.com to make it easier to track which PRs are blocked on reviewers vs. authors: https://github.com/databricks/spark-pr-dashboard/pull/39

On December 18, 2014 at 2:01:31 PM, Sean Owen (sowen@cloudera.com) wrote:

In practice, most issues with no activity for, say, 6+ months are
dead. There's a downside in believing they will eventually get done by
somebody, since they almost never do.

Most is clutter, but if there are important bugs among them, then the  
fact they're idling is a different problem: too much demand / not  
enough supply of attention, not saying 'no' to enough, fast enough,  
and so on.  

Sure you can prompt people to at least ping an issue they care about  
once every 6 months to keep it alive. Which is essentially the same  
as: Resolve and invite anyone who cares to Reopen. If nobody bothers,  
can it be important? If the problem is, well, nobody would really be  
paying attention to the prompts, that's this different problem again.  

So: I think the auto-Resolve idea, or an email blast, is at best a  
forcing mechanism to pay attention to a more fundamental issue. I  
myself am less interested in that than working on the causes of  
long-lived important stuff in a JIRA backlog.  

You can see regular process progress like auto-closing PRs,  
spark-prs.appspot.com, some big passes at closing stale issues. It's  
still my impression that the bulk of existing JIRA does not get  
reviewed, so there's more to do. For example, from a recent tour  
through the JIRA list, there were ~50 that were even definitively  
resolved, and not marked as such. It's not for lack of excellent  
effort. The pace of good change outstrips any other project I've seen  
by a wide margin, dwarfed only by unprecedented inbound load.  

I'd rather the conversation be about more attacks on the supply/demand  
problem, like adding committers to offload resolution of the easy and  
clear changes more rapidly, docs or tools to help contributors make  
better PRs/JIRAs in the first place, stating what is in and out of  
scope upfront to direct efforts, and so on. That's a different  
discussion from this one though.  

On Thu, Dec 18, 2014 at 8:07 PM, Josh Rosen <ro...@gmail.com> wrote:  
> I don’t think that it makes sense to just close inactive JIRA issues without any human review. There are many legitimate feature requests / bug reports that might be inactive for a long time because they’re low priorities to fix or because nobody has had time to deal with them yet.
>  
> On December 15, 2014 at 2:37:30 PM, Nicholas Chammas (nicholas.chammas@gmail.com) wrote:  
>  
> OK, that's good.  
>  
> Another approach we can take to controlling the number of stale JIRA issues  
> is writing a bot that simply closes issues after N days of inactivity and  
> prompts people to reopen the issue if it's still valid. I believe Sean Owen  
> proposed that at one point (?).  
>  
> I wonder if that might be better since I feel that even a slimmed down  
> email might not be enough to get already-busy people to spend time on JIRA  
> management.  
>  
> Nick  
>  

---------------------------------------------------------------------  
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org  
For additional commands, e-mail: dev-help@spark.apache.org

Fwd: Spark JIRA Report

Posted by Sean Owen <so...@cloudera.com>.

In practice, most issues with no activity for, say, 6+ months are
dead. There's a downside in believing they will eventually get done by
somebody, since they almost never do.

Most is clutter, but if there are important bugs among them, then the
fact they're idling is a different problem: too much demand / not
enough supply of attention, not saying 'no' to enough, fast enough,
and so on.

Sure you can prompt people to at least ping an issue they care about
once every 6 months to keep it alive. Which is essentially the same
as: Resolve and invite anyone who cares to Reopen. If nobody bothers,
can it be important? If the problem is, well, nobody would really be
paying attention to the prompts, that's this different problem again.

So: I think the auto-Resolve idea, or an email blast, is at best a
forcing mechanism to pay attention to a more fundamental issue. I
myself am less interested in that than working on the causes of
long-lived important stuff in a JIRA backlog.

You can see regular process progress like auto-closing PRs,
spark-prs.appspot.com, some big passes at closing stale issues. It's
still my impression that the bulk of existing JIRA does not get
reviewed, so there's more to do. For example, from a recent tour
through the JIRA list, there were ~50 that were even definitively
resolved, and not marked as such. It's not for lack of excellent
effort. The pace of good change outstrips any other project I've seen
by a wide margin, dwarfed only by unprecedented inbound load.

I'd rather the conversation be about more attacks on the supply/demand
problem, like adding committers to offload resolution of the easy and
clear changes more rapidly, docs or tools to help contributors make
better PRs/JIRAs in the first place, stating what is in and out of
scope upfront to direct efforts, and so on. That's a different
discussion from this one though.

On Thu, Dec 18, 2014 at 8:07 PM, Josh Rosen <ro...@gmail.com> wrote:
> I don’t think that it makes sense to just close inactive JIRA issues without any human review. There are many legitimate feature requests / bug reports that might be inactive for a long time because they’re low priorities to fix or because nobody has had time to deal with them yet.
>
> On December 15, 2014 at 2:37:30 PM, Nicholas Chammas (nicholas.chammas@gmail.com) wrote:
>
> OK, that's good.
>
> Another approach we can take to controlling the number of stale JIRA issues
> is writing a bot that simply closes issues after N days of inactivity and
> prompts people to reopen the issue if it's still valid. I believe Sean Owen
> proposed that at one point (?).
>
> I wonder if that might be better since I feel that even a slimmed down
> email might not be enough to get already-busy people to spend time on JIRA
> management.
>
> Nick
>


Re: Spark JIRA Report

Posted by Josh Rosen <ro...@gmail.com>.

I don’t think that it makes sense to just close inactive JIRA issues without any human review. There are many legitimate feature requests / bug reports that might be inactive for a long time because they’re low priorities to fix or because nobody has had time to deal with them yet.

On December 15, 2014 at 2:37:30 PM, Nicholas Chammas (nicholas.chammas@gmail.com) wrote:

OK, that's good.  

Another approach we can take to controlling the number of stale JIRA issues  
is writing a bot that simply closes issues after N days of inactivity and  
prompts people to reopen the issue if it's still valid. I believe Sean Owen  
proposed that at one point (?).  

I wonder if that might be better since I feel that even a slimmed down  
email might not be enough to get already-busy people to spend time on JIRA  
management.  

Nick  

On Mon Dec 15 2014 at 12:55:06 PM Andrew Ash <an...@andrewash.com> wrote:  

> Nick,  
>  
> Putting the N most stale issues into a report like your latest one does  
> seem like a good way to tackle the wall of text effect that I'm worried  
> about.  
>  
> On Sun, Dec 14, 2014 at 12:28 PM, Nicholas Chammas <  
> nicholas.chammas@gmail.com> wrote:  
>  
>> Taking after Andrew’s suggestion, perhaps the report can just focus on  
>> Stale issues (no updates in > 90 days), since those are probably the  
>> easiest to act on.  
>>  
>> For example:  
>> Stale Issues  
>> <https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20AND%20updated%20%3C%3D%20-90d%20ORDER%20BY%20updated%20ASC>  
>>  
>> - [Oct 22, 2012] SPARK-560  
>> <https://issues.apache.org/jira/browse/SPARK-560>: Specialize RDDs /  
>> iterators  
>> - [Oct 22, 2012] SPARK-540  
>> <https://issues.apache.org/jira/browse/SPARK-540>: Add API to  
>> customize in-memory representation of RDDs  
>> - [Oct 22, 2012] SPARK-573  
>> <https://issues.apache.org/jira/browse/SPARK-573>: Clarify semantics  
>> of the parallelized closures  
>> - [Nov 06, 2012] SPARK-609  
>> <https://issues.apache.org/jira/browse/SPARK-609>: Add instructions  
>> for enabling Akka debug logging  
>> - [Dec 17, 2012] SPARK-636  
>> <https://issues.apache.org/jira/browse/SPARK-636>: Add mechanism to  
>> run system management/configuration tasks on all workers  
>>  
>> Andrew,  
>>  
>> Does that seem more useful?  
>>  
>> Nick  
>>   
>>  
>> On Sun Dec 14 2014 at 3:20:54 AM Nicholas Chammas <  
>> nicholas.chammas@gmail.com> wrote:  
>>  
>>> I formatted this report using Markdown; I'm open to changing the  
>>> structure or formatting or reducing the amount of information to make the  
>>> report more easily consumable.  
>>>  
>>> Regarding just sending links or whether this would just be mailing list  
>>> noise, those are good questions.
>>>  
>>> I've sent out links before, but I feel from a UX perspective having the  
>>> information right in the email itself makes it frictionless for people to  
>>> act on the information. For me, that difference is enough to hook me into  
>>> spending a few minutes on JIRA vs. just glossing over an email with a link.  
>>>  
>>> I wonder if that's also the case for others on this list.  
>>>  
>>> If you already spend a good amount of time cleaning up on JIRA, then  
>>> this report won't be that relevant to you. But given the number and growth  
>>> of open issues on our tracker, I suspect we could do with quite a few more  
>>> people chipping in and cleaning up where they can.  
>>>  
>>> That's the real problem that this report is intended to help with.  
>>>  
>>> Nick  
>>>  
>>>  
>>>  
>>> On Sun Dec 14 2014 at 2:49:00 AM Andrew Ash <an...@andrewash.com>  
>>> wrote:  
>>>  
>>>> The goal of increasing visibility on open issues is a good one. How is  
>>>> this different from just a link to Jira though? Some might say this adds  
>>>> noise to the mailing list and doesn't contain any information not already  
>>>> available in Jira.  
>>>>  
>>>> The idea seems good but the formatting leaves a little to be desired.  
>>>> If you aren't opposed to using HTML, I might suggest this more compact  
>>>> format:  
>>>>  
>>>> SPARK-2044 <https://issues.apache.org/jira/browse/SPARK-2044>  
>>>> Pluggable interface for shuffles  
>>>> SPARK-2365 <https://issues.apache.org/jira/browse/SPARK-2365> Add  
>>>> IndexedRDD, an efficient updatable key-value store
>>>> SPARK-3561 <https://issues.apache.org/jira/browse/SPARK-3561> Allow  
>>>> for pluggable execution contexts in Spark  
>>>>  
>>>> Andrew  
>>>>  

Re: Spark JIRA Report

Posted by Nicholas Chammas <ni...@gmail.com>.

OK, that's good.

Another approach we can take to controlling the number of stale JIRA issues
is writing a bot that simply closes issues after N days of inactivity and
prompts people to reopen the issue if it's still valid. I believe Sean Owen
proposed that at one point (?).

I wonder if that might be better since I feel that even a slimmed down
email might not be enough to get already-busy people to spend time on JIRA
management.

Nick
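
A minimal sketch of the close-after-N-days idea, assuming issues are plain in-memory records rather than results from any real JIRA API. The names `Issue`, `find_stale`, and `close_stale` are hypothetical, and the sample dates are taken from the "Last Updated" values in the report.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Issue:
    key: str
    last_updated: date
    resolved: bool = False


def find_stale(issues, today, max_idle_days):
    """Return unresolved issues with no update in the last `max_idle_days` days."""
    cutoff = today - timedelta(days=max_idle_days)
    return [i for i in issues if not i.resolved and i.last_updated <= cutoff]


def close_stale(issues, today, max_idle_days):
    """Mark stale issues resolved and return the reopen prompts to post."""
    prompts = []
    for issue in find_stale(issues, today, max_idle_days):
        issue.resolved = True
        prompts.append(
            f"{issue.key}: closed after {max_idle_days}+ days of inactivity; "
            "please reopen if this is still valid."
        )
    return prompts


# Sample data: keys and dates as listed in the report.
issues = [
    Issue("SPARK-560", date(2012, 10, 22)),
    Issue("SPARK-636", date(2012, 12, 17)),
    Issue("SPARK-4841", date(2014, 12, 14)),
]
prompts = close_stale(issues, today=date(2014, 12, 15), max_idle_days=90)
for line in prompts:
    print(line)
```

Keeping selection (`find_stale`) separate from the closing step means the same selection could instead feed a report, or a human review pass, without closing anything automatically.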

On Mon Dec 15 2014 at 12:55:06 PM Andrew Ash <an...@andrewash.com> wrote:

> Nick,
>
> Putting the N most stale issues into a report like your latest one does
> seem like a good way to tackle the wall of text effect that I'm worried
> about.
>
> On Sun, Dec 14, 2014 at 12:28 PM, Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
>> Taking after Andrew’s suggestion, perhaps the report can just focus on
>> Stale issues (no updates in > 90 days), since those are probably the
>> easiest to act on.
>>
>> For example:
>> Stale Issues
>> <https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20AND%20updated%20%3C%3D%20-90d%20ORDER%20BY%20updated%20ASC>
>>
>>    - [Oct 22, 2012] SPARK-560
>>    <https://issues.apache.org/jira/browse/SPARK-560>: Specialize RDDs /
>>    iterators
>>    - [Oct 22, 2012] SPARK-540
>>    <https://issues.apache.org/jira/browse/SPARK-540>: Add API to
>>    customize in-memory representation of RDDs
>>    - [Oct 22, 2012] SPARK-573
>>    <https://issues.apache.org/jira/browse/SPARK-573>: Clarify semantics
>>    of the parallelized closures
>>    - [Nov 06, 2012] SPARK-609
>>    <https://issues.apache.org/jira/browse/SPARK-609>: Add instructions
>>    for enabling Akka debug logging
>>    - [Dec 17, 2012] SPARK-636
>>    <https://issues.apache.org/jira/browse/SPARK-636>: Add mechanism to
>>    run system management/configuration tasks on all workers
>>
>> Andrew,
>>
>> Does that seem more useful?
>>
>> Nick
>> 
>>
>> On Sun Dec 14 2014 at 3:20:54 AM Nicholas Chammas <
>> nicholas.chammas@gmail.com> wrote:
>>
>>> I formatted this report using Markdown; I'm open to changing the
>>> structure or formatting or reducing the amount of information to make the
>>> report more easily consumable.
>>>
>>> Regarding just sending links or whether this would just be mailing list
>>> noise, those are good questions.
>>>
>>> I've sent out links before, but I feel from a UX perspective having the
>>> information right in the email itself makes it frictionless for people to
>>> act on the information. For me, that difference is enough to hook me into
>>> spending a few minutes on JIRA vs. just glossing over an email with a link.
>>>
>>> I wonder if that's also the case for others on this list.
>>>
>>> If you already spend a good amount of time cleaning up on JIRA, then
>>> this report won't be that relevant to you. But given the number and growth
>>> of open issues on our tracker, I suspect we could do with quite a few more
>>> people chipping in and cleaning up where they can.
>>>
>>> That's the real problem that this report is intended to help with.
>>>
>>> Nick
>>>
>>>
>>>
>>> On Sun Dec 14 2014 at 2:49:00 AM Andrew Ash <an...@andrewash.com>
>>> wrote:
>>>
>>>> The goal of increasing visibility on open issues is a good one.  How is
>>>> this different from just a link to Jira though?  Some might say this adds
>>>> noise to the mailing list and doesn't contain any information not already
>>>> available in Jira.
>>>>
>>>> The idea seems good but the formatting leaves a little to be desired.
>>>> If you aren't opposed to using HTML, I might suggest this more compact
>>>> format:
>>>>
>>>> SPARK-2044 <https://issues.apache.org/jira/browse/SPARK-2044>
>>>>  Pluggable interface for shuffles
>>>> SPARK-2365 <https://issues.apache.org/jira/browse/SPARK-2365> Add
>>>> IndexedRDD, an efficient updatable key-value store
>>>> SPARK-3561 <https://issues.apache.org/jira/browse/SPARK-3561> Allow
>>>> for pluggable execution contexts in Spark
>>>>
>>>> Andrew
>>>>

Re: Spark JIRA Report

Posted by Andrew Ash <an...@andrewash.com>.

Nick,

Putting the N most stale issues into a report like your latest one does
seem like a good way to tackle the wall of text effect that I'm worried
about.
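
As a rough illustration of "the N most stale issues", the sketch below sorts a few records by last-updated date and renders them in the compact list style of the stale-issues example quoted below. The function name `stale_report` and the tuple layout are assumptions made for this example; the keys, dates, and summaries come from the report.

```python
from datetime import date

# (key, last_updated, summary) tuples copied from the report's stale list.
ISSUES = [
    ("SPARK-609", date(2012, 11, 6),
     "Add instructions for enabling Akka debug logging"),
    ("SPARK-560", date(2012, 10, 22),
     "Specialize RDDs / iterators"),
    ("SPARK-636", date(2012, 12, 17),
     "Add mechanism to run system management/configuration tasks on all workers"),
]


def stale_report(issues, n):
    """Format the n least recently updated issues, oldest first."""
    oldest = sorted(issues, key=lambda issue: issue[1])[:n]
    return [
        f"- [{updated:%b %d, %Y}] {key} "
        f"<https://issues.apache.org/jira/browse/{key}>: {summary}"
        for key, updated, summary in oldest
    ]


report_lines = stale_report(ISSUES, n=2)
print("\n".join(report_lines))
```

Capping the list at `n` entries keeps the email short no matter how large the backlog grows.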

On Sun, Dec 14, 2014 at 12:28 PM, Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

> Taking after Andrew’s suggestion, perhaps the report can just focus on
> Stale issues (no updates in > 90 days), since those are probably the
> easiest to act on.
>
> For example:
> Stale Issues
> <https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20AND%20updated%20%3C%3D%20-90d%20ORDER%20BY%20updated%20ASC>
>
>    - [Oct 22, 2012] SPARK-560
>    <https://issues.apache.org/jira/browse/SPARK-560>: Specialize RDDs /
>    iterators
>    - [Oct 22, 2012] SPARK-540
>    <https://issues.apache.org/jira/browse/SPARK-540>: Add API to
>    customize in-memory representation of RDDs
>    - [Oct 22, 2012] SPARK-573
>    <https://issues.apache.org/jira/browse/SPARK-573>: Clarify semantics
>    of the parallelized closures
>    - [Nov 06, 2012] SPARK-609
>    <https://issues.apache.org/jira/browse/SPARK-609>: Add instructions
>    for enabling Akka debug logging
>    - [Dec 17, 2012] SPARK-636
>    <https://issues.apache.org/jira/browse/SPARK-636>: Add mechanism to
>    run system management/configuration tasks on all workers
>
> Andrew,
>
> Does that seem more useful?
>
> Nick
> 
>
> On Sun Dec 14 2014 at 3:20:54 AM Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
>> I formatted this report using Markdown; I'm open to changing the
>> structure or formatting or reducing the amount of information to make the
>> report more easily consumable.
>>
>> Regarding just sending links or whether this would just be mailing list
>> noise, those are good questions.
>>
>> I've sent out links before, but I feel from a UX perspective having the
>> information right in the email itself makes it frictionless for people to
>> act on the information. For me, that difference is enough to hook me into
>> spending a few minutes on JIRA vs. just glossing over an email with a link.
>>
>> I wonder if that's also the case for others on this list.
>>
>> If you already spend a good amount of time cleaning up on JIRA, then this
>> report won't be that relevant to you. But given the number and growth of
>> open issues on our tracker, I suspect we could do with quite a few more
>> people chipping in and cleaning up where they can.
>>
>> That's the real problem that this report is intended to help with.
>>
>> Nick
>>
>>
>>
>> On Sun Dec 14 2014 at 2:49:00 AM Andrew Ash <an...@andrewash.com> wrote:
>>
>>> The goal of increasing visibility on open issues is a good one.  How is
>>> this different from just a link to Jira though?  Some might say this adds
>>> noise to the mailing list and doesn't contain any information not already
>>> available in Jira.
>>>
>>> The idea seems good but the formatting leaves a little to be desired.
>>> If you aren't opposed to using HTML, I might suggest this more compact
>>> format:
>>>
>>> SPARK-2044 <https://issues.apache.org/jira/browse/SPARK-2044> Pluggable interface for shuffles
>>> SPARK-2365 <https://issues.apache.org/jira/browse/SPARK-2365> Add IndexedRDD, an efficient updatable key-value
>>> SPARK-3561 <https://issues.apache.org/jira/browse/SPARK-3561> Allow for pluggable execution contexts in Spark
>>>
>>> Andrew
>>>
>>> On Sat, Dec 13, 2014 at 11:31 PM, Nicholas Chammas <
>>> nicholas.chammas@gmail.com> wrote:
>>>
>>>> What do y’all think of a report like this emailed out to the dev list
>>>> on a monthly basis?
>>>>
>>>> The goal would be to increase visibility into our open issues and
>>>> encourage developers to tend to our issue tracker more frequently.
>>>>
>>>> Nick
>>>>

Re: Spark JIRA Report

Posted by Nicholas Chammas <ni...@gmail.com>.

Taking after Andrew’s suggestion, perhaps the report can just focus on
Stale issues (no updates in > 90 days), since those are probably the
easiest to act on.

For example:
Stale Issues
<https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20AND%20updated%20%3C%3D%20-90d%20ORDER%20BY%20updated%20ASC>

   - [Oct 22, 2012] SPARK-560
   <https://issues.apache.org/jira/browse/SPARK-560>: Specialize RDDs /
   iterators
   - [Oct 22, 2012] SPARK-540
   <https://issues.apache.org/jira/browse/SPARK-540>: Add API to customize
   in-memory representation of RDDs
   - [Oct 22, 2012] SPARK-573
   <https://issues.apache.org/jira/browse/SPARK-573>: Clarify semantics of
   the parallelized closures
   - [Nov 06, 2012] SPARK-609
   <https://issues.apache.org/jira/browse/SPARK-609>: Add instructions for
   enabling Akka debug logging
   - [Dec 17, 2012] SPARK-636
   <https://issues.apache.org/jira/browse/SPARK-636>: Add mechanism to run
   system management/configuration tasks on all workers

Andrew,

Does that seem more useful?

Nick
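
As a rough illustration of the stale-issue filter linked above (not the tooling actually used to produce this report), the JQL query and the list entries could be generated along these lines. The names `stale_issues_jql`, `stale_issues_url`, and `format_stale_item` are hypothetical; only the JIRA URLs and the 90-day cutoff come from this thread.

```python
from urllib.parse import quote

# Base URLs taken from the links in this thread.
JIRA_SEARCH = "https://issues.apache.org/jira/issues/?jql="
JIRA_BROWSE = "https://issues.apache.org/jira/browse/"


def stale_issues_jql(project="SPARK", days=90):
    """Build the JQL for unresolved issues with no update in `days` days."""
    return (
        f"project = {project} AND resolution = Unresolved "
        f"AND updated <= -{days}d ORDER BY updated ASC"
    )


def stale_issues_url(project="SPARK", days=90):
    """Percent-encode the JQL so it can be used as a JIRA search link."""
    return JIRA_SEARCH + quote(stale_issues_jql(project, days), safe="")


def format_stale_item(updated, key, summary):
    """Render one issue in the '- [date] KEY <link>: summary' list style."""
    return f"- [{updated}] {key} <{JIRA_BROWSE}{key}>: {summary}"


if __name__ == "__main__":
    print(stale_issues_url())
    print(format_stale_item("Oct 22, 2012", "SPARK-560", "Specialize RDDs / iterators"))
```

Changing `days` would move the staleness cutoff without touching the rest of the query.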


On Sun Dec 14 2014 at 3:20:54 AM Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

> I formatted this report using Markdown; I'm open to changing the structure
> or formatting or reducing the amount of information to make the report more
> easily consumable.
>
> Regarding just sending links or whether this would just be mailing list
> noise, those are good questions.
>
> I've sent out links before, but I feel from a UX perspective having the
> information right in the email itself makes it frictionless for people to
> act on the information. For me, that difference is enough to hook me into
> spending a few minutes on JIRA vs. just glossing over an email with a link.
>
> I wonder if that's also the case for others on this list.
>
> If you already spend a good amount of time cleaning up on JIRA, then this
> report won't be that relevant to you. But given the number and growth of
> open issues on our tracker, I suspect we could do with quite a few more
> people chipping in and cleaning up where they can.
>
> That's the real problem that this report is intended to help with.
>
> Nick
>
>
>
> On Sun Dec 14 2014 at 2:49:00 AM Andrew Ash <an...@andrewash.com> wrote:
>
>> The goal of increasing visibility on open issues is a good one.  How is
>> this different from just a link to Jira though?  Some might say this adds
>> noise to the mailing list and doesn't contain any information not already
>> available in Jira.
>>
>> The idea seems good but the formatting leaves a little to be desired.  If
>> you aren't opposed to using HTML, I might suggest this more compact format:
>>
>> SPARK-2044 <https://issues.apache.org/jira/browse/SPARK-2044> Pluggable interface for shuffles
>> SPARK-2365 <https://issues.apache.org/jira/browse/SPARK-2365> Add IndexedRDD, an efficient updatable key-value
>> SPARK-3561 <https://issues.apache.org/jira/browse/SPARK-3561> Allow for pluggable execution contexts in Spark
>>
>> Andrew
>>
>> On Sat, Dec 13, 2014 at 11:31 PM, Nicholas Chammas <
>> nicholas.chammas@gmail.com> wrote:
>>
>>> What do y’all think of a report like this emailed out to the dev list on
>>> a monthly basis?
>>>
>>> The goal would be to increase visibility into our open issues and
>>> encourage developers to tend to our issue tracker more frequently.
>>>
>>> Nick
>>>

Re: Spark JIRA Report

Posted by Nicholas Chammas <ni...@gmail.com>.

I formatted this report using Markdown; I'm open to changing the structure
or formatting or reducing the amount of information to make the report more
easily consumable.

Regarding just sending links or whether this would just be mailing list
noise, those are good questions.

I've sent out links before, but I feel from a UX perspective having the
information right in the email itself makes it frictionless for people to
act on the information. For me, that difference is enough to hook me into
spending a few minutes on JIRA vs. just glossing over an email with a link.

I wonder if that's also the case for others on this list.

If you already spend a good amount of time cleaning up on JIRA, then this
report won't be that relevant to you. But given the number and growth of
open issues on our tracker, I suspect we could do with quite a few more
people chipping in and cleaning up where they can.

That's the real problem that this report is intended to help with.

Nick


On Sun Dec 14 2014 at 2:49:00 AM Andrew Ash <an...@andrewash.com> wrote:

> The goal of increasing visibility on open issues is a good one.  How is
> this different from just a link to Jira though?  Some might say this adds
> noise to the mailing list and doesn't contain any information not already
> available in Jira.
>
> The idea seems good but the formatting leaves a little to be desired.  If
> you aren't opposed to using HTML, I might suggest this more compact format:
>
> SPARK-2044 <https://issues.apache.org/jira/browse/SPARK-2044> Pluggable interface for shuffles
> SPARK-2365 <https://issues.apache.org/jira/browse/SPARK-2365> Add IndexedRDD, an efficient updatable key-value
> SPARK-3561 <https://issues.apache.org/jira/browse/SPARK-3561> Allow for pluggable execution contexts in Spark
>
> Andrew
>
> On Sat, Dec 13, 2014 at 11:31 PM, Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
>> What do y’all think of a report like this emailed out to the dev list on a
>> monthly basis?
>>
>> The goal would be to increase visibility into our open issues and
>> encourage developers to tend to our issue tracker more frequently.
>>
>> Nick
>>

Re: Spark JIRA Report

Posted by Andrew Ash <an...@andrewash.com>.

The goal of increasing visibility on open issues is a good one.  How is
this different from just a link to Jira though?  Some might say this adds
noise to the mailing list and doesn't contain any information not already
available in Jira.

The idea seems good but the formatting leaves a little to be desired.  If
you aren't opposed to using HTML, I might suggest this more compact format:

SPARK-2044 <https://issues.apache.org/jira/browse/SPARK-2044> Pluggable interface for shuffles
SPARK-2365 <https://issues.apache.org/jira/browse/SPARK-2365> Add IndexedRDD, an efficient updatable key-value
SPARK-3561 <https://issues.apache.org/jira/browse/SPARK-3561> Allow for pluggable execution contexts in Spark

Andrew
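
A minimal sketch of how the compact HTML format suggested above might be produced, assuming the issue keys and summaries are already available as plain strings. The helper names `compact_html_line` and `compact_html_report` are hypothetical and not part of any existing report script.

```python
from html import escape

# Browse URL prefix taken from the links in this thread.
JIRA_BROWSE = "https://issues.apache.org/jira/browse/"


def compact_html_line(key, summary):
    """Render one issue as a linked key followed by its summary."""
    safe_key = escape(key)
    return f'<a href="{JIRA_BROWSE}{safe_key}">{safe_key}</a> {escape(summary)}'


def compact_html_report(issues):
    """Join (key, summary) pairs into one HTML fragment, one issue per line."""
    return "<br>\n".join(compact_html_line(key, summary) for key, summary in issues)


if __name__ == "__main__":
    print(compact_html_report([
        ("SPARK-2044", "Pluggable interface for shuffles"),
        ("SPARK-3561", "Allow for pluggable execution contexts in Spark"),
    ]))
```

Escaping the summary keeps issue titles containing characters such as `<` or `&` from breaking the HTML.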

On Sat, Dec 13, 2014 at 11:31 PM, Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

> What do y’all think of a report like this emailed out to the dev list on a
> monthly basis?
>
> The goal would be to increase visibility into our open issues and encourage
> developers to tend to our issue tracker more frequently.
>
> Nick
>
> There are 1,236 unresolved issues
> <https://issues.apache.org/jira/issues/?jql=project+%3D+SPARK+AND+resolution+%3D+Unresolved+ORDER+BY+updated+DESC>
> in the Spark project on JIRA.
>
> Recently Updated Issues
> <https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC>
>
> | Type | Key | Priority | Summary | Last Updated |
> |---|---|---|---|---|
> | Bug | SPARK-4841 <https://issues.apache.org/jira/browse/SPARK-4841> | Major | Batch serializer bug in PySpark’s RDD.zip | Dec 14, 2014 |
> | Question | SPARK-4810 <https://issues.apache.org/jira/browse/SPARK-4810> | Major | Failed to run collect | Dec 14, 2014 |
> | Bug | SPARK-785 <https://issues.apache.org/jira/browse/SPARK-785> | Major | ClosureCleaner not invoked on most PairRDDFunctions | Dec 14, 2014 |
> | New Feature | SPARK-3405 <https://issues.apache.org/jira/browse/SPARK-3405> | Minor | EC2 cluster creation on VPC | Dec 13, 2014 |
> | Improvement | SPARK-1555 <https://issues.apache.org/jira/browse/SPARK-1555> | Minor | enable ec2/spark_ec2.py to stop/delete cluster non-interactively | Dec 13, 2014 |
>
> Stale Issues
> <https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20AND%20updated%20%3C%3D%20-90d%20ORDER%20BY%20updated%20ASC>
>
> | Type | Key | Priority | Summary | Last Updated |
> |---|---|---|---|---|
> | Bug | SPARK-560 <https://issues.apache.org/jira/browse/SPARK-560> | None | Specialize RDDs / iterators | Oct 22, 2012 |
> | New Feature | SPARK-540 <https://issues.apache.org/jira/browse/SPARK-540> | None | Add API to customize in-memory representation of RDDs | Oct 22, 2012 |
> | Improvement | SPARK-573 <https://issues.apache.org/jira/browse/SPARK-573> | None | Clarify semantics of the parallelized closures | Oct 22, 2012 |
> | New Feature | SPARK-609 <https://issues.apache.org/jira/browse/SPARK-609> | Minor | Add instructions for enabling Akka debug logging | Nov 06, 2012 |
> | New Feature | SPARK-636 <https://issues.apache.org/jira/browse/SPARK-636> | Major | Add mechanism to run system management/configuration tasks on all workers | Dec 17, 2012 |
>
> Most Watched Issues
> <https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20watchers%20DESC>
>
> | Type | Key | Priority | Summary | Watchers |
> |---|---|---|---|---|
> | New Feature | SPARK-3561 <https://issues.apache.org/jira/browse/SPARK-3561> | Major | Allow for pluggable execution contexts in Spark | 75 |
> | New Feature | SPARK-2365 <https://issues.apache.org/jira/browse/SPARK-2365> | Major | Add IndexedRDD, an efficient updatable key-value store | 33 |
> | Improvement | SPARK-2044 <https://issues.apache.org/jira/browse/SPARK-2044> | Major | Pluggable interface for shuffles | 30 |
> | New Feature | SPARK-1405 <https://issues.apache.org/jira/browse/SPARK-1405> | Critical | parallel Latent Dirichlet Allocation (LDA) atop of spark in MLlib | 26 |
> | New Feature | SPARK-1406 <https://issues.apache.org/jira/browse/SPARK-1406> | Major | PMML model evaluation support via MLib | 21 |
>
> Most Voted Issues
> <https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20votes%20DESC>
>
> | Type | Key | Priority | Summary | Votes |
> |---|---|---|---|---|
> | Bug | SPARK-2541 <https://issues.apache.org/jira/browse/SPARK-2541> | Major | Standalone mode can’t access secure HDFS anymore | 12 |
> | New Feature | SPARK-2365 <https://issues.apache.org/jira/browse/SPARK-2365> | Major | Add IndexedRDD, an efficient updatable key-value store | 9 |
> | Improvement | SPARK-3533 <https://issues.apache.org/jira/browse/SPARK-3533> | Major | Add saveAsTextFileByKey() method to RDDs | 8 |
> | Bug | SPARK-2883 <https://issues.apache.org/jira/browse/SPARK-2883> | Blocker | Spark Support for ORCFile format | 6 |
> | New Feature | SPARK-1442 <https://issues.apache.org/jira/browse/SPARK-1442> | Major | Add Window function support | 6 |