Posted to dev@spark.apache.org by "assaf.mendelson" <as...@rsa.com> on 2016/11/02 11:32:43 UTC

Handling questions in the mailing lists

Hi,
I know this is a little off topic, but I wanted to raise an issue about how questions are handled in the mailing lists. This applies to both the user list and the dev list, but since user questions have other outlets such as Stack Overflow, it is more of a problem for dev.
Let's say I ask a question (as I recently did). Unfortunately, this was during Spark Summit Europe, so people were probably busy. In any case, no one answered.
The problem is that if no one answers very soon, the question will almost certainly remain unanswered, because new messages will simply drown it out.

This is a common issue not just for questions but for any comment or idea which is not immediately picked up.

I believe we should have a method of handling this.
Generally, I would say these types of things belong on Stack Overflow; after all, the way it is built is perfect for this. More seasoned Spark contributors and committers can periodically check for unanswered questions and answer them.
The problem is that Stack Overflow (as well as other venues such as the Databricks forums) tends to be more user-oriented. This means that any question about Spark internals will almost certainly remain unanswered.

I was wondering if we could come up with a solution for this.

Assaf.





--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Handling-questions-in-the-mailing-lists-tp19690.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: Handling questions in the mailing lists

Posted by eliasah <ab...@gmail.com>.
Besides the eventual traffic issue, I don't believe that it would benefit
users to get a standalone site. Some great answers are provided by users
who aren't Spark experts but are Java, Python, AWS or even systems
experts. Why would we want to play alone?

We are nevertheless trying to animate the Apache Spark chat room, which
isn't as obvious as one might want it to be.

I'd rather things stay the way they are on SO. There is a bunch of us who
are actually very active and answer as much as we can, and we'll be glad to
help.





RE: Handling questions in the mailing lists

Posted by Io...@nomura.com.
…my 0.00001 cent ☺
As a Spark and SO user, I would not find a separate SE a good thing.

* Part of the beauty of SO is that you can easily filter and track different topics from one dashboard.
* Being part of SO also gives good exposure, as it raises awareness of Spark across a wider audience.
* High-reputation users, even if they are, say, “Python-centric”, add value by moderating and commenting.
* I don’t think Spark-specific is a good thing either. Spark is typically combined with a huge range of other technologies (Avro, Parquet, Hadoop, Python, R, Scala, Akka, Java, HBase, to name a few). Users who are specialists in these topics can provide value and help build quality in the Spark tag. By getting a new SE you kind of exclude them.
* It will take time to build up enough reputable users to share the moderation burden.
* A high-rep Java user is likely to ask a good question. By forcing people to join an SE where their rep is reset, you lose the ability to gauge the quality of your users (and, may I say, potential evangelists). By observation (no idea if true), questions by high-rep users attract much better attention than those from users with 100 rep or less.
* Last but not least, high-rep users usually know, follow and enforce SO rules and best practices quite well, whereas a Spark-centric SE might not be as rule-focused. Even though rules can sometimes be annoying, overall they lead to quality questions, so more users get involved.


From: Sean Owen [mailto:sowen@cloudera.com]
Sent: 24 November 2016 10:53
To: assaf.mendelson; dev@spark.apache.org
Subject: Re: Handling questions in the mailing lists

Here's a view into the requirements, for example: http://area51.stackexchange.com/proposals/76571/emacs

You're right there is a lot of activity on SO, easily 30-40 questions per day. One thing I noticed about, for example, the Data Science SE is that most questions relevant to it were still posted on SO or Cross Validated. It struggles as an SE even though there is, out there, more than enough activity that _should_ be on the specific SE.

There are more niche things that end up working as an SE, so I'm not dead set against it, though it would remain unofficial and my gut is that it might just split the conversation yet further. I'd leave it, however, to anyone active on SO already to decide that it's worth a dedicated SE and just do it.

On Thu, Nov 24, 2016 at 10:45 AM assaf.mendelson <as...@rsa.com> wrote:
I am not sure what counts as enough traffic. Some of the existing SE sites do not have that much traffic.
Specifically, the user mailing list has ~50 emails per day. It wouldn’t be much of a stretch to extract 1-2 questions per day from that. On regular Stack Overflow, the apache-spark tag had more than 50 new questions in the last 24 hours alone (http://stackoverflow.com/questions/tagged/apache-spark?sort=newest&pageSize=50).

I believe this should be enough traffic (and the traffic would rise once quality answers begin to appear).


From: Sean Owen [via Apache Spark Developers List] [mailto:ml-node+[hidden email]]
Sent: Thursday, November 24, 2016 12:32 PM

To: Mendelson, Assaf
Subject: Re: Handling questions in the mailing lists

I don't think there's nearly enough traffic to sustain a stand-alone SE. I helped mod the Data Science SE and it's still not technically critical mass after 2 years. It would just fracture the discussion to yet another place.
On Thu, Nov 24, 2016 at 6:52 AM assaf.mendelson <[hidden email]> wrote:
Sorry to reawaken this, but I just noticed it is possible to propose new topic-specific sites (http://area51.stackexchange.com/faq) in the Stack Exchange network. So, for example, we might have a Spark-specific site at spark.stackexchange.com.
The advantages of such a site are many. First of all, it is Spark-specific. Secondly, people's reputation would be earned on Spark rather than on general questions. Lastly (and most importantly, in my opinion), it would have Spark-based moderators (all of them Spark moderators, as opposed to general technology moderators).

The process of creating such a site is not complicated. Basically, someone creates a proposal (I have no problem doing so), then creates 5 example questions (something we want on the site), and 5 people need to ‘follow’ it within 3 days. This starts a “definition” phase. The goal is to get at least 40 questions that embody the goal of the site and have at least 10 net votes, with enough people following it. When enough traction has been gained (enough questions and enough followers), the site moves to the commitment phase. In this phase users “commit” to being on the site (basically, this is meant to check that the community of experts is big enough). Once all this happens, the site moves into beta. This means the site becomes active, and it will become a full site if it sees enough traction.

I would suggest trying to set this up.

Thanks,
                Assaf



Re: Handling questions in the mailing lists

Posted by Sean Owen <so...@cloudera.com>.
Here's a view into the requirements, for example:
http://area51.stackexchange.com/proposals/76571/emacs

You're right there is a lot of activity on SO, easily 30-40 questions per
day. One thing I noticed about, for example, the Data Science SE is that
most questions relevant to it were still posted on SO or Cross Validated.
It struggles as an SE even though there is, out there, more than enough
activity that _should_ be on the specific SE.

There are more niche things that end up working as an SE, so I'm not dead
set against it, though it would remain unofficial and my gut is that it
might just split the conversation yet further. I'd leave it, however, to
anyone active on SO already to decide that it's worth a dedicated SE and
just do it.

On Thu, Nov 24, 2016 at 10:45 AM assaf.mendelson <as...@rsa.com>
wrote:

> I am not sure what counts as enough traffic. Some of the existing SE
> sites do not have that much traffic.
>
> Specifically, the user mailing list has ~50 emails per day. It wouldn’t be
> much of a stretch to extract 1-2 questions per day from that. On regular
> Stack Overflow, the apache-spark tag had more than 50 new questions in
> the last 24 hours alone (
> http://stackoverflow.com/questions/tagged/apache-spark?sort=newest&pageSize=50).
>
>
>
>
> I believe this should be enough traffic (and the traffic would rise once
> quality answers begin to appear).
>
>
>
>
>
> *From:* Sean Owen [via Apache Spark Developers List] [mailto:ml-node+[hidden email]]
> *Sent:* Thursday, November 24, 2016 12:32 PM
>
>
> *To:* Mendelson, Assaf
> *Subject:* Re: Handling questions in the mailing lists
>
>
>
> I don't think there's nearly enough traffic to sustain a stand-alone SE. I
> helped mod the Data Science SE and it's still not technically critical mass
> after 2 years. It would just fracture the discussion to yet another place.
>
> On Thu, Nov 24, 2016 at 6:52 AM assaf.mendelson <[hidden email]> wrote:
>
> Sorry to reawaken this, but I just noticed it is possible to propose new
> topic-specific sites (http://area51.stackexchange.com/faq) in the Stack
> Exchange network. So, for example, we might have a Spark-specific site at
> spark.stackexchange.com.
>
> The advantages of such a site are many. First of all, it is Spark-specific.
> Secondly, people's reputation would be earned on Spark rather than on
> general questions. Lastly (and most importantly, in my opinion), it would
> have Spark-based moderators (all of them Spark moderators, as opposed to
> general technology moderators).
>
> The process of creating such a site is not complicated. Basically, someone
> creates a proposal (I have no problem doing so), then creates 5 example
> questions (something we want on the site), and 5 people need to ‘follow’
> it within 3 days. This starts a “definition” phase. The goal is to get at
> least 40 questions that embody the goal of the site and have at least 10
> net votes, with enough people following it. When enough traction has been
> gained (enough questions and enough followers), the site moves to the
> commitment phase. In this phase users “commit” to being on the site
> (basically, this is meant to check that the community of experts is big
> enough). Once all this happens, the site moves into beta. This means the
> site becomes active, and it will become a full site if it sees enough
> traction.
>
>
>
> I would suggest trying to set this up.
>
>
>
> Thanks,
>
>                 Assaf

RE: Handling questions in the mailing lists

Posted by "assaf.mendelson" <as...@rsa.com>.
I am not sure what counts as enough traffic. Some of the existing SE sites do not have that much traffic.
Specifically, the user mailing list has ~50 emails per day. It wouldn’t be much of a stretch to extract 1-2 questions per day from that. On regular Stack Overflow, the apache-spark tag had more than 50 new questions in the last 24 hours alone (http://stackoverflow.com/questions/tagged/apache-spark?sort=newest&pageSize=50).

I believe this should be enough traffic (and the traffic would rise once quality answers begin to appear).


From: Sean Owen [via Apache Spark Developers List] [mailto:ml-node+s1001551n20007h6@n3.nabble.com]
Sent: Thursday, November 24, 2016 12:32 PM
To: Mendelson, Assaf
Subject: Re: Handling questions in the mailing lists

I don't think there's nearly enough traffic to sustain a stand-alone SE. I helped mod the Data Science SE and it's still not technically critical mass after 2 years. It would just fracture the discussion to yet another place.
On Thu, Nov 24, 2016 at 6:52 AM assaf.mendelson <[hidden email]> wrote:
Sorry to reawaken this, but I just noticed it is possible to propose new topic-specific sites (http://area51.stackexchange.com/faq) in the Stack Exchange network. So, for example, we might have a Spark-specific site at spark.stackexchange.com.
The advantages of such a site are many. First of all, it is Spark-specific. Secondly, people's reputation would be earned on Spark rather than on general questions. Lastly (and most importantly, in my opinion), it would have Spark-based moderators (all of them Spark moderators, as opposed to general technology moderators).

The process of creating such a site is not complicated. Basically, someone creates a proposal (I have no problem doing so), then creates 5 example questions (something we want on the site), and 5 people need to ‘follow’ it within 3 days. This starts a “definition” phase. The goal is to get at least 40 questions that embody the goal of the site and have at least 10 net votes, with enough people following it. When enough traction has been gained (enough questions and enough followers), the site moves to the commitment phase. In this phase users “commit” to being on the site (basically, this is meant to check that the community of experts is big enough). Once all this happens, the site moves into beta. This means the site becomes active, and it will become a full site if it sees enough traction.

I would suggest trying to set this up.

Thanks,
                Assaf



Re: Handling questions in the mailing lists

Posted by Sean Owen <so...@cloudera.com>.
I don't think there's nearly enough traffic to sustain a stand-alone SE. I
helped mod the Data Science SE and it's still not technically critical mass
after 2 years. It would just fracture the discussion to yet another place.

On Thu, Nov 24, 2016 at 6:52 AM assaf.mendelson <as...@rsa.com>
wrote:

> Sorry to reawaken this, but I just noticed it is possible to propose new
> topic-specific sites (http://area51.stackexchange.com/faq) in the Stack
> Exchange network. So, for example, we might have a Spark-specific site at
> spark.stackexchange.com.
>
> The advantages of such a site are many. First of all, it is Spark-specific.
> Secondly, people's reputation would be earned on Spark rather than on
> general questions. Lastly (and most importantly, in my opinion), it would
> have Spark-based moderators (all of them Spark moderators, as opposed to
> general technology moderators).
>
> The process of creating such a site is not complicated. Basically, someone
> creates a proposal (I have no problem doing so), then creates 5 example
> questions (something we want on the site), and 5 people need to ‘follow’
> it within 3 days. This starts a “definition” phase. The goal is to get at
> least 40 questions that embody the goal of the site and have at least 10
> net votes, with enough people following it. When enough traction has been
> gained (enough questions and enough followers), the site moves to the
> commitment phase. In this phase users “commit” to being on the site
> (basically, this is meant to check that the community of experts is big
> enough). Once all this happens, the site moves into beta. This means the
> site becomes active, and it will become a full site if it sees enough
> traction.
>
>
>
> I would suggest trying to set this up.
>
>
>
> Thanks,
>
>                 Assaf
>
>
>

RE: Handling questions in the mailing lists

Posted by "assaf.mendelson" <as...@rsa.com>.
Sorry to reawaken this, but I just noticed it is possible to propose new topic-specific sites (http://area51.stackexchange.com/faq) in the Stack Exchange network. So, for example, we might have a Spark-specific site at spark.stackexchange.com.
The advantages of such a site are many. First of all, it is Spark-specific. Secondly, people's reputation would be earned on Spark rather than on general questions. Lastly (and most importantly, in my opinion), it would have Spark-based moderators (all of them Spark moderators, as opposed to general technology moderators).

The process of creating such a site is not complicated. Basically, someone creates a proposal (I have no problem doing so), then creates 5 example questions (something we want on the site), and 5 people need to 'follow' it within 3 days. This starts a "definition" phase. The goal is to get at least 40 questions that embody the goal of the site and have at least 10 net votes, with enough people following it. When enough traction has been gained (enough questions and enough followers), the site moves to the commitment phase. In this phase users "commit" to being on the site (basically, this is meant to check that the community of experts is big enough). Once all this happens, the site moves into beta. This means the site becomes active, and it will become a full site if it sees enough traction.

I would suggest trying to set this up.

Thanks,
                Assaf


From: Denny Lee [via Apache Spark Developers List] [mailto:ml-node+s1001551n19916h85@n3.nabble.com]
Sent: Wednesday, November 16, 2016 4:33 PM
To: Mendelson, Assaf
Subject: Re: Handling questions in the mailing lists

Awesome stuff! Thanks Sean! :-)
On Wed, Nov 16, 2016 at 05:57 Sean Owen <[hidden email]> wrote:
I updated the wiki to point to the /community.html page. (We're going to migrate the wiki real soon now anyway)

I updated the /community.html page per this thread too. PR: https://github.com/apache/spark-website/pull/16


On Tue, Nov 15, 2016 at 2:49 PM assaf.mendelson <[hidden email]> wrote:

We should probably also update the "helping others" section of the how-to-contribute page (https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingbyHelpingOtherUsers).

Assaf.



From: Denny Lee [via Apache Spark Developers List] [mailto:[hidden email]]
Sent: Sunday, November 13, 2016 8:52 AM



To: Mendelson, Assaf
Subject: Re: Handling questions in the mailing lists







Hey Reynold,






Looks like we have all of the proposed changes in Proposed Community Mailing Lists / StackOverflow Changes <https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p>. Anything else we can do to update the Spark Community page / welcome email?





Meanwhile, let's all start answering questions on SO, eh?! :)


Denny



On Thu, Nov 10, 2016 at 1:54 PM Holden Karau <[hidden email]> wrote:




That's a good question, looking at http://stackoverflow.com/tags/apache-spark/topusers shows a few contributors who have already been active on SO including some committers and PMC members with very high overall SO reputations for any administrative needs (as well as a number of other contributors besides just PMC/committers).



On Wed, Nov 9, 2016 at 2:18 AM, assaf.mendelson <[hidden email]> wrote:



I was just wondering, before we move on to SO.

Do we have enough contributors with enough reputation to manage things on SO?

We would need contributors with enough reputation to have the relevant privileges.

For example: creating tags (requires 1500 reputation), editing questions and answers (2000), creating tag synonyms (2500), approving tag wiki edits (5000), access to moderator tools (10000; this is required to delete questions etc.), protecting questions (15000).

All of these are important if we plan to have SO as a main resource.

I know I originally suggested SO, however, if we do not have contributors with the required privileges and the willingness to help manage everything then I am not sure this is a good fit.
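
As a rough illustration of the check being asked for here, the thresholds quoted above could be put in a small lookup. This is only a sketch: the numbers are the ones cited in this message (not verified against current Stack Overflow rules), and the names `PRIVILEGE_THRESHOLDS`, `privileges_for` and `uncovered_privileges` are hypothetical helpers, not part of any Stack Overflow API.

```python
# Reputation thresholds as quoted in the message above (figures from the
# thread; not checked against current Stack Overflow rules).
PRIVILEGE_THRESHOLDS = {
    "create tags": 1500,
    "edit questions and answers": 2000,
    "create tag synonyms": 2500,
    "approve tag wiki edits": 5000,
    "access moderator tools": 10000,
    "protect questions": 15000,
}


def _by_threshold():
    """Yield (privilege, threshold) pairs ordered from lowest to highest."""
    return sorted(PRIVILEGE_THRESHOLDS.items(), key=lambda item: item[1])


def privileges_for(reputation):
    """Return the privileges unlocked at the given reputation, lowest first."""
    return [name for name, needed in _by_threshold() if reputation >= needed]


def uncovered_privileges(reputations):
    """Return the privileges that nobody in a group of contributors has reached."""
    best = max(reputations, default=0)
    return [name for name, needed in _by_threshold() if best < needed]
```

Given a list of contributor reputations, `uncovered_privileges` would show which of the privileges above nobody in the group holds, which is the gap this message is worried about.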

Assaf.





From: Denny Lee [via Apache Spark Developers List] [mailto:[hidden email]]




Sent: Wednesday, November 09, 2016 9:54 AM
To: Mendelson, Assaf
Subject: Re: Handling questions in the mailing lists






Agreed that simply moving the questions to SO will not solve anything, but I think the callout about the meta-tags is that we need to abide by SO rules; if we were to just jump in and start creating meta-tags, we would be violating at minimum the spirit and at maximum the actual conventions of SO.





That said, perhaps we could suggest tags to place in the header of the question, whether on SO or the mailing lists, that will help us sort through all of these questions faster, just as you suggested. The Proposed Community Mailing Lists / StackOverflow Changes document <https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p> has been updated to include suggested tags. WDYT?



Re: Handling questions in the mailing lists

Posted by Denny Lee <de...@gmail.com>.
Awesome stuff! Thanks Sean! :-)

On Wed, Nov 16, 2016 at 05:57 Sean Owen <so...@cloudera.com> wrote:

> I updated the wiki to point to the /community.html page. (We're going to
> migrate the wiki real soon now anyway)
>
> I updated the /community.html page per this thread too. PR:
> https://github.com/apache/spark-website/pull/16
>
>
> On Tue, Nov 15, 2016 at 2:49 PM assaf.mendelson <as...@rsa.com>
> wrote:
>
> We should probably also update the "helping others" section of the
> how-to-contribute page (
> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingbyHelpingOtherUsers).
>
>
> Assaf.
>
>
>
> From: Denny Lee [via Apache Spark Developers List] [mailto:ml-node+[hidden
> email] <http://user/SendEmail.jtp?type=node&node=19891&i=0>]
> Sent: Sunday, November 13, 2016 8:52 AM
>
>
>
> To: Mendelson, Assaf
> Subject: Re: Handling questions in the mailing lists
>
>
>
>
>
>
>
> Hey Reynold,
>
>
>
>
>
>
> Looks like we all of the proposed changes into Proposed Community Mailing
> Lists / StackOverflow Changes
> <https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p>.
> Anything else we can do to update the Spark Community page / welcome email?
>
>
>
>
>
> Meanwhile, let's all start answering questions on SO, eh?! :)
>
>
> Denny
>
>
>
> On Thu, Nov 10, 2016 at 1:54 PM Holden Karau <[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19835&i=0>> wrote:
>
>
>
>
> That's a good question, looking at
> http://stackoverflow.com/tags/apache-spark/topusers shows a few
> contributors who have already been active on SO including some committers
> and PMC members with very high overall SO reputations for any
> administrative needs (as well as a number of other contributors besides
> just PMC/committers).
>
>
>
> On Wed, Nov 9, 2016 at 2:18 AM, assaf.mendelson <[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19835&i=1>> wrote:
>
>
>
> I was just wondering, before we move on to SO.
>
> Do we have enough contributors with enough reputation do manage things in
> SO?
>
> We would need contributors with enough reputation to have relevant
> privilages.
>
> For example: creating tags (requires 1500 reputation), edit questions and
> answers (2000), create tag synonums (2500), approve tag wiki edits (5000),
> access to moderator tools (10000, this is required to delete questions
> etc.), protect questions (15000).
>
> All of these are important if we plan to have SO as a main resource.
>
> I know I originally suggested SO, however, if we do not have contributors
> with the required privileges and the willingness to help manage everything
> then I am not sure this is a good fit.
>
> Assaf.
>
>
>
>
>
> From: Denny Lee [via Apache Spark Developers List] [mailto:[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19835&i=2>[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19800&i=0>]
>
>
>
>
> Sent: Wednesday, November 09, 2016 9:54 AM
> To: Mendelson, Assaf
> Subject: Re: Handling questions in the mailing lists
>
>
>
>
>
>
> Agreed that by simply just moving the questions to SO will not solve
> anything but I think the call out about the meta-tags is that we need to
> abide by SO rules and if we were to just jump in and start creating
> meta-tags, we would be violating at minimum the spirit and at maximum the
> actual conventions around SO.
>
>
>
>
>
> Saying this, perhaps we could suggest tags that we place in the header of
> the question whether it be SO or the mailing lists that will help us sort
> through all of these questions faster just as you suggested. The Proposed
> Community Mailing Lists / StackOverflow Changes
> <https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p>
> has been updated to include suggested tags. WDYT?
>
>
>

Re: Handling questions in the mailing lists

Posted by Sean Owen <so...@cloudera.com>.
I updated the wiki to point to the /community.html page. (We're going to
migrate the wiki real soon now anyway)

I updated the /community.html page per this thread too. PR:
https://github.com/apache/spark-website/pull/16


RE: Handling questions in the mailing lists

Posted by "assaf.mendelson" <as...@rsa.com>.
Should probably also update the helping others section in the how to contribute section (https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingbyHelpingOtherUsers)
Assaf.

From: Denny Lee [via Apache Spark Developers List] [mailto:ml-node+s1001551n19835h74@n3.nabble.com]
Sent: Sunday, November 13, 2016 8:52 AM
To: Mendelson, Assaf
Subject: Re: Handling questions in the mailing lists

Hey Reynold,

Looks like we have all of the proposed changes in Proposed Community Mailing Lists / StackOverflow Changes<https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p>.  Anything else we can do to update the Spark Community page / welcome email?

Meanwhile, let's all start answering questions on SO, eh?! :)
Denny

On Thu, Nov 10, 2016 at 1:54 PM Holden Karau wrote:
That's a good question. Looking at http://stackoverflow.com/tags/apache-spark/topusers shows a few contributors who have already been active on SO, including some committers and PMC members with very high overall SO reputations for any administrative needs (as well as a number of other contributors besides just PMC/committers).

On Wed, Nov 9, 2016 at 2:18 AM, assaf.mendelson wrote:
I was just wondering, before we move on to SO.
Do we have enough contributors with enough reputation to manage things in SO?
We would need contributors with enough reputation to have the relevant privileges.
For example: creating tags (requires 1500 reputation), editing questions and answers (2000), creating tag synonyms (2500), approving tag wiki edits (5000), access to moderator tools (10000, which is required to delete questions etc.), protecting questions (15000).
All of these are important if we plan to have SO as a main resource.
I know I originally suggested SO; however, if we do not have contributors with the required privileges and the willingness to help manage everything, then I am not sure this is a good fit.
Assaf.

From: Denny Lee [via Apache Spark Developers List]
Sent: Wednesday, November 09, 2016 9:54 AM
To: Mendelson, Assaf
Subject: Re: Handling questions in the mailing lists

Agreed that simply moving the questions to SO will not solve anything, but I think the call-out about the meta-tags is that we need to abide by SO rules: if we were to just jump in and start creating meta-tags, we would be violating at minimum the spirit and at maximum the actual conventions of SO.

That said, perhaps we could suggest tags to place in the header of a question, whether on SO or on the mailing lists, that would help us sort through all of these questions faster, just as you suggested.  The Proposed Community Mailing Lists / StackOverflow Changes<https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p> document has been updated to include suggested tags.  WDYT?

On Tue, Nov 8, 2016 at 11:02 PM assaf.mendelson wrote:
I like the document and I think it is good, but I still feel like we are missing an important part here.

Look at SO today. There are:

- 4658 unanswered questions under the apache-spark tag.
- 394 unanswered questions under the spark-dataframe tag.
- 639 unanswered questions under the apache-spark-sql tag.
- 859 unanswered questions under the pyspark tag.

Just moving people to ask there will not help. The whole issue is having people answer the questions.

The problem is that many of these questions do not fit SO (but are already there, so they are noise), are bad (i.e. unclear or hard to answer), or are orphaned, while some are simply harder than what people with some experience in Spark can handle and require more expertise.
The problem is that people with the relevant expertise are drowning in noise. This is true for the mailing list and it is true for SO.

For this reason I believe that just moving people to SO will not solve anything.

My original thought was that if we had different tags, then different people could watch open questions on these tags and therefore see much less noise. I thought we would have a low tier (the current one) of people just not following the documentation (which would remain as noise); then a beginner tier, where people could downvote bad questions but in most cases the community can answer the questions because they are common; then a “medium” tier, which would mean harder questions that can still be answered by advanced users; and lastly an “advanced” tier to which committers can actually subscribe (adding sub-tags for subsystems would improve this even more).

I was not aware of the SO policy on meta tags (the burnination link is about removing tags completely, so I am not sure how it applies; I believe this link https://stackoverflow.blog/2010/08/the-death-of-meta-tags/ is more relevant).
There was actually a discussion along these lines on SO (http://meta.stackoverflow.com/questions/253338/filtering-questions-by-difficulty-level).

The fact that SO did not solve this issue does not mean we shouldn’t try.

The way I see it, some tags can easily be used even with the meta tags limitation. For example, a spark-internal-development tag could be used to ask questions about the development of Spark. There are already tags for some Spark subsystems (there is an apache-spark-sql tag, a pyspark tag, a spark-streaming tag etc.). The main issue I see, and the one we can’t seem to get around, is dividing between simple questions that the community should answer and hard questions which only advanced users can answer.

Maybe SO isn’t the correct platform for that, but even within it we can try to find a non-meta name for Spark beginner questions vs. Spark advanced questions.
Assaf.


From: Denny Lee [via Apache Spark Developers List]
Sent: Tuesday, November 08, 2016 7:53 AM
To: Mendelson, Assaf

Subject: Re: Handling questions in the mailing lists

To help track and jump-start the verbiage for the Spark community page and welcome email, here's a working document for us to work with: https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit

Hope this will help us collaborate on this stuff a little faster.
On Mon, Nov 7, 2016 at 2:25 PM Maciej Szymkiewicz wrote:

Just a couple of random thoughts regarding Stack Overflow...

  *   If we are thinking about shifting focus towards SO, all attempts at micromanaging should be discarded right at the beginning, especially things like meta tags, which are discouraged and "burninated" (https://meta.stackoverflow.com/tags/burninate-request/info), or thread bumping. Depending on the context, these will be unmanageable, go against community guidelines, or simply be obsolete.
  *   Lack of expertise is unlikely to be an issue. Even now there are a number of advanced Spark users on SO. Of course, the more the merrier.

Things that can be easily improved:

  *   Identifying, improving and promoting canonical questions and answers. This means closing duplicates, suggesting edits to improve existing answers, and providing alternative solutions. It can also be used to identify gaps in the documentation.
  *   Providing a set of clear posting guidelines to reduce the effort required to identify the problem (think of http://stackoverflow.com/q/5963269, a.k.a. "How to make a great R reproducible example?").
  *   Helping users decide if a question is a good fit for SO (see below). API questions are a great fit; debugging problems like "my cluster is slow" are not.
  *   Actively cleaning up (closing, deleting) off-topic and low-quality questions. The less junk to sieve through, the better the chance of good questions being answered.
  *   Repurposing and actively moderating SO docs (https://stackoverflow.com/documentation/apache-spark/topics). Right now most of the stuff that goes there is useless, duplicated, plagiarized, or borderline spam.
  *   Encouraging the community to monitor featured (https://stackoverflow.com/questions/tagged/apache-spark?sort=featured) and active, upvoted and unanswered (https://stackoverflow.com/unanswered/tagged/apache-spark) questions.
  *   Implementing some procedure to identify questions which are likely to be bugs or material for feature requests. Personally I am quite often tempted to simply send a link to the dev list, but I don't think it is really acceptable.
  *   Animating a Spark-related chat room. I tried this a couple of times but to no avail. Without a certain critical mass of users it just won't work.



On 11/07/2016 07:32 AM, Reynold Xin wrote:
This is an excellent point. If we do go ahead and feature SO as a way for users to ask questions more prominently, as someone who knows SO very well, would you be willing to help write a short guideline (ideally the shorter the better, which makes it hard) to direct what goes to user@ and what goes to SO?

Sure, I'll be happy to help if I can.
On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz wrote:

Damn, I always thought that mailing list is only for nice and welcoming people and there is nothing to do for me here >:)

To be serious though, there are many questions on the users list which would fit just fine on SO, but that is not true in general. There are dozens of questions which are too broad, opinion-based, ask for external resources and so on. If you want to direct users to SO you have to help them decide if it is the right channel. Otherwise it will just create a really bad experience for both those seeking help and active answerers. The former will be downvoted and bashed; the latter will have to deal with all the junk, and the number of active Spark users with moderation privileges is really low (with only Massg and me being able to directly close duplicates).

Believe me, I've seen this before.
On 11/07/2016 05:08 AM, Reynold Xin wrote:
You have substantially underestimated how opinionated people can be on mailing lists too :)
On Sunday, November 6, 2016, Maciej Szymkiewicz wrote:

You have to remember that Stack Overflow crowd (like me) is highly opinionated, so many questions, which could be just fine on the mailing list, will be quickly downvoted and / or closed as off-topic. Just saying...

--

Best,

Maciej

On 11/07/2016 04:03 AM, Reynold Xin wrote:
OK I've checked on the ASF member list (which is private so there is no public archive).

It is not against any ASF rule to recommend StackOverflow as a place for users to ask questions. I don't think we can or should delete the existing user@spark list either, but we can certainly make SO more visible than it is.


On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin wrote:
Actually after talking with more ASF members, I believe the only policy is that development decisions have to be made and announced on ASF properties (dev list or jira), but user questions don't have to.

I'm going to double check this. If it is true, I would actually recommend moving the Q&A part of the user list entirely over to Stack Overflow, or at least making that the recommended way rather than the existing user list, which is not very scalable.


On Wednesday, November 2, 2016, Nicholas Chammas wrote:

We’ve discussed several times upgrading our communication tools, as far back as 2014 and maybe even before that too. The bottom line is that we can’t due to ASF rules requiring the use of ASF-managed mailing lists.

For some history, see this discussion:
•         https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E
•         https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E

(It’s ironic that it’s difficult to follow the past discussion on why we can’t change our official communication tools due to those very tools…)

Nick
​
On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida wrote:
I feel Assaf's point is quite relevant if we want to move this project forward from the Spark user perspective (as I do). In fact, we're still using 20th century tools (mailing lists) with some add-ons (like Stack Overflow).

As usual, Sean and Cody's contributions are very much to the point.
I feel it is indeed a matter of culture (hard to enforce) and tools (much easier). Isn't it?
On 2 November 2016 at 16:36, Cody Koeninger wrote:
So concrete things people could do

- users could tag subject lines appropriately to the component they're
asking about

- contributors could monitor user@ for tags relating to components
they've worked on.
I'd be surprised if my miss rate for any mailing list questions
well-labeled as Kafka was higher than 5%

- committers could be more aggressive about soliciting and merging PRs
to improve documentation.
It's a lot easier to answer even poorly-asked questions with a link to
relevant docs.
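
The subject-line tagging suggested above could be filtered mechanically. Below is a minimal Python sketch, assuming a bracketed convention such as "[Kafka]" at any position in the subject (the thread does not specify a format); the function names subject_tags and matches_watchlist are hypothetical.

```python
import re

# Assumed convention: component tags appear in square brackets, e.g. "[Kafka]".
TAG_PATTERN = re.compile(r"\[([^\]]+)\]")


def subject_tags(subject):
    """Return the lower-cased bracketed tags found in a subject line."""
    return {tag.strip().lower() for tag in TAG_PATTERN.findall(subject)}


def matches_watchlist(subject, watched_components):
    """True if the subject carries at least one tag the contributor watches."""
    watched = {component.lower() for component in watched_components}
    return bool(subject_tags(subject) & watched)
```

A contributor who worked on Kafka integration could then call matches_watchlist(subject, {"Kafka"}) on each incoming user@ subject and skip everything else.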

On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen wrote:
> There's already reviews@ and issues@. dev@ is for project development itself
> and I think is OK. You're suggesting splitting up user@ and I sympathize
> with the motivation. Experience tells me that we'll have a beginner@ that's
> then totally ignored, and people will quickly learn to post to advanced@ to
> get attention, and we'll be back where we started. Putting it in JIRA
> doesn't help. I don't think this is a problem that is merely down to lack of
> process. It actually requires cultivating a culture change on the community
> list.
>
> On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf wrote:
>>
>> What I am suggesting is basically to fix that.
>>
>> For example, we might say that mailing list A is only for voting, mailing
>> list B is only for PR and have something like stack overflow for developer
>> questions (I would even go as far as to have beginner, intermediate and
>> advanced mailing list for users and beginner/advanced for dev).
>>
>>
>>
>> This can easily be done using stack overflow tags, however, that would
>> probably be harder to manage.
>>
>> Maybe using special jira tags and manage it in jira?
>>
>>
>>
>> Anyway as I said, the main issue is not user questions (except maybe
>> advanced ones) but more for dev questions. It is so easy to get lost in the
>> chatter that it makes it very hard for people to learn spark internals…
>>
>> Assaf.
>>
>>
>>
>> From: Sean Owen
>> Sent: Wednesday, November 02, 2016 2:07 PM
>> To: Mendelson, Assaf; [hidden email]
>> Subject: Re: Handling questions in the mailing lists
>>
>>
>>
>> I think that unfortunately mailing lists don't scale well. This one has
>> thousands of subscribers with different interests and levels of experience.
>> For any given person, most messages will be irrelevant. I also find that a
>> lot of questions on user@ are not well-asked, aren't an SSCCE
>> (http://sscce.org/), not something most people are going to bother replying
>> to even if they could answer. I almost entirely ignore user@ because there
>> are higher-priority channels like PRs to deal with, that already have
>> hundreds of messages per day. This is why little of it gets an answer -- too
>> noisy.
>>
>>
>>
>> We have to have official mailing lists, in any event, to have some
>> official channel for things like votes and announcements. It's not wrong to
>> ask questions on user@ of course, but a lot of the questions I see could
>> have been answered with research of existing docs or looking at the code. I
>> think that given the scale of the list, it's not wrong to assert that this
>> is sort of a prerequisite for asking thousands of people to answer one's
>> question. But we can't enforce that.
>>
>>
>>
>> The situation will get better to the extent people ask better questions,
>> help other people ask better questions, and answer good questions. I'd
>> encourage anyone feeling this way to try to help along those dimensions.
>>
>> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson wrote:
>>
>> Hi,
>>
>> I know this is a little off topic but I wanted to raise an issue about
>> handling questions in the mailing list (this is true both for the user
>> mailing list and the dev but since there are other options such as stack
>> overflow for user questions, this is more problematic in dev).
>>
>> Let’s say I ask a question (as I recently did). Unfortunately this was
>> during spark summit in Europe so probably people were busy. In any case no
>> one answered.
>>
>> The problem is, that if no one answers very soon, the question will almost
>> certainly remain unanswered because new messages will simply drown it.
>>
>>
>>
>> This is a common issue not just for questions but for any comment or idea
>> which is not immediately picked up.
>>
>>
>>
>> I believe we should have a method of handling this.
>>
>> Generally, I would say these types of things belong in stack overflow,
>> after all, the way it is built is perfect for this. More seasoned spark
>> contributors and committers can periodically check out unanswered questions
>> and answer them.
>>
>> The problem is that stack overflow (as well as other targets such as the
>> databricks forums) tend to have a more user based orientation. This means
>> that any spark internal question will almost certainly remain unanswered.
>>
>>
>>
>> I was wondering if we could come up with a solution for this.
>>
>>
>>
>> Assaf.
--

Maciej Szymkiewicz


Re: Handling questions in the mailing lists

Posted by Maciej Szymkiewicz <ms...@gmail.com>.
If you take a look at the statistics
(https://data.stackexchange.com/stackoverflow/query/575406) you'll see
that the majority of the unanswered questions:

  * have seen no activity in the last year OR
  * don't have positive score OR
  * have been asked by inactive or new users.

This is usually a good indicator that a question is of poor quality and / or
abandoned, and for various reasons hasn't been picked up by the removal
process (https://stackoverflow.com/help/roomba). This is not unusual for
Stack Overflow, and with a little bit of organized effort it could be
cleaned up in a few weeks.
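
As a rough illustration of the three criteria listed above, here is a minimal Python sketch. It is not the linked Data Explorer query: the Question record, its field names, the one-year window used for "inactive", and the reputation cutoff used for "new" users are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Question:
    # Hypothetical fields; not the columns of the linked query.
    title: str
    score: int
    last_activity: date
    asker_reputation: int
    asker_last_seen: date


def is_likely_abandoned(q, today, new_user_reputation=50):
    """True if any of the three criteria holds (thresholds are assumptions)."""
    one_year_ago = today - timedelta(days=365)
    no_recent_activity = q.last_activity < one_year_ago
    no_positive_score = q.score <= 0
    inactive_or_new_asker = (
        q.asker_last_seen < one_year_ago
        or q.asker_reputation < new_user_reputation
    )
    return no_recent_activity or no_positive_score or inactive_or_new_asker


def split_unanswered(questions, today):
    """Partition unanswered questions into (worth_answering, cleanup_candidates)."""
    keep, cleanup = [], []
    for q in questions:
        (cleanup if is_likely_abandoned(q, today) else keep).append(q)
    return keep, cleanup
```

The cleanup list would only be a starting point for human review, since none of these signals proves that a question is bad.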

Arguably, for a technology with a large number of moving parts, Spark
has a pretty decent /answer rate/, definitely better than many
comparable projects.

Regarding tagging: putting community rules aside, clean questions which
can be answered with relatively low effort are usually resolved in a few
days. What is left is either too time consuming or complex, or just not
worth the time. If you have a lot of time, the former can be
easily selected using predefined filters, and the rest usually qualifies
for closing.

Still, I believe there is a really important point missing here. All of
that requires a lot of effort, and it is slightly unrealistic to expect
that the number of people willing and having time to contribute will
suddenly grow. So the focus should be on having a knowledge base which
can reduce the number of questions to be answered. SO has good visibility,
a large number of existing answers, and very good tools.

On 11/09/2016 08:02 AM, assaf.mendelson wrote:
>
> I like the document and I think it is good but I still feel like we
> are missing an important part here.
>
>  
>
> Look at SO today. There are:
>
> -           4658 unanswered questions under apache-spark tag.
>
> -          394 unanswered questions under spark-dataframe tag.
>
> -          639 unanswered questions under apache-spark-sql
>
> -          859 unanswered questions under pyspark
>
>  
>
> Just moving people to ask there will not help. The whole issue is
> having people answer the questions.
>
>  
>
> The problem is that many of these questions do not fit SO (but are
> already there so they are noise), are bad (i.e. unclear or hard to
> answer), orphaned etc. while some are simply harder than what people
> with some experience in spark can handle and require more expertise.
>
> The problem is that people with the relevant expertise are drowning in
> noise. This is true for the mailing list and this is true for SO.
>
>  
>
> For this reason I believe that just moving people to SO will not solve
> anything.
>
>  
>
> My original thought was that if we had different tags then different
> people could watch open questions on these tags and therefore have a
> much lower noise. I thought that we would have a low tier (current
> one) of people just not following the documentation (which would
> remain as noise), then a beginner tier where we could have people
> downvoting bad questions but in most cases the community can answer
> the questions because they are common, then a “medium” tier which
> would mean harder questions but that can still be answered by advanced
> users and lastly an “advanced” tier to which committers can actually
> subscribe (and adding sub tags for subsystems would improve this
> even more).
>
>  
>
> I was not aware of SO policy for meta tags (the burnination link is
> about removing tags completely so I am not sure how it applies, I
> believe this link
> https://stackoverflow.blog/2010/08/the-death-of-meta-tags/ is more
> relevant).
>
> There was actually a discussion along the lines in SO
> (http://meta.stackoverflow.com/questions/253338/filtering-questions-by-difficulty-level).
>
>  
>
> The fact that SO did not solve this issue, does not mean we shouldn’t
> either.
>
>  
>
> The way I see it, some tags can easily be used even with the meta tags
> limitation. For example, a spark-internal-development tag can be
> used to ask questions about development of spark. There are already tags
> for some spark subsystems (there is an apache-spark-sql tag, a pyspark
> tag, a spark-streaming tag etc.). The main issue I see and the one we
> can’t seem to get around is dividing between simple questions that the
> community should answer and hard questions which only advanced users
> can answer.
>
>  
>
> Maybe SO isn’t the correct platform for that but even within it we can
> try to find a non meta name for spark beginner questions vs. spark
> advanced questions.
>
> Assaf.
>
>  
>
>  
>
> *From:* Denny Lee [via Apache Spark Developers List]
> [mailto:ml-node+[hidden email]]
> *Sent:* Tuesday, November 08, 2016 7:53 AM
> *To:* Mendelson, Assaf
> *Subject:* Re: Handling questions in the mailing lists
>
>  
>
> To help track and get the verbiage for the Spark community page and
> welcome email jump started, here's a working document for us to work
> with: https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#
> <https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit>
>
>  
>
> Hope this will help us collaborate on this stuff a little faster.  
>
>  
>
> On Mon, Nov 7, 2016 at 2:25 PM Maciej Szymkiewicz <[hidden email]> wrote:
>
>     Just a couple of random thoughts regarding Stack Overflow...
>
>       * If we are thinking about shifting focus towards SO all
>         attempts of micromanaging should be discarded right in the
>         beginning. Especially things like meta tags, which are
>         discouraged and "burninated"
>         (https://meta.stackoverflow.com/tags/burninate-request/info) ,
>         or thread bumping. Depending on a context these won't be
>         manageable, go against community guidelines or simply obsolete. 
>       * Lack of expertise is unlikely to be an issue. Even now there are a
>         number of advanced Spark users on SO. Of course the more the
>         merrier.
>
>     Things that can be easily improved:
>
>       * Identifying, improving and promoting canonical questions and
>         answers. It means closing duplicates, suggesting edits to
>         improve existing answers, providing alternative solutions.
>         This can also be used to identify gaps in the documentation.
>       * Providing a set of clear posting guidelines to reduce effort
>         required to identify the problem (think about
>         http://stackoverflow.com/q/5963269 a.k.a How to make a great R
>         reproducible example?)
>       * Helping users decide if question is a good fit for SO (see
>         below). API questions are great fit, debugging problems like
>         "my cluster is slow" are not.
>       * Actively cleaning (closing, deleting) off-topic and low
>         quality questions. The less junk to sieve through the better
>         chance of good questions being answered.
>       * Repurposing and actively moderating SO docs
>         (https://stackoverflow.com/documentation/apache-spark/topics).
>         Right now most of the stuff that goes there is useless,
>         duplicated or plagiarized, or border case SPAM.
>       * Encouraging community to monitor featured
>         (https://stackoverflow.com/questions/tagged/apache-spark?sort=featured)
>         and active & upvoted & unanswered
>         (https://stackoverflow.com/unanswered/tagged/apache-spark)
>         questions.
>       * Implementing some procedure to identify questions which are
>         likely to be bugs or a material for feature requests.
>         Personally I am quite often tempted to simply send a link to
>         dev list, but I don't think it is really acceptable.
>       * Animating Spark related chat room. I tried this a couple of
>         times but to no avail. Without a certain critical mass of
>         users it just won't work.
>
>      
>
>      
>
>     On 11/07/2016 07:32 AM, Reynold Xin wrote:
>
>         This is an excellent point. If we do go ahead and feature SO
>         as a way for users to ask questions more prominently, as
>         someone who knows SO very well, would you be willing to help
>         write a short guideline (ideally the shorter the better, which
>         makes it hard) to direct what goes to user@ and what goes to SO?
>
>      
>
>     Sure, I'll be happy to help if I can.
>
>
>
>
>      
>
>      
>
>     On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz <[hidden email]> wrote:
>
>     Damn, I always thought that mailing list is only for nice and
>     welcoming people and there is nothing to do for me here >:)
>
>     To be serious though, there are many questions on the users list
>     which would fit just fine on SO but it is not true in general.
>     There are dozens of questions which are too broad, opinion based,
>     ask for external resources and so on. If you want to direct users
>     to SO you have to help them to decide if it is the right channel.
>     Otherwise it will just create a really bad experience for both
>     those seeking help and active answerers. The former will be downvoted
>     and bashed, the latter will have to deal with handling all the
>     junk, and the number of active Spark users with moderation
>     privileges is really low (with only Massg and me being able to
>     directly close duplicates).
>
>     Believe me, I've seen this before.
>
>     On 11/07/2016 05:08 AM, Reynold Xin wrote:
>
>         You have substantially underestimated how opinionated people
>         can be on mailing lists too :)
>
>         On Sunday, November 6, 2016, Maciej Szymkiewicz <[hidden email]> wrote:
>
>         You have to remember that the Stack Overflow crowd (like me) is
>         highly opinionated, so many questions, which could be just
>         fine on the mailing list, will be quickly downvoted and / or
>         closed as off-topic. Just saying...
>
>         -- 
>
>         Best, 
>
>         Maciej
>
>          
>
>         On 11/07/2016 04:03 AM, Reynold Xin wrote:
>
>             OK I've checked on the ASF member list (which is private
>             so there is no public archive).
>
>              
>
>             It is not against any ASF rule to recommend StackOverflow
>             as a place for users to ask questions. I don't think we
>             can or should delete the existing user@spark list either,
>             but we can certainly make SO more visible than it is.
>
>              
>
>              
>
>              
>
>             On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <[hidden email]> wrote:
>
>             Actually after talking with more ASF members, I believe
>             the only policy is that development decisions have to be
>             made and announced on ASF properties (dev list or jira),
>             but user questions don't have to. 
>
>              
>
>             I'm going to double check this. If it is true, I would
>             actually recommend moving the Q&A part of the user list
>             entirely over to Stack Overflow, or at least making that the
>             recommended way rather than the existing user list, which
>             is not very scalable.
>
>
>
>             On Wednesday, November 2, 2016, Nicholas Chammas <[hidden email]> wrote:
>
>             We’ve discussed several times upgrading our communication
>             tools, as far back as 2014 and maybe even before that too.
>             The bottom line is that we can’t due to ASF rules
>             requiring the use of ASF-managed mailing lists.
>
>             For some history, see this discussion:
>
>             - https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@...%3E
>             <https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E>
>
>             - https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@...%3E
>             <https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E>
>
>             (It’s ironic that it’s difficult to follow the past
>             discussion on why we can’t change our official
>             communication tools due to those very tools…)
>
>             Nick
>
>
>
>              
>
>             On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <[hidden email]> wrote:
>
>                 I feel Assaf's point is quite relevant if we want to
>                 move this project forward from the Spark user
>                 perspective (as I do). In fact, we're still using 20th
>                 century tools (mailing lists) with some add-ons (like
>                 Stack Overflow).
>
>                  
>
>                 As usual, Sean and Cody's contributions are very much to
>                 the point.
>
>                 I feel it is indeed a matter of culture (hard to
>                 enforce) and tools (much easier). Isn't it?
>
>                  
>
>                 On 2 November 2016 at 16:36, Cody Koeninger
>                 <[hidden email]> wrote:
>
>                 So concrete things people could do
>
>                 - users could tag subject lines appropriately to the
>                 component they're
>                 asking about
>
>                 - contributors could monitor user@ for tags relating
>                 to components
>                 they've worked on.
>                 I'd be surprised if my miss rate for any mailing list
>                 questions
>                 well-labeled as Kafka was higher than 5%
>
>                 - committers could be more aggressive about soliciting
>                 and merging PRs
>                 to improve documentation.
>                 It's a lot easier to answer even poorly-asked
>                 questions with a link to
>                 relevant docs.
>
>
>                 On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen
>                 <[hidden email]> wrote:
>                 > There's already reviews@ and issues@. dev@ is for
>                 project development itself
>                 > and I think is OK. You're suggesting splitting up
>                 user@ and I sympathize
>                 > with the motivation. Experience tells me that we'll
>                 have a beginner@ that's
>                 > then totally ignored, and people will quickly learn
>                 to post to advanced@ to
>                 > get attention, and we'll be back where we started.
>                 Putting it in JIRA
>                 > doesn't help. I don't think this is a problem that is
>                 merely down to lack of
>                 > process. It actually requires cultivating a culture
>                 change on the community
>                 > list.
>                 >
>                 > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf
>                 > <[hidden email]> wrote:
>                 >>
>                 >> What I am suggesting is basically to fix that.
>                 >>
>                 >> For example, we might say that mailing list A is
>                 only for voting, mailing
>                 >> list B is only for PR and have something like stack
>                 overflow for developer
>                 >> questions (I would even go as far as to have
>                 beginner, intermediate and
>                 >> advanced mailing list for users and
>                 beginner/advanced for dev).
>                 >>
>                 >>
>                 >>
>                 >> This can easily be done using stack overflow tags,
>                 however, that would
>                 >> probably be harder to manage.
>                 >>
>                 >> Maybe using special jira tags and manage it in jira?
>                 >>
>                 >>
>                 >>
>                 >> Anyway as I said, the main issue is not user
>                 questions (except maybe
>                 >> advanced ones) but more for dev questions. It is so
>                 easy to get lost in the
>                 >> chatter that it makes it very hard for people to
>                 learn spark internals…
>                 >>
>                 >> Assaf.
>                 >>
>                 >>
>                 >>
>                 >> From: Sean Owen [mailto:[hidden email]]
>                 >> Sent: Wednesday, November 02, 2016 2:07 PM
>                 >> To: Mendelson, Assaf; [hidden email]
>                 >> Subject: Re: Handling questions in the mailing lists
>                 >>
>                 >>
>                 >>
>                 >> I think that unfortunately mailing lists don't
>                 scale well. This one has
>                 >> thousands of subscribers with different interests
>                 and levels of experience.
>                 >> For any given person, most messages will be
>                 irrelevant. I also find that a
>                 >> lot of questions on user@ are not well-asked,
>                 aren't an SSCCE
>                 >> (http://sscce.org/), not something most people are
>                 going to bother replying
>                 >> to even if they could answer. I almost entirely
>                 ignore user@ because there
>                 >> are higher-priority channels like PRs to deal with,
>                 that already have
>                 >> hundreds of messages per day. This is why little of
>                 it gets an answer -- too
>                 >> noisy.
>                 >>
>                 >>
>                 >>
>                 >> We have to have official mailing lists, in any
>                 event, to have some
>                 >> official channel for things like votes and
>                 announcements. It's not wrong to
>                 >> ask questions on user@ of course, but a lot of the
>                 questions I see could
>                 >> have been answered with research of existing docs
>                 or looking at the code. I
>                 >> think that given the scale of the list, it's not
>                 wrong to assert that this
>                 >> is sort of a prerequisite for asking thousands of
>                 people to answer one's
>                 >> question. But we can't enforce that.
>                 >>
>                 >>
>                 >>
>                 >> The situation will get better to the extent people
>                 ask better questions,
>                 >> help other people ask better questions, and answer
>                 good questions. I'd
>                 >> encourage anyone feeling this way to try to help
>                 along those dimensions.
>                 >>
>                 >>
>                 >>
>                 >>
>                 >>
>                 >>
>                 >>
>                 >>
>                 >>
>                 >>
>                 >>
>                 >> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson
>                 >> <[hidden email]> wrote:
>                 >>
>                 >> Hi,
>                 >>
>                 >> I know this is a little off topic but I wanted to
>                 raise an issue about
>                 >> handling questions in the mailing list (this is
>                 true both for the user
>                 >> mailing list and the dev but since there are other
>                 options such as stack
>                 >> overflow for user questions, this is more
>                 problematic in dev).
>                 >>
>                 >> Let’s say I ask a question (as I recently did).
>                 Unfortunately this was
>                 >> during spark summit in Europe so probably people
>                 were busy. In any case no
>                 >> one answered.
>                 >>
>                 >> The problem is, that if no one answers very soon,
>                 the question will almost
>                 >> certainly remain unanswered because new messages
>                 will simply drown it.
>                 >>
>                 >>
>                 >>
>                 >> This is a common issue not just for questions but
>                 for any comment or idea
>                 >> which is not immediately picked up.
>                 >>
>                 >>
>                 >>
>                 >> I believe we should have a method of handling this.
>                 >>
>                 >> Generally, I would say these types of things belong
>                 in stack overflow,
>                 >> after all, the way it is built is perfect for this.
>                 More seasoned spark
>                 >> contributors and committers can periodically check
>                 out unanswered questions
>                 >> and answer them.
>                 >>
>                 >> The problem is that stack overflow (as well as
>                 other targets such as the
>                 >> databricks forums) tend to have a more user based
>                 orientation. This means
>                 >> that any spark internal question will almost
>                 certainly remain unanswered.
>                 >>
>                 >>
>                 >>
>                 >> I was wondering if we could come up with a solution
>                 for this.
>                 >>
>                 >>
>                 >>
>                 >> Assaf.
>                 >>
>                 >>
>                 >>
>                 >>
>                 >>
>
>                  
>
>              
>
>          
>
>      
>
>      
>
>      
>
>     -- 
>
>     Maciej Szymkiewicz
>
>  
>

-- 
Maciej Szymkiewicz


Re: Handling questions in the mailing lists

Posted by Denny Lee <de...@gmail.com>.
Hey Reynold,

Looks like we have all of the proposed changes in Proposed Community Mailing
Lists / StackOverflow Changes
<https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p>.
Anything else we can do to update the Spark Community page / welcome email?


Meanwhile, let's all start answering questions on SO, eh?! :)
Denny

On Thu, Nov 10, 2016 at 1:54 PM Holden Karau <ho...@pigscanfly.ca> wrote:

> That's a good question. Looking at
> http://stackoverflow.com/tags/apache-spark/topusers shows a few
> contributors who have already been active on SO, including some committers
> and PMC members with very high overall SO reputations for any
> administrative needs (as well as a number of other contributors besides
> just PMC/committers).
>
> On Wed, Nov 9, 2016 at 2:18 AM, assaf.mendelson <as...@rsa.com>
> wrote:
>
> I was just wondering, before we move on to SO.
>
> Do we have enough contributors with enough reputation to manage things in
> SO?
>
> We would need contributors with enough reputation to have relevant
> privileges.
>
> For example: creating tags (requires 1500 reputation), edit questions and
> answers (2000), create tag synonyms (2500), approve tag wiki edits (5000),
> access to moderator tools (10000, this is required to delete questions
> etc.), protect questions (15000).
>
> All of these are important if we plan to have SO as a main resource.
>
> I know I originally suggested SO, however, if we do not have contributors
> with the required privileges and the willingness to help manage everything
> then I am not sure this is a good fit.
>
> Assaf.
>
>
>
> *From:* Denny Lee [via Apache Spark Developers List] [mailto:ml-node+[hidden email]]
> *Sent:* Wednesday, November 09, 2016 9:54 AM
> *To:* Mendelson, Assaf
> *Subject:* Re: Handling questions in the mailing lists
>
>
>
> Agreed that simply moving the questions to SO will not solve
> anything but I think the call out about the meta-tags is that we need to
> abide by SO rules and if we were to just jump in and start creating
> meta-tags, we would be violating at minimum the spirit and at maximum the
> actual conventions around SO.
>
>
>
> Saying this, perhaps we could suggest tags that we place in the header of
> the question whether it be SO or the mailing lists that will help us sort
> through all of these questions faster just as you suggested.  The Proposed
> Community Mailing Lists / StackOverflow Changes
> <https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p> has
> been updated to include suggested tags.  WDYT?
>
>
>
> On Tue, Nov 8, 2016 at 11:02 PM assaf.mendelson <[hidden email]> wrote:
>
> I like the document and I think it is good but I still feel like we are
> missing an important part here.
>
>
>
> Look at SO today. There are:
>
> -           4658 unanswered questions under apache-spark tag.
>
> -          394 unanswered questions under spark-dataframe tag.
>
> -          639 unanswered questions under apache-spark-sql
>
> -          859 unanswered questions under pyspark
>
>
>
> Just moving people to ask there will not help. The whole issue is having
> people answer the questions.
>
>
>
> The problem is that many of these questions do not fit SO (but are already
> there so they are noise), are bad (i.e. unclear or hard to answer),
> orphaned etc. while some are simply harder than what people with some
> experience in spark can handle and require more expertise.
>
> The problem is that people with the relevant expertise are drowning in
> noise. This is true for the mailing list and this is true for SO.
>
>
>
> For this reason I believe that just moving people to SO will not solve
> anything.
>
>
>
> My original thought was that if we had different tags then different
> people could watch open questions on these tags and therefore have a much
> lower noise. I thought that we would have a low tier (current one) of
> people just not following the documentation (which would remain as noise),
> then a beginner tier where we could have people downvoting bad questions
> but in most cases the community can answer the questions because they are
> common, then a “medium” tier which would mean harder questions but that can
> still be answered by advanced users and lastly an “advanced” tier to which
> committers can actually subscribe (and adding sub tags for subsystems
> would improve this even more).
>
>
>
> I was not aware of SO policy for meta tags (the burnination link is about
> removing tags completely so I am not sure how it applies, I believe this
> link https://stackoverflow.blog/2010/08/the-death-of-meta-tags/ is more
> relevant).
>
> There was actually a discussion along the lines in SO (
> http://meta.stackoverflow.com/questions/253338/filtering-questions-by-difficulty-level
> ).
>
>
>
> The fact that SO did not solve this issue, does not mean we shouldn’t
> either.
>
>
>
> The way I see it, some tags can easily be used even with the meta tags
> limitation. For example, a spark-internal-development tag can be used
> to ask questions about development of spark. There are already tags for some
> spark subsystems (there is an apache-spark-sql tag, a pyspark tag, a
> spark-streaming tag etc.). The main issue I see and the one we can’t seem
> to get around is dividing between simple questions that the community
> should answer and hard questions which only advanced users can answer.
>
>
>
> Maybe SO isn’t the correct platform for that but even within it we can try
> to find a non meta name for spark beginner questions vs. spark advanced
> questions.
>
> Assaf.
>
>
>
>
>
> *From:* Denny Lee [via Apache Spark Developers List] [mailto:[hidden email]]
> *Sent:* Tuesday, November 08, 2016 7:53 AM
> *To:* Mendelson, Assaf
>
>
> *Subject:* Re: Handling questions in the mailing lists
>
>
>
> To help track and get the verbiage for the Spark community page and
> welcome email jump started, here's a working document for us to work with:
> https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#
> <https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit>
>
>
>
> Hope this will help us collaborate on this stuff a little faster.
>
> On Mon, Nov 7, 2016 at 2:25 PM Maciej Szymkiewicz <[hidden email]> wrote:
>
> Just a couple of random thoughts regarding Stack Overflow...
>
>    - If we are thinking about shifting focus towards SO all attempts of
>    micromanaging should be discarded right in the beginning. Especially things
>    like meta tags, which are discouraged and "burninated" (
>    https://meta.stackoverflow.com/tags/burninate-request/info) , or
>    thread bumping. Depending on a context these won't be manageable, go
>    against community guidelines or simply obsolete.
>    - Lack of expertise is unlikely to be an issue. Even now there are a number
>    of advanced Spark users on SO. Of course the more the merrier.
>
> Things that can be easily improved:
>
>    - Identifying, improving and promoting canonical questions and
>    answers. It means closing duplicates, suggesting edits to improve existing
>    answers, providing alternative solutions. This can also be used to identify
>    gaps in the documentation.
>    - Providing a set of clear posting guidelines to reduce effort
>    required to identify the problem (think about
>    http://stackoverflow.com/q/5963269 a.k.a How to make a great R
>    reproducible example?)
>    - Helping users decide if question is a good fit for SO (see below).
>    API questions are great fit, debugging problems like "my cluster is slow"
>    are not.
>    - Actively cleaning (closing, deleting) off-topic and low quality
>    questions. The less junk to sieve through the better chance of good
>    questions being answered.
>    - Repurposing and actively moderating SO docs (
>    https://stackoverflow.com/documentation/apache-spark/topics). Right
>    now most of the stuff that goes there is useless, duplicated or
>    plagiarized, or border case SPAM.
>    - Encouraging community to monitor featured (
>    https://stackoverflow.com/questions/tagged/apache-spark?sort=featured)
>    and active & upvoted & unanswered (
>    https://stackoverflow.com/unanswered/tagged/apache-spark) questions.
>    - Implementing some procedure to identify questions which are likely
>    to be bugs or a material for feature requests. Personally I am quite often
>    tempted to simply send a link to dev list, but I don't think it is really
>    acceptable.
>    - Animating Spark related chat room. I tried this a couple of times
>    but to no avail. Without a certain critical mass of users it just won't
>    work.
>
>
>
>
>
> On 11/07/2016 07:32 AM, Reynold Xin wrote:
>
> This is an excellent point. If we do go ahead and feature SO as a way for
> users to ask questions more prominently, as someone who knows SO very well,
> would you be willing to help write a short guideline (ideally the shorter
> the better, which makes it hard) to direct what goes to user@ and what
> goes to SO?
>
>
>
> Sure, I'll be happy to help if I can.
>
> On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz <[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=1>> wrote:
>
> Damn, I always thought that the mailing list is only for nice and welcoming
> people and there is nothing for me to do here >:)
>
> To be serious though, there are many questions on the users list which
> would fit just fine on SO, but that is not true in general. There are dozens
> of questions which are too broad, opinion-based, ask for external resources,
> and so on. If you want to direct users to SO, you have to help them
> decide whether it is the right channel. Otherwise it will just create a really
> bad experience for both those seeking help and active answerers. The former
> will be downvoted and bashed; the latter will have to deal with all
> the junk, and the number of active Spark users with moderation privileges is
> really low (with only Massg and me being able to directly close duplicates).
>
> Believe me, I've seen this before.
>
> On 11/07/2016 05:08 AM, Reynold Xin wrote:
>
> You have substantially underestimated how opinionated people can be on
> mailing lists too :)
>
> On Sunday, November 6, 2016, Maciej Szymkiewicz <[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=2>> wrote:
>
> You have to remember that the Stack Overflow crowd (like me) is highly
> opinionated, so many questions which would be just fine on the mailing
> list will be quickly downvoted and/or closed as off-topic. Just
> saying...
>
> --
>
> Best,
>
> Maciej
>
>
>
> On 11/07/2016 04:03 AM, Reynold Xin wrote:
>
> OK I've checked on the ASF member list (which is private so there is no
> public archive).
>
>
>
> It is not against any ASF rule to recommend StackOverflow as a place for
> users to ask questions. I don't think we can or should delete the existing
> user@spark list either, but we can certainly make SO more visible than it
> is.
>
>
>
>
>
> On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=3>> wrote:
>
> Actually after talking with more ASF members, I believe the only policy is
> that development decisions have to be made and announced on ASF properties
> (dev list or jira), but user questions don't have to.
>
>
>
> I'm going to double check this. If it is true, I would actually recommend
> moving the Q&A part of the user list entirely over to Stack Overflow, or
> at least making that the recommended way rather than the existing user list,
> which is not very scalable.
>
>
>
> On Wednesday, November 2, 2016, Nicholas Chammas <[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=4>> wrote:
>
> We’ve discussed several times upgrading our communication tools, as far
> back as 2014 and maybe even before that too. The bottom line is that we
> can’t due to ASF rules requiring the use of ASF-managed mailing lists.
>
> For some history, see these discussions:
>
> ·
> https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E
>
> ·
> https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E
>
> (It’s ironic that it’s difficult to follow the past discussion on why we
> can’t change our official communication tools due to those very tools…)
>
> Nick
>
>
>
> On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=5>> wrote:
>
> I feel Assaf's point is quite relevant if we want to move this project
> forward from the Spark user perspective (as I do). In fact, we're still
> using 20th century tools (mailing lists) with some add-ons (like Stack
> Overflow).
>
>
>
> As usual, Sean and Cody's contributions are very much to the point.
>
> I feel it is indeed a matter of culture (hard to enforce) and tools
> (much easier). Isn't it?
>
> On 2 November 2016 at 16:36, Cody Koeninger <[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=6>> wrote:
>
> So concrete things people could do
>
> - users could tag subject lines appropriately to the component they're
> asking about
>
> - contributors could monitor user@ for tags relating to components
> they've worked on.
> I'd be surprised if my miss rate for any mailing list questions
> well-labeled as Kafka was higher than 5%
>
> - committers could be more aggressive about soliciting and merging PRs
> to improve documentation.
> It's a lot easier to answer even poorly-asked questions with a link to
> relevant docs.
>
>
> On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=7>> wrote:
> > There's already reviews@ and issues@. dev@ is for project development itself
> > and I think is OK. You're suggesting splitting up user@ and I sympathize
> > with the motivation. Experience tells me that we'll have a beginner@ that's
> > then totally ignored, and people will quickly learn to post to advanced@ to
> > get attention, and we'll be back where we started. Putting it in JIRA
> > doesn't help. I don't think this is a problem that is merely down to lack of
> > process. It actually requires cultivating a culture change on the community
> > list.
> >
>
> > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=8>>
>
>
> > wrote:
> >>
> >> What I am suggesting is basically to fix that.
> >>
> >> For example, we might say that mailing list A is only for voting, mailing
> >> list B is only for PR and have something like stack overflow for developer
> >> questions (I would even go as far as to have beginner, intermediate and
> >> advanced mailing list for users and beginner/advanced for dev).
> >>
> >>
> >>
> >> This can easily be done using stack overflow tags, however, that would
> >> probably be harder to manage.
> >>
> >> Maybe using special jira tags and manage it in jira?
> >>
> >>
> >>
> >> Anyway as I said, the main issue is not user questions (except maybe
> >> advanced ones) but more for dev questions. It is so easy to get lost in the
> >> chatter that it makes it very hard for people to learn spark internals…
> >>
> >> Assaf.
> >>
> >>
> >>
>
> >> From: Sean Owen [mailto:[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=9>]
>
>
> >> Sent: Wednesday, November 02, 2016 2:07 PM
>
> >> To: Mendelson, Assaf; [hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=10>
>
>
> >> Subject: Re: Handling questions in the mailing lists
> >>
> >>
> >>
> >> I think that unfortunately mailing lists don't scale well. This one has
> >> thousands of subscribers with different interests and levels of experience.
> >> For any given person, most messages will be irrelevant. I also find that a
> >> lot of questions on user@ are not well-asked, aren't an SSCCE
> >> (http://sscce.org/), not something most people are going to bother replying
> >> to even if they could answer. I almost entirely ignore user@ because there
> >> are higher-priority channels like PRs to deal with, that already have
> >> hundreds of messages per day. This is why little of it gets an answer -- too
> >> noisy.
> >>
> >>
> >>
> >> We have to have official mailing lists, in any event, to have some
> >> official channel for things like votes and announcements. It's not wrong to
> >> ask questions on user@ of course, but a lot of the questions I see could
> >> have been answered with research of existing docs or looking at the code. I
> >> think that given the scale of the list, it's not wrong to assert that this
> >> is sort of a prerequisite for asking thousands of people to answer one's
> >> question. But we can't enforce that.
> >>
> >>
> >>
> >> The situation will get better to the extent people ask better questions,
> >> help other people ask better questions, and answer good questions. I'd
> >> encourage anyone feeling this way to try to help along those dimensions.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
>
> >> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=11>>
>
>
> >> wrote:
> >>
> >> Hi,
> >>
> >> I know this is a little off topic but I wanted to raise an issue about
> >> handling questions in the mailing list (this is true both for the user
> >> mailing list and the dev but since there are other options such as stack
> >> overflow for user questions, this is more problematic in dev).
> >>
> >> Let’s say I ask a question (as I recently did). Unfortunately this was
> >> during spark summit in Europe so probably people were busy. In any case no
> >> one answered.
> >>
> >> The problem is, that if no one answers very soon, the question will almost
> >> certainly remain unanswered because new messages will simply drown it.
> >>
> >>
> >>
> >> This is a common issue not just for questions but for any comment or idea
> >> which is not immediately picked up.
> >>
> >>
> >>
> >> I believe we should have a method of handling this.
> >>
> >> Generally, I would say these types of things belong in stack overflow,
> >> after all, the way it is built is perfect for this. More seasoned spark
> >> contributors and committers can periodically check out unanswered questions
> >> and answer them.
> >>
> >> The problem is that stack overflow (as well as other targets such as the
> >> databricks forums) tend to have a more user based orientation. This means
> >> that any spark internal question will almost certainly remain unanswered.
> >>
> >>
> >>
> >> I was wondering if we could come up with a solution for this.
> >>
> >>
> >>
> >> Assaf.
> >>
> >>
> >>
> >>
> >>
>
>
>
>
>
>
>
>
>
>
>
>
>
> --
>
> Maciej Szymkiewicz
>
>
>
> --
> Cell : 425-233-8271 <(425)%20233-8271>
> Twitter: https://twitter.com/holdenkarau
>

Re: Handling questions in the mailing lists

Posted by Holden Karau <ho...@pigscanfly.ca>.
That's a good question. Looking at
http://stackoverflow.com/tags/apache-spark/topusers shows a few
contributors who are already active on SO, including some committers
and PMC members with overall SO reputations high enough for any
administrative needs (as well as a number of other contributors besides
just PMC/committers).

On Wed, Nov 9, 2016 at 2:18 AM, assaf.mendelson <as...@rsa.com>
wrote:

> I was just wondering, before we move on to SO:
>
> Do we have enough contributors with enough reputation to manage things on
> SO?
>
> We would need contributors with enough reputation to have the relevant
> privileges.
>
> For example: creating tags (requires 1500 reputation), editing questions and
> answers (2000), creating tag synonyms (2500), approving tag wiki edits (5000),
> access to moderator tools (10000, which is required to delete questions,
> etc.), protecting questions (15000).
>
> All of these are important if we plan to have SO as a main resource.
>
> I know I originally suggested SO, however, if we do not have contributors
> with the required privileges and the willingness to help manage everything
> then I am not sure this is a good fit.
>
> Assaf.
>
>
>
> *From:* Denny Lee [via Apache Spark Developers List] [mailto:ml-node+[hidden
> email] <http:///user/SendEmail.jtp?type=node&node=19800&i=0>]
> *Sent:* Wednesday, November 09, 2016 9:54 AM
> *To:* Mendelson, Assaf
> *Subject:* Re: Handling questions in the mailing lists
>
>
>
> Agreed that simply moving the questions to SO will not solve
> anything, but I think the call-out about meta tags is that we need to
> abide by SO rules; if we were to just jump in and start creating
> meta tags, we would be violating at minimum the spirit and at maximum the
> actual conventions of SO.
>
>
>
> That said, perhaps we could suggest tags to place in the header of
> the question, whether on SO or the mailing lists, that will help us sort
> through all of these questions faster, just as you suggested.  The Proposed
> Community Mailing Lists / StackOverflow Changes
> <https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p> has
> been updated to include suggested tags.  WDYT?
>
>
>
> On Tue, Nov 8, 2016 at 11:02 PM assaf.mendelson <[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=19799&i=0>> wrote:
>
> I like the document and I think it is good but I still feel like we are
> missing an important part here.
>
>
>
> Look at SO today. There are:
>
> - 4658 unanswered questions under the apache-spark tag.
> - 394 unanswered questions under the spark-dataframe tag.
> - 639 unanswered questions under the apache-spark-sql tag.
> - 859 unanswered questions under the pyspark tag.
>
>
>
> Just moving people to ask there will not help. The whole issue is having
> people answer the questions.
>
>
>
> The problem is that many of these questions do not fit SO (but are already
> there, so they are noise), are bad (i.e. unclear or hard to answer), or are
> orphaned, while some are simply harder than what people with some
> experience in Spark can handle and require more expertise.
>
> As a result, people with the relevant expertise are drowning in
> noise. This is true for the mailing list and it is true for SO.
>
>
>
> For this reason I believe that just moving people to SO will not solve
> anything.
>
>
>
> My original thought was that if we had different tags, then different
> people could watch open questions on these tags and therefore see much
> less noise. I thought that we would have a low tier (the current one) of
> people just not following the documentation (which would remain as noise),
> then a beginner tier where we could have people downvoting bad questions
> but where in most cases the community can answer the questions because they
> are common, then a “medium” tier with harder questions that can
> still be answered by advanced users, and lastly an “advanced” tier which
> committers can actually subscribe to (and adding sub-tags for subsystems
> would improve this even more).
>
>
>
> I was not aware of SO's policy on meta tags (the burnination link is about
> removing tags completely, so I am not sure how it applies; I believe this
> link, https://stackoverflow.blog/2010/08/the-death-of-meta-tags/, is more
> relevant).
>
> There was actually a discussion along these lines on SO (
> http://meta.stackoverflow.com/questions/253338/filtering-questions-by-difficulty-level).
>
>
>
> The fact that SO did not solve this issue does not mean we shouldn’t
> try to.
>
>
>
> The way I see it, some tags can easily be used even with the meta tags
> limitation. For example, a spark-internal-development tag could be used
> to ask questions about the development of Spark. There are already tags for
> some Spark subsystems (there is an apache-spark-sql tag, a pyspark tag, a
> spark-streaming tag etc.). The main issue I see, and the one we can’t seem
> to get around, is dividing between simple questions that the community
> should answer and hard questions which only advanced users can answer.
>
>
>
> Maybe SO isn’t the correct platform for that but even within it we can try
> to find a non meta name for spark beginner questions vs. spark advanced
> questions.
>
> Assaf.
>
>
>
>
>
> *From:* Denny Lee [via Apache Spark Developers List] [mailto:[hidden
> email] <http:///user/SendEmail.jtp?type=node&node=19799&i=1>[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19798&i=0>]
> *Sent:* Tuesday, November 08, 2016 7:53 AM
> *To:* Mendelson, Assaf
>
>
> *Subject:* Re: Handling questions in the mailing lists
>
>
>
> To help track and get the verbiage for the Spark community page and
> welcome email jump started, here's a working document for us to work with:
> https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit
>
>
>
> Hope this will help us collaborate on this stuff a little faster.
>
> On Mon, Nov 7, 2016 at 2:25 PM Maciej Szymkiewicz <[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=0>> wrote:
>
> Just a couple of random thoughts regarding Stack Overflow...
>
>    - If we are thinking about shifting focus towards SO all attempts of
>    micromanaging should be discarded right in the beginning. Especially things
>    like meta tags, which are discouraged and "burninated" (
>    https://meta.stackoverflow.com/tags/burninate-request/info
>    <https://meta.stackoverflow.com/tags/burninate-request/info>) , or
>    thread bumping. Depending on a context these won't be manageable, go
>    against community guidelines or simply obsolete.
>    - Lack of expertise is unlikely an issue. Even now there is a number
>    of advanced Spark users on SO. Of course the more the merrier.
>
> Things that can be easily improved:
>
>    - Identifying, improving and promoting canonical questions and
>    answers. It means closing duplicate, suggesting edits to improve existing
>    answers, providing alternative solutions. This can be also used to identify
>    gaps in the documentation.
>    - Providing a set of clear posting guidelines to reduce effort
>    required to identify the problem (think about
>    http://stackoverflow.com/q/5963269 <http://stackoverflow.com/q/5963269>
>    a.k.a How to make a great R reproducible example?)
>    - Helping users decide if question is a good fit for SO (see below).
>    API questions are great fit, debugging problems like "my cluster is slow"
>    are not.
>    - Actively cleaning (closing, deleting) off-topic and low quality
>    questions. The less junk to sieve through the better chance of good
>    questions being answered.
>    - Repurposing and actively moderating SO docs (
>    https://stackoverflow.com/documentation/apache-spark/topics
>    <https://stackoverflow.com/documentation/apache-spark/topics>). Right
>    now most of the stuff that goes there is useless, duplicated or
>    plagiarized, or border case SPAM.
>    - Encouraging community to monitor featured (https://stackoverflow.com/
>    questions/tagged/apache-spark?sort=featured
>    <https://stackoverflow.com/questions/tagged/apache-spark?sort=featured>)
>    and active & upvoted & unanswered (https://stackoverflow.com/
>    unanswered/tagged/apache-spark) questions.
>    - Implementing some procedure to identify questions which are likely
>    to be bugs or a material for feature requests. Personally I am quite often
>    tempted to simply send a link to dev list, but I don't think it is really
>    acceptable.
>    - Animating Spark related chat room. I tried this a couple of times
>    but to no avail. Without a certain critical mass of users it just won't
>    work.
>
>
>
>
>
> On 11/07/2016 07:32 AM, Reynold Xin wrote:
>
> This is an excellent point. If we do go ahead and feature SO as a way for
> users to ask questions more prominently, as someone who knows SO very well,
> would you be willing to help write a short guideline (ideally the shorter
> the better, which makes it hard) to direct what goes to user@ and what
> goes to SO?
>
>
>
> Sure, I'll be happy to help if I can.
>
> On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz <[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=1>> wrote:
>
> Damn, I always thought that mailing list is only for nice and welcoming
> people and there is nothing to do for me here >:)
>
> To be serious though, there are many questions on the users list which
> would fit just fine on SO but it is not true in general. There are dozens
> of questions which are to broad, opinion based, ask for external resources
> and so on. If you want to direct users to SO you have to help them to
> decide if it is the right channel. Otherwise it will just create a really
> bad experience for both seeking help and active answerers. Former ones will
> be downvoted and bashed, latter ones will have to deal with handling all
> the junk and the number of active Spark users with moderation privileges is
> really low (with only Massg and me being able to directly close duplicates).
>
> Believe me, I've seen this before.
>
> On 11/07/2016 05:08 AM, Reynold Xin wrote:
>
> You have substantially underestimated how opinionated people can be on
> mailing lists too :)
>
> On Sunday, November 6, 2016, Maciej Szymkiewicz <[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=2>> wrote:
>
> You have to remember that Stack Overflow crowd (like me) is highly
> opinionated, so many questions, which could be just fine on the mailing
> list, will be quickly downvoted and / or closed as off-topic. Just
> saying...
>
> --
>
> Best,
>
> Maciej
>
>
>
> On 11/07/2016 04:03 AM, Reynold Xin wrote:
>
> OK I've checked on the ASF member list (which is private so there is no
> public archive).
>
>
>
> It is not against any ASF rule to recommend StackOverflow as a place for
> users to ask questions. I don't think we can or should delete the existing
> user@spark list either, but we can certainly make SO more visible than it
> is.
>
>
>
>
>
> On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=3>> wrote:
>
> Actually after talking with more ASF members, I believe the only policy is
> that development decisions have to be made and announced on ASF properties
> (dev list or jira), but user questions don't have to.
>
>
>
> I'm going to double check this. If it is true, I would actually recommend
> us moving entirely over the Q&A part of the user list to stackoverflow, or
> at least make that the recommended way rather than the existing user list
> which is not very scalable.
>
>
>
> On Wednesday, November 2, 2016, Nicholas Chammas <[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=4>> wrote:
>
> We’ve discussed several times upgrading our communication tools, as far
> back as 2014 and maybe even before that too. The bottom line is that we
> can’t due to ASF rules requiring the use of ASF-managed mailing lists.
>
> For some history, see this discussion:
>
> ·         https://mail-archives.apache.org/mod_mbox/spark-user/
> 201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@...%3E
> <https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E>
>
> ·         https://mail-archives.apache.org/mod_mbox/spark-user/
> 201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@...%3E
> <https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E>
>
> (It’s ironic that it’s difficult to follow the past discussion on why we
> can’t change our official communication tools due to those very tools…)
>
> Nick
>
> ​
>
> On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=5>> wrote:
>
> I fell Assaf point is quite relevant if we want to move this project
> forward from the Spark user perspective (as I do). In fact, we're still
> using 20th century tools (mailing lists) with some add-ons (like Stack
> Overflow).
>
>
>
> As usually, Sean and Cody's contributions are very to the point.
>
> I fell it is indeed a matter of of culture (hard to enforce) and tools
> (much easier). Isn't it?
>
> On 2 November 2016 at 16:36, Cody Koeninger <[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=6>> wrote:
>
> So concrete things people could do
>
> - users could tag subject lines appropriately to the component they're
> asking about
>
> - contributors could monitor user@ for tags relating to components
> they've worked on.
> I'd be surprised if my miss rate for any mailing list questions
> well-labeled as Kafka was higher than 5%
>
> - committers could be more aggressive about soliciting and merging PRs
> to improve documentation.
> It's a lot easier to answer even poorly-asked questions with a link to
> relevant docs.
>
>
> On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=7>> wrote:
> > There's already reviews@ and issues@. dev@ is for project development
> itself
> > and I think is OK. You're suggesting splitting up user@ and I sympathize
> > with the motivation. Experience tells me that we'll have a beginner@
> that's
> > then totally ignored, and people will quickly learn to post to advanced@
> to
> > get attention, and we'll be back where we started. Putting it in JIRA
> > doesn't help. I don't think this a problem that is merely down to lack of
> > process. It actually requires cultivating a culture change on the
> community
> > list.
> >
>
> > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=8>>
>
>
> > wrote:
> >>
> >> What I am suggesting is basically to fix that.
> >>
> >> For example, we might say that mailing list A is only for voting,
> mailing
> >> list B is only for PR and have something like stack overflow for
> developer
> >> questions (I would even go as far as to have beginner, intermediate and
> >> advanced mailing list for users and beginner/advanced for dev).
> >>
> >>
> >>
> >> This can easily be done using stack overflow tags, however, that would
> >> probably be harder to manage.
> >>
> >> Maybe using special jira tags and manage it in jira?
> >>
> >>
> >>
> >> Anyway as I said, the main issue is not user questions (except maybe
> >> advanced ones) but more for dev questions. It is so easy to get lost in
> the
> >> chatter that it makes it very hard for people to learn spark internals…
> >>
> >> Assaf.
> >>
> >>
> >>
>
> >> From: Sean Owen [mailto:[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=9>]
>
>
> >> Sent: Wednesday, November 02, 2016 2:07 PM
>
> >> To: Mendelson, Assaf; [hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=10>
>
>
> >> Subject: Re: Handling questions in the mailing lists
> >>
> >>
> >>
> >> I think that unfortunately mailing lists don't scale well. This one has
> >> thousands of subscribers with different interests and levels of
> experience.
> >> For any given person, most messages will be irrelevant. I also find
> that a
> >> lot of questions on user@ are not well-asked, aren't an SSCCE
> >> (http://sscce.org/), not something most people are going to bother
> replying
> >> to even if they could answer. I almost entirely ignore user@ because
> there
> >> are higher-priority channels like PRs to deal with, that already have
> >> hundreds of messages per day. This is why little of it gets an answer
> -- too
> >> noisy.
> >>
> >>
> >>
> >> We have to have official mailing lists, in any event, to have some
> >> official channel for things like votes and announcements. It's not
> wrong to
> >> ask questions on user@ of course, but a lot of the questions I see
> could
> >> have been answered with research of existing docs or looking at the
> code. I
> >> think that given the scale of the list, it's not wrong to assert that
> this
> >> is sort of a prerequisite for asking thousands of people to answer one's
> >> question. But we can't enforce that.
> >>
> >>
> >>
> >> The situation will get better to the extent people ask better questions,
> >> help other people ask better questions, and answer good questions. I'd
> >> encourage anyone feeling this way to try to help along those dimensions.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
>
> >> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <[hidden email]
> <http://user/SendEmail.jtp?type=node&node=19770&i=11>>
>
>
> >> wrote:
> >>
> >> Hi,
> >>
> >> I know this is a little off topic but I wanted to raise an issue about
> >> handling questions in the mailing list (this is true both for the user
> >> mailing list and the dev but since there are other options such as stack
> >> overflow for user questions, this is more problematic in dev).
> >>
> >> Let’s say I ask a question (as I recently did). Unfortunately this was
> >> during spark summit in Europe so probably people were busy. In any case
> no
> >> one answered.
> >>
> >> The problem is, that if no one answers very soon, the question will
> almost
> >> certainly remain unanswered because new messages will simply drown it.
> >>
> >>
> >>
> >> This is a common issue not just for questions but for any comment or
> idea
> >> which is not immediately picked up.
> >>
> >>
> >>
> >> I believe we should have a method of handling this.
> >>
> >> Generally, I would say these types of things belong in stack overflow,
> >> after all, the way it is built is perfect for this. More seasoned spark
> >> contributors and committers can periodically check out unanswered
> questions
> >> and answer them.
> >>
> >> The problem is that stack overflow (as well as other targets such as the
> >> databricks forums) tend to have a more user based orientation. This
> means
> >> that any spark internal question will almost certainly remain
> unanswered.
> >>
> >>
> >>
> >> I was wondering if we could come up with a solution for this.
> >>
> >>
> >>
> >> Assaf.
> >>
> >>
> >>
> >>
> >>
> --
>
> Maciej Szymkiewicz
>
>



-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau

RE: Handling questions in the mailing lists

Posted by "assaf.mendelson" <as...@rsa.com>.
I was just wondering, before we move on to SO:
Do we have enough contributors with enough reputation to manage things on SO?
We would need contributors with enough reputation to have the relevant privileges.
For example: creating tags (requires 1500 reputation), editing questions and answers (2000), creating tag synonyms (2500), approving tag wiki edits (5000), access to moderator tools (10000; this is required to delete questions etc.), protecting questions (15000).
All of these are important if we plan to have SO as a main resource.
I know I originally suggested SO; however, if we do not have contributors with the required privileges and the willingness to help manage everything, then I am not sure this is a good fit.
Assaf.

From: Denny Lee [via Apache Spark Developers List] [mailto:ml-node+s1001551n19799h58@n3.nabble.com]
Sent: Wednesday, November 09, 2016 9:54 AM
To: Mendelson, Assaf
Subject: Re: Handling questions in the mailing lists

Agreed that simply moving the questions to SO will not solve anything, but I think the callout about the meta-tags is that we need to abide by SO rules, and if we were to just jump in and start creating meta-tags, we would be violating at minimum the spirit and at maximum the actual conventions around SO.

Having said this, perhaps we could suggest tags that we place in the header of the question, whether it be on SO or the mailing lists, that will help us sort through all of these questions faster, just as you suggested.  The Proposed Community Mailing Lists / StackOverflow Changes document <https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p> has been updated to include suggested tags.  WDYT?

On Tue, Nov 8, 2016 at 11:02 PM assaf.mendelson <[hidden email]> wrote:
I like the document and I think it is good but I still feel like we are missing an important part here.

Look at SO today. There are:

- 4658 unanswered questions under apache-spark tag.
- 394 unanswered questions under spark-dataframe tag.
- 639 unanswered questions under apache-spark-sql
- 859 unanswered questions under pyspark

Just moving people to ask there will not help. The whole issue is having people answer the questions.

The problem is that many of these questions do not fit SO (but are already there, so they are noise), are bad (i.e. unclear or hard to answer), are orphaned, etc., while some are simply harder than what people with some experience in Spark can handle and require more expertise.
The problem is that people with the relevant expertise are drowning in noise. This is true for the mailing list and it is true for SO.

For this reason I believe that just moving people to SO will not solve anything.

My original thought was that if we had different tags, then different people could watch open questions on these tags and therefore have much lower noise. I thought that we would have a low tier (the current one) of people just not following the documentation (which would remain as noise); then a beginner tier, where we could have people downvoting bad questions but in most cases the community can answer the questions because they are common; then a “medium” tier, which would mean harder questions that can still be answered by advanced users; and lastly an “advanced” tier to which committers can actually subscribe (and adding sub-tags for subsystems would improve this even more).

I was not aware of SO policy for meta tags (the burnination link is about removing tags completely so I am not sure how it applies, I believe this link https://stackoverflow.blog/2010/08/the-death-of-meta-tags/ is more relevant).
There was actually a discussion along the lines in SO (http://meta.stackoverflow.com/questions/253338/filtering-questions-by-difficulty-level).

The fact that SO did not solve this issue does not mean we shouldn’t try to.

The way I see it, some tags can easily be used even with the meta tags limitation. For example, a spark-internal-development tag could be used to ask questions about the development of Spark. There are already tags for some Spark subsystems (there is an apache-spark-sql tag, a pyspark tag, a spark-streaming tag etc.). The main issue I see, and the one we can’t seem to get around, is dividing between simple questions that the community should answer and hard questions which only advanced users can answer.

Maybe SO isn’t the correct platform for that but even within it we can try to find a non meta name for spark beginner questions vs. spark advanced questions.
Assaf.


From: Denny Lee [via Apache Spark Developers List] [mailto:[hidden email]]
Sent: Tuesday, November 08, 2016 7:53 AM
To: Mendelson, Assaf

Subject: Re: Handling questions in the mailing lists

To help track and get the verbiage for the Spark community page and welcome email jump started, here's a working document for us to work with: https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit

Hope this will help us collaborate on this stuff a little faster.

On Mon, Nov 7, 2016 at 2:25 PM Maciej Szymkiewicz <[hidden email]> wrote:

Just a couple of random thoughts regarding Stack Overflow...

  *   If we are thinking about shifting focus towards SO, all attempts at micromanaging should be discarded right from the beginning. Especially things like meta tags, which are discouraged and "burninated" (https://meta.stackoverflow.com/tags/burninate-request/info), or thread bumping. Depending on the context these will be unmanageable, go against community guidelines, or simply be obsolete.
  *   Lack of expertise is unlikely to be an issue. Even now there are a number of advanced Spark users on SO. Of course, the more the merrier.

Things that can be easily improved:

  *   Identifying, improving and promoting canonical questions and answers. This means closing duplicates, suggesting edits to improve existing answers, and providing alternative solutions. It can also be used to identify gaps in the documentation.
  *   Providing a set of clear posting guidelines to reduce the effort required to identify the problem (think of http://stackoverflow.com/q/5963269, a.k.a. "How to make a great R reproducible example?").
  *   Helping users decide if a question is a good fit for SO (see below). API questions are a great fit; debugging problems like "my cluster is slow" are not.
  *   Actively cleaning up (closing, deleting) off-topic and low-quality questions. The less junk there is to sieve through, the better the chance of good questions being answered.
  *   Repurposing and actively moderating SO docs (https://stackoverflow.com/documentation/apache-spark/topics). Right now most of the stuff that goes there is useless, duplicated, plagiarized, or borderline spam.
  *   Encouraging the community to monitor featured (https://stackoverflow.com/questions/tagged/apache-spark?sort=featured) and active & upvoted & unanswered (https://stackoverflow.com/unanswered/tagged/apache-spark) questions.
  *   Implementing some procedure to identify questions which are likely to be bugs or material for feature requests. Personally I am quite often tempted to simply send a link to the dev list, but I don't think it is really acceptable.
  *   Animating a Spark-related chat room. I tried this a couple of times but to no avail. Without a certain critical mass of users it just won't work.



On 11/07/2016 07:32 AM, Reynold Xin wrote:
This is an excellent point. If we do go ahead and feature SO as a way for users to ask questions more prominently, as someone who knows SO very well, would you be willing to help write a short guideline (ideally the shorter the better, which makes it hard) to direct what goes to user@ and what goes to SO?

Sure, I'll be happy to help if I can.

On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz <[hidden email]> wrote:

Damn, I always thought that mailing list is only for nice and welcoming people and there is nothing to do for me here >:)

To be serious though, there are many questions on the users list which would fit just fine on SO, but that is not true in general. There are dozens of questions which are too broad, opinion-based, ask for external resources, and so on. If you want to direct users to SO you have to help them decide if it is the right channel. Otherwise it will just create a really bad experience for both those seeking help and active answerers. The former will be downvoted and bashed; the latter will have to deal with handling all the junk, and the number of active Spark users with moderation privileges is really low (with only Massg and me being able to directly close duplicates).

Believe me, I've seen this before.

On 11/07/2016 05:08 AM, Reynold Xin wrote:
You have substantially underestimated how opinionated people can be on mailing lists too :)

On Sunday, November 6, 2016, Maciej Szymkiewicz <[hidden email]> wrote:

You have to remember that the Stack Overflow crowd (like me) is highly opinionated, so many questions which could be just fine on the mailing list will be quickly downvoted and/or closed as off-topic. Just saying...

--

Best,

Maciej

On 11/07/2016 04:03 AM, Reynold Xin wrote:
OK I've checked on the ASF member list (which is private so there is no public archive).

It is not against any ASF rule to recommend StackOverflow as a place for users to ask questions. I don't think we can or should delete the existing user@spark list either, but we can certainly make SO more visible than it is.


On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <[hidden email]> wrote:
Actually, after talking with more ASF members, I believe the only policy is that development decisions have to be made and announced on ASF properties (dev list or JIRA), but user questions don't have to be.

I'm going to double check this. If it is true, I would actually recommend moving the Q&A part of the user list entirely over to Stack Overflow, or at least making that the recommended way rather than the existing user list, which is not very scalable.


On Wednesday, November 2, 2016, Nicholas Chammas <[hidden email]> wrote:

We’ve discussed several times upgrading our communication tools, as far back as 2014 and maybe even before that too. The bottom line is that we can’t due to ASF rules requiring the use of ASF-managed mailing lists.

For some history, see this discussion:
•         https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E
•         https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E

(It’s ironic that it’s difficult to follow the past discussion on why we can’t change our official communication tools due to those very tools…)

Nick

On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <[hidden email]> wrote:
I feel Assaf's point is quite relevant if we want to move this project forward from the Spark user perspective (as I do). In fact, we're still using 20th century tools (mailing lists) with some add-ons (like Stack Overflow).

As usual, Sean and Cody's contributions are very much to the point.
I feel it is indeed a matter of culture (hard to enforce) and tools (much easier). Isn't it?

On 2 November 2016 at 16:36, Cody Koeninger <[hidden email]> wrote:
So concrete things people could do

- users could tag subject lines appropriately to the component they're
asking about

- contributors could monitor user@ for tags relating to components
they've worked on.
I'd be surprised if my miss rate for any mailing list questions
well-labeled as Kafka was higher than 5%

- committers could be more aggressive about soliciting and merging PRs
to improve documentation.
It's a lot easier to answer even poorly-asked questions with a link to
relevant docs.

On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <[hidden email]> wrote:
> There's already reviews@ and issues@. dev@ is for project development itself
> and I think is OK. You're suggesting splitting up user@ and I sympathize
> with the motivation. Experience tells me that we'll have a beginner@ that's
> then totally ignored, and people will quickly learn to post to advanced@ to
> get attention, and we'll be back where we started. Putting it in JIRA
> doesn't help. I don't think this is a problem that is merely down to lack of
> process. It actually requires cultivating a culture change on the community
> list.
>
> On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <[hidden email]>
> wrote:
>>
>> What I am suggesting is basically to fix that.
>>
>> For example, we might say that mailing list A is only for voting, mailing
>> list B is only for PR and have something like stack overflow for developer
>> questions (I would even go as far as to have beginner, intermediate and
>> advanced mailing list for users and beginner/advanced for dev).
>>
>>
>>
>> This can easily be done using stack overflow tags, however, that would
>> probably be harder to manage.
>>
>> Maybe using special jira tags and manage it in jira?
>>
>>
>>
>> Anyway as I said, the main issue is not user questions (except maybe
>> advanced ones) but more for dev questions. It is so easy to get lost in the
>> chatter that it makes it very hard for people to learn spark internals…
>>
>> Assaf.
>>
>>
>>
>> From: Sean Owen [mailto:[hidden email]]
>> Sent: Wednesday, November 02, 2016 2:07 PM
>> To: Mendelson, Assaf; [hidden email]
>> Subject: Re: Handling questions in the mailing lists
>>
>>
>>
>> I think that unfortunately mailing lists don't scale well. This one has
>> thousands of subscribers with different interests and levels of experience.
>> For any given person, most messages will be irrelevant. I also find that a
>> lot of questions on user@ are not well-asked, aren't an SSCCE
>> (http://sscce.org/), not something most people are going to bother replying
>> to even if they could answer. I almost entirely ignore user@ because there
>> are higher-priority channels like PRs to deal with, that already have
>> hundreds of messages per day. This is why little of it gets an answer -- too
>> noisy.
>>
>>
>>
>> We have to have official mailing lists, in any event, to have some
>> official channel for things like votes and announcements. It's not wrong to
>> ask questions on user@ of course, but a lot of the questions I see could
>> have been answered with research of existing docs or looking at the code. I
>> think that given the scale of the list, it's not wrong to assert that this
>> is sort of a prerequisite for asking thousands of people to answer one's
>> question. But we can't enforce that.
>>
>>
>>
>> The situation will get better to the extent people ask better questions,
>> help other people ask better questions, and answer good questions. I'd
>> encourage anyone feeling this way to try to help along those dimensions.
>>
>> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <[hidden email]>
>> wrote:
>>
>> Hi,
>>
>> I know this is a little off topic but I wanted to raise an issue about
>> handling questions in the mailing list (this is true both for the user
>> mailing list and the dev but since there are other options such as stack
>> overflow for user questions, this is more problematic in dev).
>>
>> Let’s say I ask a question (as I recently did). Unfortunately this was
>> during spark summit in Europe so probably people were busy. In any case no
>> one answered.
>>
>> The problem is, that if no one answers very soon, the question will almost
>> certainly remain unanswered because new messages will simply drown it.
>>
>>
>>
>> This is a common issue not just for questions but for any comment or idea
>> which is not immediately picked up.
>>
>>
>>
>> I believe we should have a method of handling this.
>>
>> Generally, I would say these types of things belong in stack overflow,
>> after all, the way it is built is perfect for this. More seasoned spark
>> contributors and committers can periodically check out unanswered questions
>> and answer them.
>>
>> The problem is that stack overflow (as well as other targets such as the
>> databricks forums) tend to have a more user based orientation. This means
>> that any spark internal question will almost certainly remain unanswered.
>>
>>
>>
>> I was wondering if we could come up with a solution for this.
>>
>>
>>
>> Assaf.
>>
>>
>>
>>
>>
--

Maciej Szymkiewicz


Re: Handling questions in the mailing lists

Posted by Denny Lee <de...@gmail.com>.
Agreed that simply moving the questions to SO will not solve
anything, but I think the callout about the meta-tags is that we need to
abide by SO rules, and if we were to just jump in and start creating
meta-tags, we would be violating at minimum the spirit and at maximum the
actual conventions around SO.

Having said this, perhaps we could suggest tags that we place in the header
of the question, whether it be on SO or the mailing lists, that will help us
sort through all of these questions faster, just as you suggested.  The
Proposed Community Mailing Lists / StackOverflow Changes document
<https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p>
has been updated to include suggested tags.  WDYT?

On Tue, Nov 8, 2016 at 11:02 PM assaf.mendelson <as...@rsa.com>
wrote:

> I like the document and I think it is good but I still feel like we are
> missing an important part here.
>
>
>
> Look at SO today. There are:
>
> - 4658 unanswered questions under apache-spark tag.
> - 394 unanswered questions under spark-dataframe tag.
> - 639 unanswered questions under apache-spark-sql
> - 859 unanswered questions under pyspark
>
>
>
> Just moving people to ask there will not help. The whole issue is having
> people answer the questions.
>
>
>
> The problem is that many of these questions do not fit SO (but are already
> there, so they are noise), are bad (i.e. unclear or hard to answer), are
> orphaned, etc., while some are simply harder than what people with some
> experience in Spark can handle and require more expertise.
>
> The problem is that people with the relevant expertise are drowning in
> noise. This is true for the mailing list and it is true for SO.
>
>
>
> For this reason I believe that just moving people to SO will not solve
> anything.
>
>
>
> My original thought was that if we had different tags, then different
> people could watch open questions on these tags and therefore have much
> lower noise. I thought that we would have a low tier (the current one) of
> people just not following the documentation (which would remain as noise);
> then a beginner tier, where we could have people downvoting bad questions
> but in most cases the community can answer the questions because they are
> common; then a “medium” tier, which would mean harder questions that can
> still be answered by advanced users; and lastly an “advanced” tier to which
> committers can actually subscribe (and adding sub-tags for subsystems
> would improve this even more).
>
>
>
> I was not aware of SO policy for meta tags (the burnination link is about
> removing tags completely so I am not sure how it applies, I believe this
> link https://stackoverflow.blog/2010/08/the-death-of-meta-tags/ is more
> relevant).
>
> There was actually a discussion along the lines in SO (
> http://meta.stackoverflow.com/questions/253338/filtering-questions-by-difficulty-level
> ).
>
>
>
> The fact that SO did not solve this issue does not mean we shouldn’t try
> to.
>
>
>
> The way I see it, some tags can easily be used even with the meta tags
> limitation. For example, a spark-internal-development tag could be used
> to ask questions about the development of Spark. There are already tags for
> some Spark subsystems (there is an apache-spark-sql tag, a pyspark tag, a
> spark-streaming tag etc.). The main issue I see, and the one we can’t seem
> to get around, is dividing between simple questions that the community
> should answer and hard questions which only advanced users can answer.
>
>
>
> Maybe SO isn’t the correct platform for that but even within it we can try
> to find a non meta name for spark beginner questions vs. spark advanced
> questions.
>
> Assaf.
>
>
>
>
>
> *From:* Denny Lee [via Apache Spark Developers List] [mailto:[hidden email]]
> *Sent:* Tuesday, November 08, 2016 7:53 AM
> *To:* Mendelson, Assaf
>
>
> *Subject:* Re: Handling questions in the mailing lists
>
>
>
> To help track and get the verbiage for the Spark community page and
> welcome email jump started, here's a working document for us to work with:
> https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit
>
>
>
> Hope this will help us collaborate on this stuff a little faster.
>
> On Mon, Nov 7, 2016 at 2:25 PM Maciej Szymkiewicz <[hidden email]> wrote:
>
> Just a couple of random thoughts regarding Stack Overflow...
>
>    - If we are thinking about shifting focus towards SO, all attempts at
>    micromanaging should be discarded right from the beginning. Especially
>    things like meta tags, which are discouraged and "burninated" (
>    https://meta.stackoverflow.com/tags/burninate-request/info), or
>    thread bumping. Depending on the context these will be unmanageable, go
>    against community guidelines, or simply be obsolete.
>    - Lack of expertise is unlikely to be an issue. Even now there are a
>    number of advanced Spark users on SO. Of course, the more the merrier.
>
> Things that can be easily improved:
>
>    - Identifying, improving and promoting canonical questions and
>    answers. This means closing duplicates, suggesting edits to improve
>    existing answers, and providing alternative solutions. It can also be
>    used to identify gaps in the documentation.
>    - Providing a set of clear posting guidelines to reduce the effort
>    required to identify the problem (think of
>    http://stackoverflow.com/q/5963269, a.k.a. "How to make a great R
>    reproducible example?").
>    - Helping users decide if a question is a good fit for SO (see below).
>    API questions are a great fit; debugging problems like "my cluster is
>    slow" are not.
>    - Actively cleaning up (closing, deleting) off-topic and low-quality
>    questions. The less junk there is to sieve through, the better the
>    chance of good questions being answered.
>    - Repurposing and actively moderating SO docs (
>    https://stackoverflow.com/documentation/apache-spark/topics). Right
>    now most of the stuff that goes there is useless, duplicated,
>    plagiarized, or borderline spam.
>    - Encouraging the community to monitor featured (
>    https://stackoverflow.com/questions/tagged/apache-spark?sort=featured)
>    and active & upvoted & unanswered (
>    https://stackoverflow.com/unanswered/tagged/apache-spark) questions.
>    - Implementing some procedure to identify questions which are likely
>    to be bugs or material for feature requests. Personally I am quite often
>    tempted to simply send a link to the dev list, but I don't think it is
>    really acceptable.
>    - Animating a Spark-related chat room. I tried this a couple of times
>    but to no avail. Without a certain critical mass of users it just won't
>    work.
>
>
>
>
>
> On 11/07/2016 07:32 AM, Reynold Xin wrote:
>
> This is an excellent point. If we do go ahead and feature SO as a way for
> users to ask questions more prominently, as someone who knows SO very well,
> would you be willing to help write a short guideline (ideally the shorter
> the better, which makes it hard) to direct what goes to user@ and what
> goes to SO?
>
>
>
> Sure, I'll be happy to help if I can.
>
> On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz <[hidden email]> wrote:
>
> Damn, I always thought that mailing list is only for nice and welcoming
> people and there is nothing to do for me here >:)
>
> To be serious though, there are many questions on the users list which
> would fit just fine on SO but it is not true in general. There are dozens
> of questions which are too broad, opinion based, ask for external resources
> and so on. If you want to direct users to SO you have to help them to
> decide if it is the right channel. Otherwise it will just create a really
> bad experience for both seeking help and active answerers. Former ones will
> be downvoted and bashed, latter ones will have to deal with handling all
> the junk and the number of active Spark users with moderation privileges is
> really low (with only Massg and me being able to directly close duplicates).
>
> Believe me, I've seen this before.
>
> On 11/07/2016 05:08 AM, Reynold Xin wrote:
>
> You have substantially underestimated how opinionated people can be on
> mailing lists too :)
>
> On Sunday, November 6, 2016, Maciej Szymkiewicz <[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=19770&i=2>> wrote:
>
> You have to remember that Stack Overflow crowd (like me) is highly
> opinionated, so many questions, which could be just fine on the mailing
> list, will be quickly downvoted and / or closed as off-topic. Just
> saying...
>
> --
>
> Best,
>
> Maciej
>
>
>
> On 11/07/2016 04:03 AM, Reynold Xin wrote:
>
> OK I've checked on the ASF member list (which is private so there is no
> public archive).
>
>
>
> It is not against any ASF rule to recommend StackOverflow as a place for
> users to ask questions. I don't think we can or should delete the existing
> user@spark list either, but we can certainly make SO more visible than it
> is.
>
>
>
>
>
> On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=19770&i=3>> wrote:
>
> Actually after talking with more ASF members, I believe the only policy is
> that development decisions have to be made and announced on ASF properties
> (dev list or jira), but user questions don't have to.
>
>
>
> I'm going to double check this. If it is true, I would actually recommend
> us moving entirely over the Q&A part of the user list to stackoverflow, or
> at least make that the recommended way rather than the existing user list
> which is not very scalable.
>
>
>
> On Wednesday, November 2, 2016, Nicholas Chammas <[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=19770&i=4>> wrote:
>
> We’ve discussed several times upgrading our communication tools, as far
> back as 2014 and maybe even before that too. The bottom line is that we
> can’t due to ASF rules requiring the use of ASF-managed mailing lists.
>
> For some history, see this discussion:
>
> · https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E
>
> · https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E
>
> (It’s ironic that it’s difficult to follow the past discussion on why we
> can’t change our official communication tools due to those very tools…)
>
> Nick
>
> ​
>
> On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=19770&i=5>> wrote:
>
> I feel Assaf's point is quite relevant if we want to move this project
> forward from the Spark user perspective (as I do). In fact, we're still
> using 20th century tools (mailing lists) with some add-ons (like Stack
> Overflow).
>
>
>
> As usual, Sean and Cody's contributions are very to the point.
>
> I feel it is indeed a matter of culture (hard to enforce) and tools
> (much easier). Isn't it?
>
> On 2 November 2016 at 16:36, Cody Koeninger <[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=19770&i=6>> wrote:
>
> So concrete things people could do
>
> - users could tag subject lines appropriately to the component they're
> asking about
>
> - contributors could monitor user@ for tags relating to components
> they've worked on.
> I'd be surprised if my miss rate for any mailing list questions
> well-labeled as Kafka was higher than 5%
>
> - committers could be more aggressive about soliciting and merging PRs
> to improve documentation.
> It's a lot easier to answer even poorly-asked questions with a link to
> relevant docs.
>
>
> On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=19770&i=7>> wrote:
> > There's already reviews@ and issues@. dev@ is for project development
> itself
> > and I think is OK. You're suggesting splitting up user@ and I sympathize
> > with the motivation. Experience tells me that we'll have a beginner@
> that's
> > then totally ignored, and people will quickly learn to post to advanced@
> to
> > get attention, and we'll be back where we started. Putting it in JIRA
> > doesn't help. I don't think this a problem that is merely down to lack of
> > process. It actually requires cultivating a culture change on the
> community
> > list.
> >
>
> > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=19770&i=8>>
>
>
> > wrote:
> >>
> >> What I am suggesting is basically to fix that.
> >>
> >> For example, we might say that mailing list A is only for voting,
> mailing
> >> list B is only for PR and have something like stack overflow for
> developer
> >> questions (I would even go as far as to have beginner, intermediate and
> >> advanced mailing list for users and beginner/advanced for dev).
> >>
> >>
> >>
> >> This can easily be done using stack overflow tags, however, that would
> >> probably be harder to manage.
> >>
> >> Maybe using special jira tags and manage it in jira?
> >>
> >>
> >>
> >> Anyway as I said, the main issue is not user questions (except maybe
> >> advanced ones) but more for dev questions. It is so easy to get lost in
> the
> >> chatter that it makes it very hard for people to learn spark internals…
> >>
> >> Assaf.
> >>
> >>
> >>
>
> >> From: Sean Owen [mailto:[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=19770&i=9>]
>
>
> >> Sent: Wednesday, November 02, 2016 2:07 PM
>
> >> To: Mendelson, Assaf; [hidden email]
> <http:///user/SendEmail.jtp?type=node&node=19770&i=10>
>
>
> >> Subject: Re: Handling questions in the mailing lists
> >>
> >>
> >>
> >> I think that unfortunately mailing lists don't scale well. This one has
> >> thousands of subscribers with different interests and levels of
> experience.
> >> For any given person, most messages will be irrelevant. I also find
> that a
> >> lot of questions on user@ are not well-asked, aren't an SSCCE
> >> (http://sscce.org/), not something most people are going to bother
> replying
> >> to even if they could answer. I almost entirely ignore user@ because
> there
> >> are higher-priority channels like PRs to deal with, that already have
> >> hundreds of messages per day. This is why little of it gets an answer
> -- too
> >> noisy.
> >>
> >>
> >>
> >> We have to have official mailing lists, in any event, to have some
> >> official channel for things like votes and announcements. It's not
> wrong to
> >> ask questions on user@ of course, but a lot of the questions I see
> could
> >> have been answered with research of existing docs or looking at the
> code. I
> >> think that given the scale of the list, it's not wrong to assert that
> this
> >> is sort of a prerequisite for asking thousands of people to answer one's
> >> question. But we can't enforce that.
> >>
> >>
> >>
> >> The situation will get better to the extent people ask better questions,
> >> help other people ask better questions, and answer good questions. I'd
> >> encourage anyone feeling this way to try to help along those dimensions.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
>
> >> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=19770&i=11>>
>
>
> >> wrote:
> >>
> >> Hi,
> >>
> >> I know this is a little off topic but I wanted to raise an issue about
> >> handling questions in the mailing list (this is true both for the user
> >> mailing list and the dev but since there are other options such as stack
> >> overflow for user questions, this is more problematic in dev).
> >>
> >> Let’s say I ask a question (as I recently did). Unfortunately this was
> >> during spark summit in Europe so probably people were busy. In any case
> no
> >> one answered.
> >>
> >> The problem is, that if no one answers very soon, the question will
> almost
> >> certainly remain unanswered because new messages will simply drown it.
> >>
> >>
> >>
> >> This is a common issue not just for questions but for any comment or
> idea
> >> which is not immediately picked up.
> >>
> >>
> >>
> >> I believe we should have a method of handling this.
> >>
> >> Generally, I would say these types of things belong in stack overflow,
> >> after all, the way it is built is perfect for this. More seasoned spark
> >> contributors and committers can periodically check out unanswered
> questions
> >> and answer them.
> >>
> >> The problem is that stack overflow (as well as other targets such as the
> >> databricks forums) tend to have a more user based orientation. This
> means
> >> that any spark internal question will almost certainly remain
> unanswered.
> >>
> >>
> >>
> >> I was wondering if we could come up with a solution for this.
> >>
> >>
> >>
> >> Assaf.
> >>
> --
>
> Maciej Szymkiewicz
>
>
>

RE: Handling questions in the mailing lists

Posted by "assaf.mendelson" <as...@rsa.com>.
I like the document and I think it is good but I still feel like we are missing an important part here.

Look at SO today. There are:

- 4658 unanswered questions under apache-spark tag.
- 394 unanswered questions under spark-dataframe tag.
- 639 unanswered questions under apache-spark-sql
- 859 unanswered questions under pyspark

Just moving people to ask there will not help. The whole issue is having people answer the questions.

The problem is that many of these questions do not fit SO (but are already there, so they are noise), are bad (i.e. unclear or hard to answer), are orphaned, etc., while some are simply harder than what people with some experience in Spark can handle and require more expertise.
The problem is that people with the relevant expertise are drowning in noise. This is true for the mailing list and it is true for SO.

For this reason I believe that just moving people to SO will not solve anything.

My original thought was that if we had different tags, then different people could watch open questions on these tags and therefore see much less noise. I thought that we would have a low tier (the current one) of people just not following the documentation (which would remain as noise); then a beginner tier, where people could downvote bad questions but in most cases the community can answer the questions because they are common; then a “medium” tier, which would mean harder questions that can still be answered by advanced users; and lastly an “advanced” tier that committers can actually subscribe to (adding sub-tags for subsystems would improve this even more).

I was not aware of the SO policy on meta tags (the burnination link is about removing tags completely, so I am not sure how it applies; I believe this link is more relevant: https://stackoverflow.blog/2010/08/the-death-of-meta-tags/).
There was actually a discussion along these lines on SO (http://meta.stackoverflow.com/questions/253338/filtering-questions-by-difficulty-level).

The fact that SO did not solve this issue does not mean we shouldn’t try to.

The way I see it, some tags can easily be used even with the meta tags limitation. For example, a spark-internal-development tag can be used to ask questions about the development of Spark. There are already tags for some Spark subsystems (there is an apache-spark-sql tag, a pyspark tag, a spark-streaming tag, etc.). The main issue I see, and the one we can’t seem to get around, is dividing between simple questions that the community should answer and hard questions which only advanced users can answer.

Maybe SO isn’t the correct platform for that, but even within it we can try to find a non-meta name for Spark beginner questions vs. Spark advanced questions.
Assaf.


From: Denny Lee [via Apache Spark Developers List] [mailto:ml-node+s1001551n19770h59@n3.nabble.com]
Sent: Tuesday, November 08, 2016 7:53 AM
To: Mendelson, Assaf
Subject: Re: Handling questions in the mailing lists

To help track and get the verbiage for the Spark community page and welcome email jump started, here's a working document for us to work with: https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit

Hope this will help us collaborate on this stuff a little faster.

On Mon, Nov 7, 2016 at 2:25 PM Maciej Szymkiewicz <[hidden email]</user/SendEmail.jtp?type=node&node=19770&i=0>> wrote:

Just a couple of random thoughts regarding Stack Overflow...

  *   If we are thinking about shifting focus towards SO all attempts of micromanaging should be discarded right in the beginning. Especially things like meta tags, which are discouraged and "burninated" (https://meta.stackoverflow.com/tags/burninate-request/info) , or thread bumping. Depending on a context these won't be manageable, go against community guidelines or simply obsolete.
  *   Lack of expertise is unlikely an issue. Even now there is a number of advanced Spark users on SO. Of course the more the merrier.

Things that can be easily improved:

  *   Identifying, improving and promoting canonical questions and answers. It means closing duplicate, suggesting edits to improve existing answers, providing alternative solutions. This can be also used to identify gaps in the documentation.
  *   Providing a set of clear posting guidelines to reduce effort required to identify the problem (think about http://stackoverflow.com/q/5963269 a.k.a How to make a great R reproducible example?)
  *   Helping users decide if question is a good fit for SO (see below). API questions are great fit, debugging problems like "my cluster is slow" are not.
  *   Actively cleaning (closing, deleting) off-topic and low quality questions. The less junk to sieve through the better chance of good questions being answered.
  *   Repurposing and actively moderating SO docs (https://stackoverflow.com/documentation/apache-spark/topics). Right now most of the stuff that goes there is useless, duplicated or plagiarized, or border case SPAM.
  *   Encouraging community to monitor featured (https://stackoverflow.com/questions/tagged/apache-spark?sort=featured) and active & upvoted & unanswered (https://stackoverflow.com/unanswered/tagged/apache-spark) questions.
  *   Implementing some procedure to identify questions which are likely to be bugs or a material for feature requests. Personally I am quite often tempted to simply send a link to dev list, but I don't think it is really acceptable.
  *   Animating Spark related chat room. I tried this a couple of times but to no avail. Without a certain critical mass of users it just won't work.



On 11/07/2016 07:32 AM, Reynold Xin wrote:
This is an excellent point. If we do go ahead and feature SO as a way for users to ask questions more prominently, as someone who knows SO very well, would you be willing to help write a short guideline (ideally the shorter the better, which makes it hard) to direct what goes to user@ and what goes to SO?

Sure, I'll be happy to help if I can.





On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz <[hidden email]</user/SendEmail.jtp?type=node&node=19770&i=1>> wrote:

Damn, I always thought that mailing list is only for nice and welcoming people and there is nothing to do for me here >:)

To be serious though, there are many questions on the users list which would fit just fine on SO but it is not true in general. There are dozens of questions which are too broad, opinion based, ask for external resources and so on. If you want to direct users to SO you have to help them to decide if it is the right channel. Otherwise it will just create a really bad experience for both seeking help and active answerers. Former ones will be downvoted and bashed, latter ones will have to deal with handling all the junk and the number of active Spark users with moderation privileges is really low (with only Massg and me being able to directly close duplicates).

Believe me, I've seen this before.
On 11/07/2016 05:08 AM, Reynold Xin wrote:
You have substantially underestimated how opinionated people can be on mailing lists too :)

On Sunday, November 6, 2016, Maciej Szymkiewicz <[hidden email]</user/SendEmail.jtp?type=node&node=19770&i=2>> wrote:

You have to remember that Stack Overflow crowd (like me) is highly opinionated, so many questions, which could be just fine on the mailing list, will be quickly downvoted and / or closed as off-topic. Just saying...

--

Best,

Maciej

On 11/07/2016 04:03 AM, Reynold Xin wrote:
OK I've checked on the ASF member list (which is private so there is no public archive).

It is not against any ASF rule to recommend StackOverflow as a place for users to ask questions. I don't think we can or should delete the existing user@spark list either, but we can certainly make SO more visible than it is.



On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <[hidden email]</user/SendEmail.jtp?type=node&node=19770&i=3>> wrote:
Actually after talking with more ASF members, I believe the only policy is that development decisions have to be made and announced on ASF properties (dev list or jira), but user questions don't have to.

I'm going to double check this. If it is true, I would actually recommend us moving entirely over the Q&A part of the user list to stackoverflow, or at least make that the recommended way rather than the existing user list which is not very scalable.


On Wednesday, November 2, 2016, Nicholas Chammas <[hidden email]</user/SendEmail.jtp?type=node&node=19770&i=4>> wrote:

We’ve discussed several times upgrading our communication tools, as far back as 2014 and maybe even before that too. The bottom line is that we can’t due to ASF rules requiring the use of ASF-managed mailing lists.

For some history, see this discussion:
·         https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E
·         https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E

(It’s ironic that it’s difficult to follow the past discussion on why we can’t change our official communication tools due to those very tools…)

Nick
​

On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <[hidden email]</user/SendEmail.jtp?type=node&node=19770&i=5>> wrote:
I feel Assaf's point is quite relevant if we want to move this project forward from the Spark user perspective (as I do). In fact, we're still using 20th century tools (mailing lists) with some add-ons (like Stack Overflow).

As usual, Sean and Cody's contributions are very to the point.
I feel it is indeed a matter of culture (hard to enforce) and tools (much easier). Isn't it?

On 2 November 2016 at 16:36, Cody Koeninger <[hidden email]</user/SendEmail.jtp?type=node&node=19770&i=6>> wrote:
So concrete things people could do

- users could tag subject lines appropriately to the component they're
asking about

- contributors could monitor user@ for tags relating to components
they've worked on.
I'd be surprised if my miss rate for any mailing list questions
well-labeled as Kafka was higher than 5%

- committers could be more aggressive about soliciting and merging PRs
to improve documentation.
It's a lot easier to answer even poorly-asked questions with a link to
relevant docs.

On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <[hidden email]</user/SendEmail.jtp?type=node&node=19770&i=7>> wrote:
> There's already reviews@ and issues@. dev@ is for project development itself
> and I think is OK. You're suggesting splitting up user@ and I sympathize
> with the motivation. Experience tells me that we'll have a beginner@ that's
> then totally ignored, and people will quickly learn to post to advanced@ to
> get attention, and we'll be back where we started. Putting it in JIRA
> doesn't help. I don't think this a problem that is merely down to lack of
> process. It actually requires cultivating a culture change on the community
> list.
>
> On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <[hidden email]</user/SendEmail.jtp?type=node&node=19770&i=8>>
> wrote:
>>
>> What I am suggesting is basically to fix that.
>>
>> For example, we might say that mailing list A is only for voting, mailing
>> list B is only for PR and have something like stack overflow for developer
>> questions (I would even go as far as to have beginner, intermediate and
>> advanced mailing list for users and beginner/advanced for dev).
>>
>>
>>
>> This can easily be done using stack overflow tags, however, that would
>> probably be harder to manage.
>>
>> Maybe using special jira tags and manage it in jira?
>>
>>
>>
>> Anyway as I said, the main issue is not user questions (except maybe
>> advanced ones) but more for dev questions. It is so easy to get lost in the
>> chatter that it makes it very hard for people to learn spark internals…
>>
>> Assaf.
>>
>>
>>
>> From: Sean Owen [mailto:[hidden email]</user/SendEmail.jtp?type=node&node=19770&i=9>]
>> Sent: Wednesday, November 02, 2016 2:07 PM
>> To: Mendelson, Assaf; [hidden email]</user/SendEmail.jtp?type=node&node=19770&i=10>
>> Subject: Re: Handling questions in the mailing lists
>>
>>
>>
>> I think that unfortunately mailing lists don't scale well. This one has
>> thousands of subscribers with different interests and levels of experience.
>> For any given person, most messages will be irrelevant. I also find that a
>> lot of questions on user@ are not well-asked, aren't an SSCCE
>> (http://sscce.org/), not something most people are going to bother replying
>> to even if they could answer. I almost entirely ignore user@ because there
>> are higher-priority channels like PRs to deal with, that already have
>> hundreds of messages per day. This is why little of it gets an answer -- too
>> noisy.
>>
>>
>>
>> We have to have official mailing lists, in any event, to have some
>> official channel for things like votes and announcements. It's not wrong to
>> ask questions on user@ of course, but a lot of the questions I see could
>> have been answered with research of existing docs or looking at the code. I
>> think that given the scale of the list, it's not wrong to assert that this
>> is sort of a prerequisite for asking thousands of people to answer one's
>> question. But we can't enforce that.
>>
>>
>>
>> The situation will get better to the extent people ask better questions,
>> help other people ask better questions, and answer good questions. I'd
>> encourage anyone feeling this way to try to help along those dimensions.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <[hidden email]</user/SendEmail.jtp?type=node&node=19770&i=11>>
>> wrote:
>>
>> Hi,
>>
>> I know this is a little off topic but I wanted to raise an issue about
>> handling questions in the mailing list (this is true both for the user
>> mailing list and the dev but since there are other options such as stack
>> overflow for user questions, this is more problematic in dev).
>>
>> Let’s say I ask a question (as I recently did). Unfortunately this was
>> during spark summit in Europe so probably people were busy. In any case no
>> one answered.
>>
>> The problem is, that if no one answers very soon, the question will almost
>> certainly remain unanswered because new messages will simply drown it.
>>
>>
>>
>> This is a common issue not just for questions but for any comment or idea
>> which is not immediately picked up.
>>
>>
>>
>> I believe we should have a method of handling this.
>>
>> Generally, I would say these types of things belong in stack overflow,
>> after all, the way it is built is perfect for this. More seasoned spark
>> contributors and committers can periodically check out unanswered questions
>> and answer them.
>>
>> The problem is that stack overflow (as well as other targets such as the
>> databricks forums) tend to have a more user based orientation. This means
>> that any spark internal question will almost certainly remain unanswered.
>>
>>
>>
>> I was wondering if we could come up with a solution for this.
>>
>>
>>
>> Assaf.
>>
--

Maciej Szymkiewicz






Re: Handling questions in the mailing lists

Posted by Ricardo Almeida <ri...@actnowib.com>.
Thanks Reynold for reviewing the ASF rules.
Despite the potential issues mentioned, I feel using StackOverflow would be
an improvement. And yes, some guidelines/instructions have the potential to
improve the questions and the "escalation" process.

On 7 November 2016 at 10:48, <Io...@nomura.com> wrote:

> My two cents (As a user/consumer)…
>
>
>
> I have been following & using Spark in financial services before version 1
> and before it migrated questions from Google Groups to apache mailing lists
> (which was a shame :( ).
>
>
>
> SO:
>
> There has been some momentum lately on SO, but as questions were not
> “monitored/answered” by Spark experts, the motivation of posting a question
> was low and in turn the quality of questions as well. As most of us know,
> SO is usually the first place to look for info and can greatly reduce the
> need to turn to user/dev groups so it would be great if there was more
> attention to it.
>
>
>
> Spark mailing lists:
>
> As the consensus appears to be, questions tend to get lost if not
> picked up within 1-2 days. Re-sending the same question feels “abusive” to
> me, so I would then give up. Given that a good question takes time, putting
> effort into a question that can easily be ignored results in mailing a “bad”
> question (see what happens?) or no question at all. As you have probably
> observed, a few users will mail a question to “dev” with “…no answers in
> user list…” as they incorrectly assume that no-one can answer their
> question.
>
>
>
> JIRA:
>
> I find that “issues” are being quite aggressively closed down.  I’ve seen
> this twice (one I reported myself and found the second ticket while looking
> for a solution) and for this reason it doesn’t encourage users spending the
> time and effort to use. Personally, I also feel that there is some bias on
> what is in-scope and out-of-scope.
>
>
>
> My preference would be that SO would be the first place that someone would
> post a question. If a few “experts” are found regularly answering
> questions, eventually Spark users will start using it more and reduce
> “user” load by easily finding previous answers (or the SO community marking
> duplicates). The same “experts” can also encourage users to “escalate” to
> JIRA, dev/user groups once a question has been properly filtered, which is
> quite common.
>
>
>
> PS. Personally, I would not follow any “bespoke/external” process on SO,
> e.g. down-voting on SO for any reason other than being a bad question as
> per SO rules.
>
>
>
>
>
> *From:* Matei Zaharia [mailto:matei.zaharia@gmail.com]
> *Sent:* 07 November 2016 07:45
> *To:* assaf.mendelson
> *Cc:* dev@spark.apache.org
>
> *Subject:* Re: Handling questions in the mailing lists
>
>
>
> Even for the mailing list, I'd love to have a short set of instructions on
> how to submit your questions (maybe on http://spark.apache.org/community.html
> or maybe in the welcome email when you subscribe). It would be great if
> someone added that. After all, we have such instructions for contributing
> PRs, for example.
>
>
>
> Matei
>
>
>
> On Nov 6, 2016, at 11:09 PM, assaf.mendelson <as...@rsa.com>
> wrote:
>
>
>
> There are other options as well. For example hosting an answerhub
> (www.answerhub.com) or other similar separate Q&A service.
>
> BTW, I believe the main issue is not how opinionated people are but who is
> answering questions.
>
> Today there are already people asking (and getting answers) on SO
> (including myself). The problem is that many people do not go to SO.
>
> The problem I see is how to “bump” up questions which are not being
> answered to someone more likely to be able to answer them. Simple questions
> can be answered by many people, many of them even newbies who ran into the
> issue themselves.
>
> The main issue is that the more complex the question, the fewer people
> there are who can answer it, and those people’s bandwidth is already clogged
> by other questions.
>
> We could for example try to create tags on SO for “basic questions”,
> “medium”, “advanced”. Provide guidelines to ask first on basic, if not
> answered after X days then add the medium tag etc. Downvote people who
> don’t go by the process. This would mean that committers for example can
> look at advanced only tag and have a manageable number of questions they
> can help with while others can answer medium and basic.
>
>
>
> I agree that some things are not good for SO, basically anything which asks
> for opinion. But most cases in the mailing list are either “how do I
> solve this bug” or “how do I do X”. Either of those is good for SO.
>
>
>
>
>
> Assaf.
>
>
>
>
>
>
>
> *From:* rxin [via Apache Spark Developers List] [mailto:ml-node+[hidden
> email]]
> *Sent:* Monday, November 07, 2016 8:33 AM
> *To:* Mendelson, Assaf
> *Subject:* Re: Handling questions in the mailing lists
>
>
>
> This is an excellent point. If we do go ahead and feature SO as a way for
> users to ask questions more prominently, as someone who knows SO very well,
> would you be willing to help write a short guideline (ideally the shorter
> the better, which makes it hard) to direct what goes to user@ and what
> goes to SO?
>
>
>
>
>
> On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz <[hidden email]> wrote:
>
> Damn, I always thought that the mailing list is only for nice and welcoming
> people and there is nothing to do for me here >:)
>
> To be serious though, there are many questions on the users list which
> would fit just fine on SO but it is not true in general. There are dozens
> of questions which are too broad, opinion based, ask for external resources
> and so on. If you want to direct users to SO you have to help them to
> decide if it is the right channel. Otherwise it will just create a really
> bad experience for both those seeking help and active answerers. The former
> will be downvoted and bashed, the latter will have to deal with handling all
> the junk, and the number of active Spark users with moderation privileges is
> really low (with only Massg and me being able to directly close duplicates).
>
> Believe me, I've seen this before.
>
> On 11/07/2016 05:08 AM, Reynold Xin wrote:
>
> You have substantially underestimated how opinionated people can be on
> mailing lists too :)
>
> On Sunday, November 6, 2016, Maciej Szymkiewicz <[hidden email]> wrote:
>
> You have to remember that the Stack Overflow crowd (like me) is highly
> opinionated, so many questions, which could be just fine on the mailing
> list, will be quickly downvoted and / or closed as off-topic. Just
> saying...
>
> --
>
> Best,
>
> Maciej
>
>
>
> On 11/07/2016 04:03 AM, Reynold Xin wrote:
>
> OK I've checked on the ASF member list (which is private so there is no
> public archive).
>
>
>
> It is not against any ASF rule to recommend StackOverflow as a place for
> users to ask questions. I don't think we can or should delete the existing
> user@spark list either, but we can certainly make SO more visible than it
> is.
>
>
>
>
>
>
>
> On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <[hidden email]> wrote:
>
> Actually after talking with more ASF members, I believe the only policy is
> that development decisions have to be made and announced on ASF properties
> (dev list or jira), but user questions don't have to.
>
>
>
> I'm going to double check this. If it is true, I would actually recommend
> moving the Q&A part of the user list entirely over to stackoverflow, or
> at least making that the recommended way rather than the existing user list
> which is not very scalable.
>
>
>
> On Wednesday, November 2, 2016, Nicholas Chammas <[hidden email]> wrote:
>
> We’ve discussed several times upgrading our communication tools, as far
> back as 2014 and maybe even before that too. The bottom line is that we
> can’t due to ASF rules requiring the use of ASF-managed mailing lists.
>
> For some history, see this discussion:
>
> 1.      https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@...%3E
>
> 2.      https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@...%3E
>
> (It’s ironic that it’s difficult to follow the past discussion on why we
> can’t change our official communication tools due to those very tools…)
>
> Nick
>
>
>
>
>
> On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <[hidden email]> wrote:
>
> I feel Assaf's point is quite relevant if we want to move this project
> forward from the Spark user perspective (as I do). In fact, we're still
> using 20th century tools (mailing lists) with some add-ons (like Stack
> Overflow).
>
> As usual, Sean and Cody's contributions are very much to the point.
>
> I feel it is indeed a matter of culture (hard to enforce) and tools
> (much easier). Isn't it?
>
>
>
> On 2 November 2016 at 16:36, Cody Koeninger <[hidden email]> wrote:
>
> So concrete things people could do
>
> - users could tag subject lines appropriately to the component they're
> asking about
>
> - contributors could monitor user@ for tags relating to components
> they've worked on.
> I'd be surprised if my miss rate for any mailing list questions
> well-labeled as Kafka was higher than 5%
>
> - committers could be more aggressive about soliciting and merging PRs
> to improve documentation.
> It's a lot easier to answer even poorly-asked questions with a link to
> relevant docs.
>
>
> On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <[hidden email]> wrote:
> > There's already reviews@ and issues@. dev@ is for project development itself
> > and I think is OK. You're suggesting splitting up user@ and I sympathize
> > with the motivation. Experience tells me that we'll have a beginner@ that's
> > then totally ignored, and people will quickly learn to post to advanced@ to
> > get attention, and we'll be back where we started. Putting it in JIRA
> > doesn't help. I don't think this is a problem that is merely down to lack of
> > process. It actually requires cultivating a culture change on the community
> > list.
> >
> > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <[hidden email]>
> > wrote:
> >>
> >> What I am suggesting is basically to fix that.
> >>
> >> For example, we might say that mailing list A is only for voting, mailing
> >> list B is only for PR and have something like stack overflow for developer
> >> questions (I would even go as far as to have beginner, intermediate and
> >> advanced mailing list for users and beginner/advanced for dev).
> >>
> >>
> >>
> >> This can easily be done using stack overflow tags, however, that would
> >> probably be harder to manage.
> >>
> >> Maybe using special jira tags and manage it in jira?
> >>
> >>
> >>
> >> Anyway as I said, the main issue is not user questions (except maybe
> >> advanced ones) but more for dev questions. It is so easy to get lost in the
> >> chatter that it makes it very hard for people to learn spark internals…
> >>
> >> Assaf.
> >>
> >>
> >>
> >> From: Sean Owen [mailto:[hidden email]]
> >> Sent: Wednesday, November 02, 2016 2:07 PM
> >> To: Mendelson, Assaf; [hidden email]
> >> Subject: Re: Handling questions in the mailing lists
> >>
> >>
> >>
> >> I think that unfortunately mailing lists don't scale well. This one has
> >> thousands of subscribers with different interests and levels of experience.
> >> For any given person, most messages will be irrelevant. I also find that a
> >> lot of questions on user@ are not well-asked, aren't an SSCCE
> >> (http://sscce.org/), not something most people are going to bother replying
> >> to even if they could answer. I almost entirely ignore user@ because there
> >> are higher-priority channels like PRs to deal with, that already have
> >> hundreds of messages per day. This is why little of it gets an answer -- too
> >> noisy.
> >>
> >>
> >>
> >> We have to have official mailing lists, in any event, to have some
> >> official channel for things like votes and announcements. It's not wrong to
> >> ask questions on user@ of course, but a lot of the questions I see could
> >> have been answered with research of existing docs or looking at the code. I
> >> think that given the scale of the list, it's not wrong to assert that this
> >> is sort of a prerequisite for asking thousands of people to answer one's
> >> question. But we can't enforce that.
> >>
> >>
> >>
> >> The situation will get better to the extent people ask better questions,
> >> help other people ask better questions, and answer good questions. I'd
> >> encourage anyone feeling this way to try to help along those dimensions.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <[hidden email]>
> >> wrote:
> >>
> >> Hi,
> >>
> >> I know this is a little off topic but I wanted to raise an issue about
> >> handling questions in the mailing list (this is true both for the user
> >> mailing list and the dev but since there are other options such as stack
> >> overflow for user questions, this is more problematic in dev).
> >>
> >> Let’s say I ask a question (as I recently did). Unfortunately this was
> >> during spark summit in Europe so probably people were busy. In any case no
> >> one answered.
> >>
> >> The problem is, that if no one answers very soon, the question will almost
> >> certainly remain unanswered because new messages will simply drown it.
> >>
> >>
> >>
> >> This is a common issue not just for questions but for any comment or idea
> >> which is not immediately picked up.
> >>
> >>
> >>
> >> I believe we should have a method of handling this.
> >>
> >> Generally, I would say these types of things belong in stack overflow,
> >> after all, the way it is built is perfect for this. More seasoned spark
> >> contributors and committers can periodically check out unanswered questions
> >> and answer them.
> >>
> >> The problem is that stack overflow (as well as other targets such as the
> >> databricks forums) tend to have a more user based orientation. This means
> >> that any spark internal question will almost certainly remain unanswered.
> >>
> >>
> >>
> >> I was wondering if we could come up with a solution for this.
> >>
> >>
> >>
> >> Assaf.
> >>
> >>
> >>
> >>
> >>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>

RE: Handling questions in the mailing lists

Posted by Io...@nomura.com.
My two cents (As a user/consumer)…

I have been following and using Spark in financial services since before version 1, and since before questions were migrated from Google Groups to the Apache mailing lists (which was a shame ☹).

SO:
There has been some momentum lately on SO, but as questions were not “monitored/answered” by Spark experts, the motivation to post a question was low, and in turn so was the quality of the questions. As most of us know, SO is usually the first place to look for info and can greatly reduce the need to turn to the user/dev groups, so it would be great if it got more attention.

Spark mailing lists:
As the consensus appears to be, questions tend to get lost if not picked up within 1-2 days. Re-sending the same question feels “abusive” to me, so I would then give up. Given that a good question takes time to write, putting effort into a question that can easily be ignored results in mailing a “bad” question (see what happens?) or no question at all. As you have probably observed, a few users will mail a question to “dev” with “…no answers in user list…” as they incorrectly assume that no-one can answer their question.

JIRA:
I find that “issues” are being closed quite aggressively. I’ve seen this twice (one ticket I reported myself, and I found the second while looking for a solution), and for this reason users are not encouraged to spend the time and effort to file one. Personally, I also feel that there is some bias in what is considered in-scope and out-of-scope.

My preference would be for SO to be the first place that someone would post a question. If a few “experts” are found regularly answering questions, eventually Spark users will start using it more, which reduces the load on “user” because previous answers are easy to find (or the SO community marks duplicates). The same “experts” can also encourage users to “escalate” to JIRA or the dev/user groups once a question has been properly filtered, which is quite common.

PS. Personally, I would not follow any “bespoke/external” process on SO, e.g. down-voting on SO for any reason other than being a bad question as per SO rules.


From: Matei Zaharia [mailto:matei.zaharia@gmail.com]
Sent: 07 November 2016 07:45
To: assaf.mendelson
Cc: dev@spark.apache.org
Subject: Re: Handling questions in the mailing lists

Even for the mailing list, I'd love to have a short set of instructions on how to submit your questions (maybe on http://spark.apache.org/community.html or maybe in the welcome email when you subscribe). It would be great if someone added that. After all, we have such instructions for contributing PRs, for example.

Matei

On Nov 6, 2016, at 11:09 PM, assaf.mendelson <as...@rsa.com> wrote:

There are other options as well. For example hosting an answerhub (www.answerhub.com) or other similar separate Q&A service.
BTW, I believe the main issue is not how opinionated people are but who is answering questions.
Today there are already people asking (and getting answers) on SO (including myself). The problem is that many people do not go to SO.
The problem I see is how to “bump” up questions which are not being answered to someone more likely to be able to answer them. Simple questions can be answered by many people, many of them even newbies who ran into the issue themselves.
The main issue is that the more complex the question, the fewer people there are who can answer it, and those people’s bandwidth is already clogged by other questions.
We could for example try to create tags on SO for “basic questions”, “medium”, “advanced”. Provide guidelines to ask first on basic, if not answered after X days then add the medium tag etc. Downvote people who don’t go by the process. This would mean that committers for example can look at advanced only tag and have a manageable number of questions they can help with while others can answer medium and basic.

I agree that some things are not good for SO, basically anything which asks for opinion. But most cases in the mailing list are either “how do I solve this bug” or “how do I do X”. Either of those is good for SO.


Assaf.



From: rxin [via Apache Spark Developers List] [mailto:ml-node+[hidden email]]
Sent: Monday, November 07, 2016 8:33 AM
To: Mendelson, Assaf
Subject: Re: Handling questions in the mailing lists

This is an excellent point. If we do go ahead and feature SO as a way for users to ask questions more prominently, as someone who knows SO very well, would you be willing to help write a short guideline (ideally the shorter the better, which makes it hard) to direct what goes to user@ and what goes to SO?


On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz <[hidden email]> wrote:
Damn, I always thought that the mailing list is only for nice and welcoming people and there is nothing to do for me here >:)
To be serious though, there are many questions on the users list which would fit just fine on SO but it is not true in general. There are dozens of questions which are too broad, opinion based, ask for external resources and so on. If you want to direct users to SO you have to help them to decide if it is the right channel. Otherwise it will just create a really bad experience for both those seeking help and active answerers. The former will be downvoted and bashed, the latter will have to deal with handling all the junk, and the number of active Spark users with moderation privileges is really low (with only Massg and me being able to directly close duplicates).
Believe me, I've seen this before.
On 11/07/2016 05:08 AM, Reynold Xin wrote:
You have substantially underestimated how opinionated people can be on mailing lists too :)

On Sunday, November 6, 2016, Maciej Szymkiewicz <[hidden email]> wrote:
You have to remember that the Stack Overflow crowd (like me) is highly opinionated, so many questions, which could be just fine on the mailing list, will be quickly downvoted and / or closed as off-topic. Just saying...

--

Best,

Maciej

On 11/07/2016 04:03 AM, Reynold Xin wrote:
OK I've checked on the ASF member list (which is private so there is no public archive).

It is not against any ASF rule to recommend StackOverflow as a place for users to ask questions. I don't think we can or should delete the existing user@spark list either, but we can certainly make SO more visible than it is.



On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <[hidden email]> wrote:
Actually after talking with more ASF members, I believe the only policy is that development decisions have to be made and announced on ASF properties (dev list or jira), but user questions don't have to.

I'm going to double check this. If it is true, I would actually recommend moving the Q&A part of the user list entirely over to stackoverflow, or at least making that the recommended way rather than the existing user list which is not very scalable.


On Wednesday, November 2, 2016, Nicholas Chammas <[hidden email]> wrote:
We’ve discussed several times upgrading our communication tools, as far back as 2014 and maybe even before that too. The bottom line is that we can’t due to ASF rules requiring the use of ASF-managed mailing lists.
For some history, see this discussion:
1.      https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@...%3E
2.      https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@...%3E
(It’s ironic that it’s difficult to follow the past discussion on why we can’t change our official communication tools due to those very tools…)
Nick

On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <[hidden email]> wrote:
I feel Assaf's point is quite relevant if we want to move this project forward from the Spark user perspective (as I do). In fact, we're still using 20th century tools (mailing lists) with some add-ons (like Stack Overflow).

As usual, Sean and Cody's contributions are very much to the point.
I feel it is indeed a matter of culture (hard to enforce) and tools (much easier). Isn't it?

On 2 November 2016 at 16:36, Cody Koeninger <[hidden email]> wrote:
So concrete things people could do

- users could tag subject lines appropriately to the component they're
asking about

- contributors could monitor user@ for tags relating to components
they've worked on.
I'd be surprised if my miss rate for any mailing list questions
well-labeled as Kafka was higher than 5%

- committers could be more aggressive about soliciting and merging PRs
to improve documentation.
It's a lot easier to answer even poorly-asked questions with a link to
relevant docs.

On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <[hidden email]> wrote:
> There's already reviews@ and issues@. dev@ is for project development itself
> and I think is OK. You're suggesting splitting up user@ and I sympathize
> with the motivation. Experience tells me that we'll have a beginner@ that's
> then totally ignored, and people will quickly learn to post to advanced@ to
> get attention, and we'll be back where we started. Putting it in JIRA
> doesn't help. I don't think this is a problem that is merely down to lack of
> process. It actually requires cultivating a culture change on the community
> list.
>
> On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <[hidden email]>
> wrote:
>>
>> What I am suggesting is basically to fix that.
>>
>> For example, we might say that mailing list A is only for voting, mailing
>> list B is only for PR and have something like stack overflow for developer
>> questions (I would even go as far as to have beginner, intermediate and
>> advanced mailing list for users and beginner/advanced for dev).
>>
>>
>>
>> This can easily be done using stack overflow tags, however, that would
>> probably be harder to manage.
>>
>> Maybe using special jira tags and manage it in jira?
>>
>>
>>
>> Anyway as I said, the main issue is not user questions (except maybe
>> advanced ones) but more for dev questions. It is so easy to get lost in the
>> chatter that it makes it very hard for people to learn spark internals…
>>
>> Assaf.
>>
>>
>>
>> From: Sean Owen [mailto:[hidden email]]
>> Sent: Wednesday, November 02, 2016 2:07 PM
>> To: Mendelson, Assaf; [hidden email]
>> Subject: Re: Handling questions in the mailing lists
>>
>>
>>
>> I think that unfortunately mailing lists don't scale well. This one has
>> thousands of subscribers with different interests and levels of experience.
>> For any given person, most messages will be irrelevant. I also find that a
>> lot of questions on user@ are not well-asked, aren't an SSCCE
>> (http://sscce.org/), not something most people are going to bother replying
>> to even if they could answer. I almost entirely ignore user@ because there
>> are higher-priority channels like PRs to deal with, that already have
>> hundreds of messages per day. This is why little of it gets an answer -- too
>> noisy.
>>
>>
>>
>> We have to have official mailing lists, in any event, to have some
>> official channel for things like votes and announcements. It's not wrong to
>> ask questions on user@ of course, but a lot of the questions I see could
>> have been answered with research of existing docs or looking at the code. I
>> think that given the scale of the list, it's not wrong to assert that this
>> is sort of a prerequisite for asking thousands of people to answer one's
>> question. But we can't enforce that.
>>
>>
>>
>> The situation will get better to the extent people ask better questions,
>> help other people ask better questions, and answer good questions. I'd
>> encourage anyone feeling this way to try to help along those dimensions.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <[hidden email]>
>> wrote:
>>
>> Hi,
>>
>> I know this is a little off topic but I wanted to raise an issue about
>> handling questions in the mailing list (this is true both for the user
>> mailing list and the dev but since there are other options such as stack
>> overflow for user questions, this is more problematic in dev).
>>
>> Let’s say I ask a question (as I recently did). Unfortunately this was
>> during spark summit in Europe so probably people were busy. In any case no
>> one answered.
>>
>> The problem is, that if no one answers very soon, the question will almost
>> certainly remain unanswered because new messages will simply drown it.
>>
>>
>>
>> This is a common issue not just for questions but for any comment or idea
>> which is not immediately picked up.
>>
>>
>>
>> I believe we should have a method of handling this.
>>
>> Generally, I would say these types of things belong in stack overflow,
>> after all, the way it is built is perfect for this. More seasoned spark
>> contributors and committers can periodically check out unanswered questions
>> and answer them.
>>
>> The problem is that stack overflow (as well as other targets such as the
>> databricks forums) tend to have a more user based orientation. This means
>> that any spark internal question will almost certainly remain unanswered.
>>
>>
>>
>> I was wondering if we could come up with a solution for this.
>>
>>
>>
>> Assaf.
>>


Re: Handling questions in the mailing lists

Posted by Matei Zaharia <ma...@gmail.com>.
Even for the mailing list, I'd love to have a short set of instructions on how to submit your questions (maybe on http://spark.apache.org/community.html or maybe in the welcome email when you subscribe). It would be great if someone added that. After all, we have such instructions for contributing PRs, for example.

Matei

> On Nov 6, 2016, at 11:09 PM, assaf.mendelson <as...@rsa.com> wrote:
> 
> There are other options as well. For example, hosting an AnswerHub instance (www.answerhub.com) or another similar, separate Q&A service.
> 
> BTW, I believe the main issue is not how opinionated people are but who is answering questions.
> 
> Today there are already people asking (and getting answers) on SO (including myself). The problem is that many people do not go to SO.
> 
> The problem I see is how to “bump” up questions which are not being answered to someone more likely to be able to answer them. Simple questions can be answered by many people, many of them even newbies who ran into the issue themselves.
> 
> The main issue is that the more complex the question, the fewer people there are who can answer it, and those people’s bandwidth is already clogged by other questions.
> 
> We could for example try to create tags on SO for “basic questions”, “medium”, “advanced”. Provide guidelines to ask first on basic, if not answered after X days then add the medium tag etc. Downvote people who don’t go by the process. This would mean that committers for example can look at advanced only tag and have a manageable number of questions they can help with while others can answer medium and basic.
> 
>  
> 
> I agree that some things are not good for SO. Basically stuff which asks for opinion is such but most cases in the mailing list are either “how do I solve this bug” or “how do I do X”. Either of those two are good for SO.
> 
> Assaf.
> 
> From: rxin [via Apache Spark Developers List] [mailto:ml-node+[hidden email]]
> Sent: Monday, November 07, 2016 8:33 AM
> To: Mendelson, Assaf
> Subject: Re: Handling questions in the mailing lists
> 
> This is an excellent point. If we do go ahead and feature SO as a way for users to ask questions more prominently, as someone who knows SO very well, would you be willing to help write a short guideline (ideally the shorter the better, which makes it hard) to direct what goes to user@ and what goes to SO?
> 
> On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz <[hidden email]> wrote:
> 
> Damn, I always thought that mailing list is only for nice and welcoming people and there is nothing to do for me here >:)
> 
> To be serious though, there are many questions on the users list which would fit just fine on SO, but this is not true in general. There are dozens of questions which are too broad, opinion based, ask for external resources and so on. If you want to direct users to SO you have to help them decide if it is the right channel. Otherwise it will just create a really bad experience for both those seeking help and active answerers. The former will be downvoted and bashed; the latter will have to deal with handling all the junk, and the number of active Spark users with moderation privileges is really low (with only Massg and me being able to directly close duplicates).
> 
> Believe me, I've seen this before.
> 
> On 11/07/2016 05:08 AM, Reynold Xin wrote:
> 
> You have substantially underestimated how opinionated people can be on mailing lists too :)
> 
> On Sunday, November 6, 2016, Maciej Szymkiewicz <[hidden email]> wrote:
> 
> You have to remember that Stack Overflow crowd (like me) is highly opinionated, so many questions, which could be just fine on the mailing list, will be quickly downvoted and / or closed as off-topic. Just saying...
> 
> -- 
> Best, 
> Maciej
>  
> 
> On 11/07/2016 04:03 AM, Reynold Xin wrote:
> 
> OK I've checked on the ASF member list (which is private so there is no public archive).
> 
> It is not against any ASF rule to recommend StackOverflow as a place for users to ask questions. I don't think we can or should delete the existing user@spark list either, but we can certainly make SO more visible than it is.
> 
> On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <[hidden email]> wrote:
> 
> Actually after talking with more ASF members, I believe the only policy is that development decisions have to be made and announced on ASF properties (dev list or jira), but user questions don't have to. 
> 
> I'm going to double check this. If it is true, I would actually recommend moving the Q&A part of the user list entirely over to Stack Overflow, or at least making that the recommended way rather than the existing user list, which is not very scalable.
> 
> On Wednesday, November 2, 2016, Nicholas Chammas <[hidden email]> wrote:
> 
> We’ve discussed several times upgrading our communication tools, as far back as 2014 and maybe even before that too. The bottom line is that we can’t due to ASF rules requiring the use of ASF-managed mailing lists.
> 
> For some history, see this discussion:
> 
>   *   https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E
>   *   https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E
> 
> (It’s ironic that it’s difficult to follow the past discussion on why we can’t change our official communication tools due to those very tools…)
> 
> Nick
> 
> On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <[hidden email]> wrote:
> 
> I feel Assaf's point is quite relevant if we want to move this project forward from the Spark user perspective (as I do). In fact, we're still using 20th century tools (mailing lists) with some add-ons (like Stack Overflow).
> 
> As usual, Sean and Cody's contributions are very much to the point.
> 
> I feel it is indeed a matter of culture (hard to enforce) and tools (much easier). Isn't it?
> 
> On 2 November 2016 at 16:36, Cody Koeninger <[hidden email]> wrote:
> 
> So concrete things people could do
> 
> - users could tag subject lines appropriately to the component they're
> asking about
> 
> - contributors could monitor user@ for tags relating to components
> they've worked on.
> I'd be surprised if my miss rate for any mailing list questions
> well-labeled as Kafka was higher than 5%
> 
> - committers could be more aggressive about soliciting and merging PRs
> to improve documentation.
> It's a lot easier to answer even poorly-asked questions with a link to
> relevant docs.
> 
> 
> On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <[hidden email]> wrote:
> > There's already reviews@ and issues@. dev@ is for project development itself
> > and I think is OK. You're suggesting splitting up user@ and I sympathize
> > with the motivation. Experience tells me that we'll have a beginner@ that's
> > then totally ignored, and people will quickly learn to post to advanced@ to
> > get attention, and we'll be back where we started. Putting it in JIRA
> > doesn't help. I don't think this is a problem that is merely down to lack of
> > process. It actually requires cultivating a culture change on the community
> > list.
> >
> > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <[hidden email]>
> > wrote:
> >>
> >> What I am suggesting is basically to fix that.
> >>
> >> For example, we might say that mailing list A is only for voting, mailing
> >> list B is only for PR and have something like stack overflow for developer
> >> questions (I would even go as far as to have beginner, intermediate and
> >> advanced mailing list for users and beginner/advanced for dev).
> >>
> >>
> >>
> >> This can easily be done using stack overflow tags, however, that would
> >> probably be harder to manage.
> >>
> >> Maybe using special jira tags and manage it in jira?
> >>
> >>
> >>
> >> Anyway as I said, the main issue is not user questions (except maybe
> >> advanced ones) but more for dev questions. It is so easy to get lost in the
> >> chatter that it makes it very hard for people to learn spark internals…
> >>
> >> Assaf.
> >>
> >>
> >>
> >> From: Sean Owen [mailto:[hidden email]]
> >> Sent: Wednesday, November 02, 2016 2:07 PM
> >> To: Mendelson, Assaf; [hidden email]
> >> Subject: Re: Handling questions in the mailing lists
> >>
> >>
> >>
> >> I think that unfortunately mailing lists don't scale well. This one has
> >> thousands of subscribers with different interests and levels of experience.
> >> For any given person, most messages will be irrelevant. I also find that a
> >> lot of questions on user@ are not well-asked, aren't an SSCCE
> >> (http://sscce.org/), not something most people are going to bother replying
> >> to even if they could answer. I almost entirely ignore user@ because there
> >> are higher-priority channels like PRs to deal with, that already have
> >> hundreds of messages per day. This is why little of it gets an answer -- too
> >> noisy.
> >>
> >>
> >>
> >> We have to have official mailing lists, in any event, to have some
> >> official channel for things like votes and announcements. It's not wrong to
> >> ask questions on user@ of course, but a lot of the questions I see could
> >> have been answered with research of existing docs or looking at the code. I
> >> think that given the scale of the list, it's not wrong to assert that this
> >> is sort of a prerequisite for asking thousands of people to answer one's
> >> question. But we can't enforce that.
> >>
> >>
> >>
> >> The situation will get better to the extent people ask better questions,
> >> help other people ask better questions, and answer good questions. I'd
> >> encourage anyone feeling this way to try to help along those dimensions.
> >>
> >> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <[hidden email]>
> >> wrote:
> >>
> >> Hi,
> >>
> >> I know this is a little off topic but I wanted to raise an issue about
> >> handling questions in the mailing list (this is true both for the user
> >> mailing list and the dev but since there are other options such as stack
> >> overflow for user questions, this is more problematic in dev).
> >>
> >> Let’s say I ask a question (as I recently did). Unfortunately this was
> >> during spark summit in Europe so probably people were busy. In any case no
> >> one answered.
> >>
> >> The problem is, that if no one answers very soon, the question will almost
> >> certainly remain unanswered because new messages will simply drown it.
> >>
> >>
> >>
> >> This is a common issue not just for questions but for any comment or idea
> >> which is not immediately picked up.
> >>
> >>
> >>
> >> I believe we should have a method of handling this.
> >>
> >> Generally, I would say these types of things belong in stack overflow,
> >> after all, the way it is built is perfect for this. More seasoned spark
> >> contributors and committers can periodically check out unanswered questions
> >> and answer them.
> >>
> >> The problem is that stack overflow (as well as other targets such as the
> >> databricks forums) tend to have a more user based orientation. This means
> >> that any spark internal question will almost certainly remain unanswered.
> >>
> >>
> >>
> >> I was wondering if we could come up with a solution for this.
> >>
> >>
> >>
> >> Assaf.


Re: Handling questions in the mailing lists

Posted by Michael Segel <ms...@hotmail.com>.
Guys… please take what I say with a grain of salt…

The issue is that the input is a stream of messages that are addressed in a LIFO manner.  This means that messages may be ignored. The stream of data (user@spark, for example) is semi-structured in that it contains a lot of messages, some of which could be noise or repeats, not really organized by content.


So why not try to solve this as a Big Data problem… You’re streaming data into the ‘lake’ and, upon ingestion, you need to scan, index, and tag each message so that it is easier to find.

Now you can create user tools to search the messages. (e.g. SparkSQL … , ML, etc…) So you can find a target set of messages and see how many times they have been viewed, answered… even query who answered them… (e.g. Dean Wampler on Spark/Scala issues answered 30 questions this past month.   or Owen was answering questions that focused on spark security… )  What features came up the most in the questions…  etc …


I guess the point I’m trying to make is that you should consider rolling your own tool set, or looking beyond just SO.

Some have taken to Gitter to set up online communities where discussions can be held and questions answered… but looking at tools like Gitter (GitHub), Atlassian, and SO… it’s a disjoint toolset.

Why not choose one, or decide to roll your own and move on with it?  (Either under Apache, or outside on your own.)


I apologize for my mini rant.

-Mike

On Nov 7, 2016, at 4:24 PM, Maciej Szymkiewicz <ms...@gmail.com> wrote:


Just a couple of random thoughts regarding Stack Overflow...

  *   If we are thinking about shifting focus towards SO, all attempts at micromanaging should be discarded right at the beginning. Especially things like meta tags, which are discouraged and "burninated" (https://meta.stackoverflow.com/tags/burninate-request/info), or thread bumping. Depending on the context, these are unmanageable, go against community guidelines, or are simply obsolete.
  *   Lack of expertise is unlikely to be an issue. Even now there are a number of advanced Spark users on SO. Of course, the more the merrier.

Things that can be easily improved:

  *   Identifying, improving and promoting canonical questions and answers. This means closing duplicates, suggesting edits to improve existing answers, and providing alternative solutions. This can also be used to identify gaps in the documentation.
  *   Providing a set of clear posting guidelines to reduce the effort required to identify the problem (think about http://stackoverflow.com/q/5963269 a.k.a. How to make a great R reproducible example?)
  *   Helping users decide if a question is a good fit for SO (see below). API questions are a great fit; debugging problems like "my cluster is slow" are not.
  *   Actively cleaning up (closing, deleting) off-topic and low quality questions. The less junk there is to sieve through, the better the chance of good questions being answered.
  *   Repurposing and actively moderating SO docs (https://stackoverflow.com/documentation/apache-spark/topics). Right now most of the stuff that goes there is useless, duplicated or plagiarized, or borderline spam.
  *   Encouraging the community to monitor featured (https://stackoverflow.com/questions/tagged/apache-spark?sort=featured) and active & upvoted & unanswered (https://stackoverflow.com/unanswered/tagged/apache-spark) questions.
  *   Implementing some procedure to identify questions which are likely to be bugs or material for feature requests. Personally I am quite often tempted to simply send a link to the dev list, but I don't think that is really acceptable.
  *   Livening up the Spark-related chat room. I tried this a couple of times but to no avail. Without a certain critical mass of users it just won't work.


On 11/07/2016 07:32 AM, Reynold Xin wrote:
This is an excellent point. If we do go ahead and feature SO as a way for users to ask questions more prominently, as someone who knows SO very well, would you be willing to help write a short guideline (ideally the shorter the better, which makes it hard) to direct what goes to user@ and what goes to SO?

Sure, I'll be happy to help if I can.



--
Maciej Szymkiewicz


Re: Handling questions in the mailing lists

Posted by Denny Lee <de...@gmail.com>.
Hear, hear! :)  Completely agree with you - here are the latest updates
to the Proposed Community Mailing Lists / StackOverflow Changes
<https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#>.
Keep them coming, though at this point I'd like to limit new verbiage to
prevent it from getting too long and hence not being read.  Modifications and
suggestions are absolutely welcome - just asking that we don't make it too
much longer.  Thanks!


On Wed, Nov 9, 2016 at 5:36 AM Gerard Maas <ge...@gmail.com> wrote:

> Great discussion. Glad to see it happening, and lucky to have spotted it on
> the mailing list given its high volume.
>
> I had this same conversation with Patrick Wendell a few Spark Summits ago.
> At the time, SO was not even listed as a resource and the idea was to make
> it the primary "go-to" place for questions.
>
> Having contributed to both the list (in its early days) and SO, the
> biggest hurdle IMO is how to deal with lazy people. These days, at SO, I
> spend more time leaving comments than answering in an attempt to moderate
> the requirement of "show some effort" and clarify unclear questions.
>
> It's my impression that the mailing list is much more friendly with "plz
> send me da code" folk and indeed would answer questions that would
> otherwise get down-voted or closed at SO. That also shows in the high email
> volume, which at the same time lowers its value for many of us who get
> overwhelmed. It's hard to separate authentic efforts at getting started,
> which deserve help and encouragement, from "work dumpers" who abuse
> resources to get their thing done. Also, beginner questions always repeat
> and a mailing list has no features to help with that.
>
> The model I had imagined roughly follows the "Odersky scale":
>  - Users new to the technology and basic "how to" questions belong in
> Stack Overflow. => The search and de-duplication features should help in
> getting an answer if already present, reducing the load.
>  - Advanced discussions and troubleshooting belong in users@
>  - Library bugs, new features and improvements belong in dev@
>
> Of course, there's no hard line between these levels and it would require
> contributor discretion aided by some routing procedure:
>
> - Spark documentation should establish Stack Overflow as the main go-to
> resource.
> - Contributors on the list should redirect "intro level questions" to
> Stack Overflow in a friendly way.
> - SO contributors should redirect potential bugs and questions deserving a
> deeper discussion to @users or @dev as needed
> - @users -> @dev as today
> - Cross-posting SO + @users should be discouraged. The idea is to create
> efficient channels.
>
> A good resource on how and where to ask questions would be a great routing
> channel between the levels above.
> I'm willing to help with moderation efforts on "Spark Overflow" :-) to get
> this going.
>
> The Spark community has always been very welcoming and that spirit should
> be preserved. We just need to channel the efforts in a more efficient way.
>
> my 2c,
>
> Gerard.
>
>
> On Mon, Nov 7, 2016 at 11:24 PM, Maciej Szymkiewicz <
> mszymkiewicz@gmail.com> wrote:
>
> Just a couple of random thoughts regarding Stack Overflow...
>
>    - If we are thinking about shifting focus towards SO, all attempts at
>    micromanaging should be discarded right from the beginning, especially
>    things like meta tags, which are discouraged and "burninated"
>    (https://meta.stackoverflow.com/tags/burninate-request/info), or
>    thread bumping. Depending on the context, these will be unmanageable, go
>    against community guidelines, or simply be obsolete.
>    - Lack of expertise is unlikely to be an issue. Even now there are a
>    number of advanced Spark users on SO. Of course, the more the merrier.
>
> Things that can be easily improved:
>
>    - Identifying, improving and promoting canonical questions and
>    answers. This means closing duplicates, suggesting edits to improve
>    existing answers, and providing alternative solutions. This can also be
>    used to identify gaps in the documentation.
>    - Providing a set of clear posting guidelines to reduce the effort
>    required to identify the problem (think about
>    http://stackoverflow.com/q/5963269 a.k.a. "How to make a great R
>    reproducible example?").
>    - Helping users decide if a question is a good fit for SO (see below).
>    API questions are a great fit; debugging problems like "my cluster is
>    slow" are not.
>    - Actively cleaning up (closing, deleting) off-topic and low-quality
>    questions. The less junk there is to sieve through, the better the
>    chance of good questions being answered.
>    - Repurposing and actively moderating SO docs (
>    https://stackoverflow.com/documentation/apache-spark/topics). Right
>    now most of the stuff that goes there is useless, duplicated,
>    plagiarized, or borderline spam.
>    - Encouraging the community to monitor featured (
>    https://stackoverflow.com/questions/tagged/apache-spark?sort=featured)
>    and active & upvoted & unanswered (
>    https://stackoverflow.com/unanswered/tagged/apache-spark) questions.
>    - Implementing some procedure to identify questions which are likely
>    to be bugs or material for feature requests. Personally I am quite often
>    tempted to simply send a link to the dev list, but I don't think that is
>    really acceptable.
>    - Animating a Spark-related chat room. I tried this a couple of times
>    but to no avail. Without a certain critical mass of users it just won't
>    work.
>
>
>
> On 11/07/2016 07:32 AM, Reynold Xin wrote:
>
> This is an excellent point. If we do go ahead and feature SO as a way for
> users to ask questions more prominently, as someone who knows SO very well,
> would you be willing to help write a short guideline (ideally the shorter
> the better, which makes it hard) to direct what goes to user@ and what
> goes to SO?
>
>
> Sure, I'll be happy to help if I can.
>
>
>
>
> On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz <mszymkiewicz@gmail.com
> > wrote:
>
> Damn, I always thought that the mailing list is only for nice and welcoming
> people and there is nothing for me to do here >:)
>
> To be serious though, there are many questions on the users list which
> would fit just fine on SO, but that is not true in general. There are dozens
> of questions which are too broad, opinion-based, ask for external resources
> and so on. If you want to direct users to SO you have to help them
> decide if it is the right channel. Otherwise it will just create a really
> bad experience for both those seeking help and active answerers. The former
> will be downvoted and bashed; the latter will have to deal with all
> the junk, and the number of active Spark users with moderation privileges is
> really low (with only Massg and me being able to directly close duplicates).
>
> Believe me, I've seen this before.
> On 11/07/2016 05:08 AM, Reynold Xin wrote:
>
> You have substantially underestimated how opinionated people can be on
> mailing lists too :)
>
> On Sunday, November 6, 2016, Maciej Szymkiewicz <ms...@gmail.com>
> wrote:
>
> You have to remember that the Stack Overflow crowd (like me) is highly
> opinionated, so many questions, which could be just fine on the mailing
> list, will be quickly downvoted and / or closed as off-topic. Just
> saying...
>
> --
> Best,
> Maciej
>
>
> On 11/07/2016 04:03 AM, Reynold Xin wrote:
>
> OK I've checked on the ASF member list (which is private so there is no
> public archive).
>
> It is not against any ASF rule to recommend StackOverflow as a place for
> users to ask questions. I don't think we can or should delete the existing
> user@spark list either, but we can certainly make SO more visible than it
> is.
>
>
>
> On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <rx...@databricks.com> wrote:
>
> Actually after talking with more ASF members, I believe the only policy is
> that development decisions have to be made and announced on ASF properties
> (dev list or jira), but user questions don't have to.
>
> I'm going to double check this. If it is true, I would actually recommend
> moving the Q&A part of the user list entirely over to Stack Overflow, or
> at least making that the recommended way rather than the existing user
> list, which is not very scalable.
>
>
> On Wednesday, November 2, 2016, Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
> We’ve discussed several times upgrading our communication tools, as far
> back as 2014 and maybe even before that too. The bottom line is that we
> can’t due to ASF rules requiring the use of ASF-managed mailing lists.
>
> For some history, see this discussion:
>
>    - https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E
>    - https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E
>
> (It’s ironic that it’s difficult to follow the past discussion on why we
> can’t change our official communication tools due to those very tools…)
>
> Nick
>
>
> On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <
> ricardo.almeida@actnowib.com> wrote:
>
> I feel Assaf's point is quite relevant if we want to move this project
> forward from the Spark user perspective (as I do). In fact, we're still
> using 20th century tools (mailing lists) with some add-ons (like Stack
> Overflow).
>
> As usual, Sean and Cody's contributions are very much to the point.
> I feel it is indeed a matter of culture (hard to enforce) and tools
> (much easier). Isn't it?
>
> On 2 November 2016 at 16:36, Cody Koeninger <co...@koeninger.org> wrote:
>
> So concrete things people could do
>
> - users could tag subject lines appropriately to the component they're
> asking about
>
> - contributors could monitor user@ for tags relating to components
> they've worked on.
> I'd be surprised if my miss rate for any mailing list questions
> well-labeled as Kafka was higher than 5%
>
> - committers could be more aggressive about soliciting and merging PRs
> to improve documentation.
> It's a lot easier to answer even poorly-asked questions with a link to
> relevant docs.
>
> On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <so...@cloudera.com> wrote:
> > There's already reviews@ and issues@. dev@ is for project development
> itself
> > and I think is OK. You're suggesting splitting up user@ and I sympathize
> > with the motivation. Experience tells me that we'll have a beginner@
> that's
> > then totally ignored, and people will quickly learn to post to advanced@
> to
> > get attention, and we'll be back where we started. Putting it in JIRA
> > doesn't help. I don't think this is a problem that is merely down to lack of
> > process. It actually requires cultivating a culture change on the
> community
> > list.
> >
> > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <
> Assaf.Mendelson@rsa.com>
> > wrote:
> >>
> >> What I am suggesting is basically to fix that.
> >>
> >> For example, we might say that mailing list A is only for voting,
> mailing
> >> list B is only for PR and have something like stack overflow for
> developer
> >> questions (I would even go as far as to have beginner, intermediate and
> >> advanced mailing list for users and beginner/advanced for dev).
> >>
> >>
> >>
> >> This can easily be done using stack overflow tags, however, that would
> >> probably be harder to manage.
> >>
> >> Maybe using special jira tags and manage it in jira?
> >>
> >>
> >>
> >> Anyway as I said, the main issue is not user questions (except maybe
> >> advanced ones) but more for dev questions. It is so easy to get lost in
> the
> >> chatter that it makes it very hard for people to learn spark internals…
> >>
> >> Assaf.
> >>
> >>
> >>
> >> From: Sean Owen [mailto:sowen@cloudera.com]
> >> Sent: Wednesday, November 02, 2016 2:07 PM
> >> To: Mendelson, Assaf; dev@spark.apache.org
> >> Subject: Re: Handling questions in the mailing lists
> >>
> >>
> >>
> >> I think that unfortunately mailing lists don't scale well. This one has
> >> thousands of subscribers with different interests and levels of
> experience.
> >> For any given person, most messages will be irrelevant. I also find
> that a
> >> lot of questions on user@ are not well-asked, aren't an SSCCE
> >> (http://sscce.org/), not something most people are going to bother
> replying
> >> to even if they could answer. I almost entirely ignore user@ because
> there
> >> are higher-priority channels like PRs to deal with, that already have
> >> hundreds of messages per day. This is why little of it gets an answer
> -- too
> >> noisy.
> >>
> >>
> >>
> >> We have to have official mailing lists, in any event, to have some
> >> official channel for things like votes and announcements. It's not
> wrong to
> >> ask questions on user@ of course, but a lot of the questions I see
> could
> >> have been answered with research of existing docs or looking at the
> code. I
> >> think that given the scale of the list, it's not wrong to assert that
> this
> >> is sort of a prerequisite for asking thousands of people to answer one's
> >> question. But we can't enforce that.
> >>
> >>
> >>
> >> The situation will get better to the extent people ask better questions,
> >> help other people ask better questions, and answer good questions. I'd
> >> encourage anyone feeling this way to try to help along those dimensions.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <
> assaf.mendelson@rsa.com>
> >> wrote:
> >>
> >> Hi,
> >>
> >> I know this is a little off topic but I wanted to raise an issue about
> >> handling questions in the mailing list (this is true both for the user
> >> mailing list and the dev but since there are other options such as stack
> >> overflow for user questions, this is more problematic in dev).
> >>
> >> Let’s say I ask a question (as I recently did). Unfortunately this was
> >> during spark summit in Europe so probably people were busy. In any case
> no
> >> one answered.
> >>
> >> The problem is, that if no one answers very soon, the question will
> almost
> >> certainly remain unanswered because new messages will simply drown it.
> >>
> >>
> >>
> >> This is a common issue not just for questions but for any comment or
> idea
> >> which is not immediately picked up.
> >>
> >>
> >>
> >> I believe we should have a method of handling this.
> >>
> >> Generally, I would say these types of things belong in stack overflow,
> >> after all, the way it is built is perfect for this. More seasoned spark
> >> contributors and committers can periodically check out unanswered
> questions
> >> and answer them.
> >>
> >> The problem is that stack overflow (as well as other targets such as the
> >> databricks forums) tend to have a more user based orientation. This
> means
> >> that any spark internal question will almost certainly remain
> unanswered.
> >>
> >>
> >>
> >> I was wondering if we could come up with a solution for this.
> >>
> >>
> >>
> >> Assaf.
> >>
> >>
> >>
> >>
> >>
> >> ________________________________
> >>
> >> View this message in context: Handling questions in the mailing lists
> >> Sent from the Apache Spark Developers List mailing list archive at
> >> Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>
>
>
>
>
>
> --
> Maciej Szymkiewicz
>
>
>

Re: Handling questions in the mailing lists

Posted by Gerard Maas <ge...@gmail.com>.
Great discussion. Glad to see it happening, and lucky to have spotted it on
the mailing list given its high volume.

I had this same conversation with Patrick Wendell a few Spark Summits ago. At
the time, SO was not even listed as a resource and the idea was to make it
the primary "go-to" place for questions.

Having contributed to both the list (in its early days) and SO, the biggest
hurdle IMO is how to deal with lazy people. These days, at SO, I spend more
time leaving comments than answering in an attempt to moderate the
requirement of "show some effort" and clarify unclear questions.

It's my impression that the mailing list is much more friendly with "plz
send me da code" folk and indeed would answer questions that would
otherwise get down-voted or closed at SO. That also shows in the high email
volume, which at the same time lowers its value for many of us who get
overwhelmed. It's hard to separate authentic efforts at getting started,
which deserve help and encouragement, from "work dumpers" who abuse
resources to get their thing done. Also, beginner questions always repeat
and a mailing list has no features to help with that.

The model I had imagined roughly follows the "Odersky scale":
 - Users new to the technology and basic "how to" questions belong in
Stack Overflow. => The search and de-duplication features should help in
getting an answer if already present, reducing the load.
 - Advanced discussions and troubleshooting belong in users@
 - Library bugs, new features and improvements belong in dev@

Of course, there's no hard line between these levels and it would require
contributor discretion aided by some routing procedure:

- Spark documentation should establish Stack Overflow as the main go-to
resource.
- Contributors on the list should redirect "intro level questions" to
Stack Overflow in a friendly way.
- SO contributors should redirect potential bugs and questions deserving a
deeper discussion to @users or @dev as needed
- @users -> @dev as today
- Cross-posting SO + @users should be discouraged. The idea is to create
efficient channels.

A good resource on how and where to ask questions would be a great routing
channel between the levels above.
I'm willing to help with moderation efforts on "Spark Overflow" :-) to get
this going.

The Spark community has always been very welcoming and that spirit should
be preserved. We just need to channel the efforts in a more efficient way.

my 2c,

Gerard.


On Mon, Nov 7, 2016 at 11:24 PM, Maciej Szymkiewicz <ms...@gmail.com>
wrote:

> Just a couple of random thoughts regarding Stack Overflow...
>
>    - If we are thinking about shifting focus towards SO, all attempts at
>    micromanaging should be discarded right from the beginning, especially
>    things like meta tags, which are discouraged and "burninated"
>    (https://meta.stackoverflow.com/tags/burninate-request/info), or
>    thread bumping. Depending on the context, these will be unmanageable, go
>    against community guidelines, or simply be obsolete.
>    - Lack of expertise is unlikely to be an issue. Even now there are a
>    number of advanced Spark users on SO. Of course, the more the merrier.
>
> Things that can be easily improved:
>
>    - Identifying, improving and promoting canonical questions and
>    answers. This means closing duplicates, suggesting edits to improve
>    existing answers, and providing alternative solutions. This can also be
>    used to identify gaps in the documentation.
>    - Providing a set of clear posting guidelines to reduce the effort
>    required to identify the problem (think about
>    http://stackoverflow.com/q/5963269 a.k.a. "How to make a great R
>    reproducible example?").
>    - Helping users decide if a question is a good fit for SO (see below).
>    API questions are a great fit; debugging problems like "my cluster is
>    slow" are not.
>    - Actively cleaning up (closing, deleting) off-topic and low-quality
>    questions. The less junk there is to sieve through, the better the
>    chance of good questions being answered.
>    - Repurposing and actively moderating SO docs (
>    https://stackoverflow.com/documentation/apache-spark/topics). Right
>    now most of the stuff that goes there is useless, duplicated,
>    plagiarized, or borderline spam.
>    - Encouraging the community to monitor featured (
>    https://stackoverflow.com/questions/tagged/apache-spark?sort=featured)
>    and active & upvoted & unanswered (
>    https://stackoverflow.com/unanswered/tagged/apache-spark) questions.
>    - Implementing some procedure to identify questions which are likely
>    to be bugs or material for feature requests. Personally I am quite often
>    tempted to simply send a link to the dev list, but I don't think that is
>    really acceptable.
>    - Animating a Spark-related chat room. I tried this a couple of times
>    but to no avail. Without a certain critical mass of users it just won't
>    work.
>
>
>
> On 11/07/2016 07:32 AM, Reynold Xin wrote:
>
> This is an excellent point. If we do go ahead and feature SO as a way for
> users to ask questions more prominently, as someone who knows SO very well,
> would you be willing to help write a short guideline (ideally the shorter
> the better, which makes it hard) to direct what goes to user@ and what
> goes to SO?
>
>
> Sure, I'll be happy to help if I can.
>
>
>
>
> On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz <mszymkiewicz@gmail.com
> > wrote:
>
>> Damn, I always thought that the mailing list is only for nice and welcoming
>> people and there is nothing for me to do here >:)
>>
>> To be serious though, there are many questions on the users list which
>> would fit just fine on SO, but that is not true in general. There are dozens
>> of questions which are too broad, opinion-based, ask for external resources
>> and so on. If you want to direct users to SO you have to help them
>> decide if it is the right channel. Otherwise it will just create a really
>> bad experience for both those seeking help and active answerers. The former
>> will be downvoted and bashed; the latter will have to deal with all
>> the junk, and the number of active Spark users with moderation privileges is
>> really low (with only Massg and me being able to directly close duplicates).
>>
>> Believe me, I've seen this before.
>> On 11/07/2016 05:08 AM, Reynold Xin wrote:
>>
>> You have substantially underestimated how opinionated people can be on
>> mailing lists too :)
>>
>> On Sunday, November 6, 2016, Maciej Szymkiewicz <ms...@gmail.com>
>> wrote:
>>
>>> You have to remember that the Stack Overflow crowd (like me) is highly
>>> opinionated, so many questions, which could be just fine on the mailing
>>> list, will be quickly downvoted and / or closed as off-topic. Just
>>> saying...
>>>
>>> --
>>> Best,
>>> Maciej
>>>
>>>
>>> On 11/07/2016 04:03 AM, Reynold Xin wrote:
>>>
>>> OK I've checked on the ASF member list (which is private so there is no
>>> public archive).
>>>
>>> It is not against any ASF rule to recommend StackOverflow as a place for
>>> users to ask questions. I don't think we can or should delete the existing
>>> user@spark list either, but we can certainly make SO more visible than
>>> it is.
>>>
>>>
>>>
>>> On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <rx...@databricks.com>
>>> wrote:
>>>
>>>> Actually after talking with more ASF members, I believe the only policy
>>>> is that development decisions have to be made and announced on ASF
>>>> properties (dev list or jira), but user questions don't have to.
>>>>
>>>> I'm going to double check this. If it is true, I would actually
>>>> recommend moving the Q&A part of the user list entirely over to
>>>> Stack Overflow, or at least making that the recommended way rather than
>>>> the existing user list, which is not very scalable.
>>>>
>>>>
>>>> On Wednesday, November 2, 2016, Nicholas Chammas <
>>>> nicholas.chammas@gmail.com> wrote:
>>>>
>>>>> We’ve discussed several times upgrading our communication tools, as
>>>>> far back as 2014 and maybe even before that too. The bottom line is that we
>>>>> can’t due to ASF rules requiring the use of ASF-managed mailing lists.
>>>>>
>>>>> For some history, see this discussion:
>>>>>
>>>>>    - https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E
>>>>>    - https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E
>>>>>
>>>>> (It’s ironic that it’s difficult to follow the past discussion on why
>>>>> we can’t change our official communication tools due to those very tools…)
>>>>>
>>>>> Nick
>>>>>
>>>>>
>>>>> On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <
>>>>> ricardo.almeida@actnowib.com> wrote:
>>>>>
>>>>>> I feel Assaf's point is quite relevant if we want to move this project
>>>>>> forward from the Spark user perspective (as I do). In fact, we're
>>>>>> still using 20th century tools (mailing lists) with some add-ons (like
>>>>>> Stack Overflow).
>>>>>>
>>>>>> As usual, Sean and Cody's contributions are very much to the point.
>>>>>> I feel it is indeed a matter of culture (hard to enforce) and tools
>>>>>> (much easier). Isn't it?
>>>>>>
>>>>>> On 2 November 2016 at 16:36, Cody Koeninger <co...@koeninger.org>
>>>>>> wrote:
>>>>>>
>>>>>>> So concrete things people could do
>>>>>>>
>>>>>>> - users could tag subject lines appropriately to the component
>>>>>>> they're
>>>>>>> asking about
>>>>>>>
>>>>>>> - contributors could monitor user@ for tags relating to components
>>>>>>> they've worked on.
>>>>>>> I'd be surprised if my miss rate for any mailing list questions
>>>>>>> well-labeled as Kafka was higher than 5%
>>>>>>>
>>>>>>> - committers could be more aggressive about soliciting and merging
>>>>>>> PRs
>>>>>>> to improve documentation.
>>>>>>> It's a lot easier to answer even poorly-asked questions with a link
>>>>>>> to
>>>>>>> relevant docs.
>>>>>>>
>>>>>>> On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <so...@cloudera.com>
>>>>>>> wrote:
>>>>>>> > There's already reviews@ and issues@. dev@ is for project
>>>>>>> development itself
>>>>>>> > and I think is OK. You're suggesting splitting up user@ and I
>>>>>>> sympathize
>>>>>>> > with the motivation. Experience tells me that we'll have a
>>>>>>> beginner@ that's
>>>>>>> > then totally ignored, and people will quickly learn to post to
>>>>>>> advanced@ to
>>>>>>> > get attention, and we'll be back where we started. Putting it in
>>>>>>> JIRA
>>>>>>> > doesn't help. I don't think this is a problem that is merely down to
>>>>>>> lack of
>>>>>>> > process. It actually requires cultivating a culture change on the
>>>>>>> community
>>>>>>> > list.
>>>>>>> >
>>>>>>> > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <
>>>>>>> Assaf.Mendelson@rsa.com>
>>>>>>> > wrote:
>>>>>>> >>
>>>>>>> >> What I am suggesting is basically to fix that.
>>>>>>> >>
>>>>>>> >> For example, we might say that mailing list A is only for voting,
>>>>>>> mailing
>>>>>>> >> list B is only for PR and have something like stack overflow for
>>>>>>> developer
>>>>>>> >> questions (I would even go as far as to have beginner,
>>>>>>> intermediate and
>>>>>>> >> advanced mailing list for users and beginner/advanced for dev).
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> This can easily be done using stack overflow tags, however, that
>>>>>>> would
>>>>>>> >> probably be harder to manage.
>>>>>>> >>
>>>>>>> >> Maybe using special jira tags and manage it in jira?
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> Anyway as I said, the main issue is not user questions (except
>>>>>>> maybe
>>>>>>> >> advanced ones) but more for dev questions. It is so easy to get
>>>>>>> lost in the
>>>>>>> >> chatter that it makes it very hard for people to learn spark
>>>>>>> internals…
>>>>>>> >>
>>>>>>> >> Assaf.
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> From: Sean Owen [mailto:sowen@cloudera.com]
>>>>>>> >> Sent: Wednesday, November 02, 2016 2:07 PM
>>>>>>> >> To: Mendelson, Assaf; dev@spark.apache.org
>>>>>>> >> Subject: Re: Handling questions in the mailing lists
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> I think that unfortunately mailing lists don't scale well. This
>>>>>>> one has
>>>>>>> >> thousands of subscribers with different interests and levels of
>>>>>>> experience.
>>>>>>> >> For any given person, most messages will be irrelevant. I also
>>>>>>> find that a
>>>>>>> >> lot of questions on user@ are not well-asked, aren't an SSCCE
>>>>>>> >> (http://sscce.org/), not something most people are going to
>>>>>>> bother replying
>>>>>>> >> to even if they could answer. I almost entirely ignore user@
>>>>>>> because there
>>>>>>> >> are higher-priority channels like PRs to deal with, that already
>>>>>>> have
>>>>>>> >> hundreds of messages per day. This is why little of it gets an
>>>>>>> answer -- too
>>>>>>> >> noisy.
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> We have to have official mailing lists, in any event, to have some
>>>>>>> >> official channel for things like votes and announcements. It's
>>>>>>> not wrong to
>>>>>>> >> ask questions on user@ of course, but a lot of the questions I
>>>>>>> see could
>>>>>>> >> have been answered with research of existing docs or looking at
>>>>>>> the code. I
>>>>>>> >> think that given the scale of the list, it's not wrong to assert
>>>>>>> that this
>>>>>>> >> is sort of a prerequisite for asking thousands of people to
>>>>>>> answer one's
>>>>>>> >> question. But we can't enforce that.
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> The situation will get better to the extent people ask better
>>>>>>> questions,
>>>>>>> >> help other people ask better questions, and answer good
>>>>>>> questions. I'd
>>>>>>> >> encourage anyone feeling this way to try to help along those
>>>>>>> dimensions.
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <
>>>>>>> assaf.mendelson@rsa.com>
>>>>>>> >> wrote:
>>>>>>> >>
>>>>>>> >> Hi,
>>>>>>> >>
>>>>>>> >> I know this is a little off topic but I wanted to raise an issue
>>>>>>> about
>>>>>>> >> handling questions in the mailing list (this is true both for the
>>>>>>> user
>>>>>>> >> mailing list and the dev but since there are other options such
>>>>>>> as stack
>>>>>>> >> overflow for user questions, this is more problematic in dev).
>>>>>>> >>
>>>>>>> >> Let’s say I ask a question (as I recently did). Unfortunately
>>>>>>> this was
>>>>>>> >> during spark summit in Europe so probably people were busy. In
>>>>>>> any case no
>>>>>>> >> one answered.
>>>>>>> >>
>>>>>>> >> The problem is, that if no one answers very soon, the question
>>>>>>> will almost
>>>>>>> >> certainly remain unanswered because new messages will simply
>>>>>>> drown it.
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> This is a common issue not just for questions but for any comment
>>>>>>> or idea
>>>>>>> >> which is not immediately picked up.
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> I believe we should have a method of handling this.
>>>>>>> >>
>>>>>>> >> Generally, I would say these types of things belong in stack
>>>>>>> overflow,
>>>>>>> >> after all, the way it is built is perfect for this. More seasoned
>>>>>>> spark
>>>>>>> >> contributors and committers can periodically check out unanswered
>>>>>>> questions
>>>>>>> >> and answer them.
>>>>>>> >>
>>>>>>> >> The problem is that stack overflow (as well as other targets such
>>>>>>> as the
>>>>>>> >> databricks forums) tend to have a more user based orientation.
>>>>>>> This means
>>>>>>> >> that any spark internal question will almost certainly remain
>>>>>>> unanswered.
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> I was wondering if we could come up with a solution for this.
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> Assaf.
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> ________________________________
>>>>>>> >>
>>>>>>> >> View this message in context: Handling questions in the mailing
>>>>>>> lists
>>>>>>> >> Sent from the Apache Spark Developers List mailing list archive at
>>>>>>> >> Nabble.com.
>>>>>>>
>>>>>>> ------------------------------------------------------------
>>>>>>> ---------
>>>>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>>>>>
>>>>>>>
>>>>>>
>>>
>>>
>>
>
> --
> Maciej Szymkiewicz
>
>

Re: Handling questions in the mailing lists

Posted by Denny Lee <de...@gmail.com>.
To help track and get the verbiage for the Spark community page and welcome
email jump started, here's a working document for us to work with:
https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#

Hope this will help us collaborate on this stuff a little faster.

On Mon, Nov 7, 2016 at 2:25 PM Maciej Szymkiewicz <ms...@gmail.com>
wrote:

> Just a couple of random thoughts regarding Stack Overflow...
>
>    - If we are thinking about shifting focus towards SO all attempts of
>    micromanaging should be discarded right in the beginning. Especially things
>    like meta tags, which are discouraged and "burninated" (
>    https://meta.stackoverflow.com/tags/burninate-request/info) , or
>    thread bumping. Depending on a context these won't be manageable, go
>    against community guidelines or simply obsolete.
>    - Lack of expertise is unlikely an issue. Even now there is a number
>    of advanced Spark users on SO. Of course the more the merrier.
>
> Things that can be easily improved:
>
>    - Identifying, improving and promoting canonical questions and
>    answers. It means closing duplicate, suggesting edits to improve existing
>    answers, providing alternative solutions. This can be also used to identify
>    gaps in the documentation.
>    - Providing a set of clear posting guidelines to reduce effort
>    required to identify the problem (think about
>    http://stackoverflow.com/q/5963269 a.k.a How to make a great R
>    reproducible example?)
>    - Helping users decide if question is a good fit for SO (see below).
>    API questions are great fit, debugging problems like "my cluster is slow"
>    are not.
>    - Actively cleaning (closing, deleting) off-topic and low quality
>    questions. The less junk to sieve through the better chance of good
>    questions being answered.
>    - Repurposing and actively moderating SO docs (
>    https://stackoverflow.com/documentation/apache-spark/topics). Right
>    now most of the stuff that goes there is useless, duplicated or
>    plagiarized, or border case SPAM.
>    - Encouraging community to monitor featured (
>    https://stackoverflow.com/questions/tagged/apache-spark?sort=featured)
>    and active & upvoted & unanswered (
>    https://stackoverflow.com/unanswered/tagged/apache-spark) questions.
>    - Implementing some procedure to identify questions which are likely
>    to be bugs or a material for feature requests. Personally I am quite often
>    tempted to simply send a link to dev list, but I don't think it is really
>    acceptable.
>    - Animating Spark related chat room. I tried this a couple of times
>    but to no avail. Without a certain critical mass of users it just won't
>    work.
>
>
>
> On 11/07/2016 07:32 AM, Reynold Xin wrote:
>
> This is an excellent point. If we do go ahead and feature SO as a way for
> users to ask questions more prominently, as someone who knows SO very well,
> would you be willing to help write a short guideline (ideally the shorter
> the better, which makes it hard) to direct what goes to user@ and what
> goes to SO?
>
>
> Sure, I'll be happy to help if I can.
>
>
>
>
> On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz <mszymkiewicz@gmail.com
> > wrote:
>
> Damn, I always thought that mailing list is only for nice and welcoming
> people and there is nothing to do for me here >:)
>
> To be serious though, there are many questions on the users list which
> would fit just fine on SO but it is not true in general. There are dozens
> of questions which are too broad, opinion based, ask for external resources
> and so on. If you want to direct users to SO you have to help them to
> decide if it is the right channel. Otherwise it will just create a really
> bad experience for both seeking help and active answerers. Former ones will
> be downvoted and bashed, latter ones will have to deal with handling all
> the junk and the number of active Spark users with moderation privileges is
> really low (with only Massg and me being able to directly close duplicates).
>
> Believe me, I've seen this before.
> On 11/07/2016 05:08 AM, Reynold Xin wrote:
>
> You have substantially underestimated how opinionated people can be on
> mailing lists too :)
>
> On Sunday, November 6, 2016, Maciej Szymkiewicz <ms...@gmail.com>
> wrote:
>
> You have to remember that Stack Overflow crowd (like me) is highly
> opinionated, so many questions, which could be just fine on the mailing
> list, will be quickly downvoted and / or closed as off-topic. Just
> saying...
>
> --
> Best,
> Maciej
>
>
> On 11/07/2016 04:03 AM, Reynold Xin wrote:
>
> OK I've checked on the ASF member list (which is private so there is no
> public archive).
>
> It is not against any ASF rule to recommend StackOverflow as a place for
> users to ask questions. I don't think we can or should delete the existing
> user@spark list either, but we can certainly make SO more visible than it
> is.
>
>
>
> On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <rx...@databricks.com> wrote:
>
> Actually after talking with more ASF members, I believe the only policy is
> that development decisions have to be made and announced on ASF properties
> (dev list or jira), but user questions don't have to.
>
> I'm going to double check this. If it is true, I would actually recommend
> us moving entirely over the Q&A part of the user list to stackoverflow, or
> at least make that the recommended way rather than the existing user list
> which is not very scalable.
>
>
> On Wednesday, November 2, 2016, Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
> We’ve discussed several times upgrading our communication tools, as far
> back as 2014 and maybe even before that too. The bottom line is that we
> can’t due to ASF rules requiring the use of ASF-managed mailing lists.
>
> For some history, see this discussion:
>
>    -
>    https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E
>    -
>    https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E
>
> (It’s ironic that it’s difficult to follow the past discussion on why we
> can’t change our official communication tools due to those very tools…)
>
> Nick
> ​
>
> On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <
> ricardo.almeida@actnowib.com> wrote:
>
> I feel Assaf's point is quite relevant if we want to move this project
> forward from the Spark user perspective (as I do). In fact, we're still
> using 20th century tools (mailing lists) with some add-ons (like Stack
> Overflow).
>
> As usual, Sean and Cody's contributions are very much to the point.
> I feel it is indeed a matter of culture (hard to enforce) and tools
> (much easier). Isn't it?
>
> On 2 November 2016 at 16:36, Cody Koeninger <co...@koeninger.org> wrote:
>
> So concrete things people could do
>
> - users could tag subject lines appropriately to the component they're
> asking about
>
> - contributors could monitor user@ for tags relating to components
> they've worked on.
> I'd be surprised if my miss rate for any mailing list questions
> well-labeled as Kafka was higher than 5%
>
> - committers could be more aggressive about soliciting and merging PRs
> to improve documentation.
> It's a lot easier to answer even poorly-asked questions with a link to
> relevant docs.
>
> On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <so...@cloudera.com> wrote:
> > There's already reviews@ and issues@. dev@ is for project development
> itself
> > and I think is OK. You're suggesting splitting up user@ and I sympathize
> > with the motivation. Experience tells me that we'll have a beginner@
> that's
> > then totally ignored, and people will quickly learn to post to advanced@
> to
> > get attention, and we'll be back where we started. Putting it in JIRA
> > doesn't help. I don't think this a problem that is merely down to lack of
> > process. It actually requires cultivating a culture change on the
> community
> > list.
> >
> > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <
> Assaf.Mendelson@rsa.com>
> > wrote:
> >>
> >> What I am suggesting is basically to fix that.
> >>
> >> For example, we might say that mailing list A is only for voting,
> mailing
> >> list B is only for PR and have something like stack overflow for
> developer
> >> questions (I would even go as far as to have beginner, intermediate and
> >> advanced mailing list for users and beginner/advanced for dev).
> >>
> >>
> >>
> >> This can easily be done using stack overflow tags, however, that would
> >> probably be harder to manage.
> >>
> >> Maybe using special jira tags and manage it in jira?
> >>
> >>
> >>
> >> Anyway as I said, the main issue is not user questions (except maybe
> >> advanced ones) but more for dev questions. It is so easy to get lost in
> the
> >> chatter that it makes it very hard for people to learn spark internals…
> >>
> >> Assaf.
> >>
> >>
> >>
> >> From: Sean Owen [mailto:sowen@cloudera.com]
> >> Sent: Wednesday, November 02, 2016 2:07 PM
> >> To: Mendelson, Assaf; dev@spark.apache.org
> >> Subject: Re: Handling questions in the mailing lists
> >>
> >>
> >>
> >> I think that unfortunately mailing lists don't scale well. This one has
> >> thousands of subscribers with different interests and levels of
> experience.
> >> For any given person, most messages will be irrelevant. I also find
> that a
> >> lot of questions on user@ are not well-asked, aren't an SSCCE
> >> (http://sscce.org/), not something most people are going to bother
> replying
> >> to even if they could answer. I almost entirely ignore user@ because
> there
> >> are higher-priority channels like PRs to deal with, that already have
> >> hundreds of messages per day. This is why little of it gets an answer
> -- too
> >> noisy.
> >>
> >>
> >>
> >> We have to have official mailing lists, in any event, to have some
> >> official channel for things like votes and announcements. It's not
> wrong to
> >> ask questions on user@ of course, but a lot of the questions I see
> could
> >> have been answered with research of existing docs or looking at the
> code. I
> >> think that given the scale of the list, it's not wrong to assert that
> this
> >> is sort of a prerequisite for asking thousands of people to answer one's
> >> question. But we can't enforce that.
> >>
> >>
> >>
> >> The situation will get better to the extent people ask better questions,
> >> help other people ask better questions, and answer good questions. I'd
> >> encourage anyone feeling this way to try to help along those dimensions.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <
> assaf.mendelson@rsa.com>
> >> wrote:
> >>
> >> Hi,
> >>
> >> I know this is a little off topic but I wanted to raise an issue about
> >> handling questions in the mailing list (this is true both for the user
> >> mailing list and the dev but since there are other options such as stack
> >> overflow for user questions, this is more problematic in dev).
> >>
> >> Let’s say I ask a question (as I recently did). Unfortunately this was
> >> during spark summit in Europe so probably people were busy. In any case
> no
> >> one answered.
> >>
> >> The problem is, that if no one answers very soon, the question will
> almost
> >> certainly remain unanswered because new messages will simply drown it.
> >>
> >>
> >>
> >> This is a common issue not just for questions but for any comment or
> idea
> >> which is not immediately picked up.
> >>
> >>
> >>
> >> I believe we should have a method of handling this.
> >>
> >> Generally, I would say these types of things belong in stack overflow,
> >> after all, the way it is built is perfect for this. More seasoned spark
> >> contributors and committers can periodically check out unanswered
> questions
> >> and answer them.
> >>
> >> The problem is that stack overflow (as well as other targets such as the
> >> databricks forums) tend to have a more user based orientation. This
> means
> >> that any spark internal question will almost certainly remain
> unanswered.
> >>
> >>
> >>
> >> I was wondering if we could come up with a solution for this.
> >>
> >>
> >>
> >> Assaf.
> >>
> >>
> >>
> >>
> >>
> >> ________________________________
> >>
> >> View this message in context: Handling questions in the mailing lists
> >> Sent from the Apache Spark Developers List mailing list archive at
> >> Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>
>
>
>
>
>
> --
> Maciej Szymkiewicz
>
>

Re: Handling questions in the mailing lists

Posted by Maciej Szymkiewicz <ms...@gmail.com>.
Just a couple of random thoughts regarding Stack Overflow...

  * If we are thinking about shifting focus towards SO, all attempts at
    micromanaging should be discarded right at the beginning, especially
    things like meta tags, which are discouraged and "burninated"
    (https://meta.stackoverflow.com/tags/burninate-request/info), or
    thread bumping. Depending on the context, these will be unmanageable,
    go against community guidelines, or simply be obsolete.
  * Lack of expertise is unlikely to be an issue. Even now there are a
    number of advanced Spark users on SO. Of course, the more the merrier.

Things that can be easily improved:

  * Identifying, improving and promoting canonical questions and
    answers. This means closing duplicates, suggesting edits to improve
    existing answers, and providing alternative solutions. This can also
    be used to identify gaps in the documentation.
  * Providing a set of clear posting guidelines to reduce effort
    required to identify the problem (think about
    http://stackoverflow.com/q/5963269 a.k.a How to make a great R
    reproducible example?)
  * Helping users decide if a question is a good fit for SO (see below).
    API questions are a great fit; debugging problems like "my cluster is
    slow" are not.
  * Actively cleaning up (closing, deleting) off-topic and low-quality
    questions. The less junk to sieve through, the better the chance of
    good questions being answered.
  * Repurposing and actively moderating SO docs
    (https://stackoverflow.com/documentation/apache-spark/topics). Right
    now most of the stuff that goes there is useless, duplicated,
    plagiarized, or borderline spam.
  * Encouraging community to monitor featured
    (https://stackoverflow.com/questions/tagged/apache-spark?sort=featured)
    and active & upvoted & unanswered
    (https://stackoverflow.com/unanswered/tagged/apache-spark) questions.
  * Implementing some procedure to identify questions which are likely
    to be bugs or material for feature requests. Personally I am quite
    often tempted to simply send a link to the dev list, but I don't
    think that is really acceptable.
  * Animating a Spark-related chat room. I tried this a couple of times
    but to no avail. Without a certain critical mass of users it just
    won't work.



On 11/07/2016 07:32 AM, Reynold Xin wrote:
> This is an excellent point. If we do go ahead and feature SO as a way
> for users to ask questions more prominently, as someone who knows SO
> very well, would you be willing to help write a short guideline
> (ideally the shorter the better, which makes it hard) to direct what
> goes to user@ and what goes to SO?

Sure, I'll be happy to help if I can.

>
>
> On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz
> <mszymkiewicz@gmail.com> wrote:
>
>     Damn, I always thought that mailing list is only for nice and
>     welcoming people and there is nothing to do for me here >:)
>
>     To be serious though, there are many questions on the users list
>     which would fit just fine on SO but it is not true in general.
>     There are dozens of questions which are too broad, opinion based,
>     ask for external resources and so on. If you want to direct users
>     to SO you have to help them to decide if it is the right channel.
>     Otherwise it will just create a really bad experience for both
>     seeking help and active answerers. Former ones will be downvoted
>     and bashed, latter ones will have to deal with handling all the
>     junk and the number of active Spark users with moderation
>     privileges is really low (with only Massg and me being able to
>     directly close duplicates).
>
>     Believe me, I've seen this before.
>
>     On 11/07/2016 05:08 AM, Reynold Xin wrote:
>>     You have substantially underestimated how opinionated people can
>>     be on mailing lists too :)
>>
>>     On Sunday, November 6, 2016, Maciej Szymkiewicz
>>     <mszymkiewicz@gmail.com> wrote:
>>
>>         You have to remember that Stack Overflow crowd (like me) is
>>         highly opinionated, so many questions, which could be just
>>         fine on the mailing list, will be quickly downvoted and / or
>>         closed as off-topic. Just saying...
>>
>>         -- 
>>         Best, 
>>         Maciej
>>
>>
>>         On 11/07/2016 04:03 AM, Reynold Xin wrote:
>>>         OK I've checked on the ASF member list (which is private so
>>>         there is no public archive).
>>>
>>>         It is not against any ASF rule to recommend StackOverflow as
>>>         a place for users to ask questions. I don't think we can or
>>>         should delete the existing user@spark list either, but we
>>>         can certainly make SO more visible than it is.
>>>
>>>
>>>
>>>         On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin
>>>         <rx...@databricks.com> wrote:
>>>
>>>             Actually after talking with more ASF members, I believe
>>>             the only policy is that development decisions have to be
>>>             made and announced on ASF properties (dev list or jira),
>>>             but user questions don't have to. 
>>>
>>>             I'm going to double check this. If it is true, I would
>>>             actually recommend us moving entirely over the Q&A part
>>>             of the user list to stackoverflow, or at least make that
>>>             the recommended way rather than the existing user list
>>>             which is not very scalable. 
>>>
>>>
>>>             On Wednesday, November 2, 2016, Nicholas Chammas
>>>             <ni...@gmail.com> wrote:
>>>
>>>                 We’ve discussed several times upgrading our
>>>                 communication tools, as far back as 2014 and maybe
>>>                 even before that too. The bottom line is that we
>>>                 can’t due to ASF rules requiring the use of
>>>                 ASF-managed mailing lists.
>>>
>>>                 For some history, see this discussion:
>>>
>>>                   * https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E
>>>                   * https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E
>>>
>>>                 (It’s ironic that it’s difficult to follow the past
>>>                 discussion on why we can’t change our official
>>>                 communication tools due to those very tools…)
>>>
>>>                 Nick
>>>
>>>
>>>                 On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida
>>>                 <ri...@actnowib.com> wrote:
>>>
>>>                     I feel Assaf's point is quite relevant if we want
>>>                     to move this project forward from the Spark user
>>>                     perspective (as I do). In fact, we're still
>>>                     using 20th century tools (mailing lists) with
>>>                     some add-ons (like Stack Overflow).
>>>
>>>                     As usual, Sean and Cody's contributions are
>>>                     very much to the point.
>>>                     I feel it is indeed a matter of culture (hard
>>>                     to enforce) and tools (much easier). Isn't it?
>>>
>>>                     On 2 November 2016 at 16:36, Cody Koeninger
>>>                     <co...@koeninger.org> wrote:
>>>
>>>                         So concrete things people could do
>>>
>>>                         - users could tag subject lines
>>>                         appropriately to the component they're
>>>                         asking about
>>>
>>>                         - contributors could monitor user@ for tags
>>>                         relating to components
>>>                         they've worked on.
>>>                         I'd be surprised if my miss rate for any
>>>                         mailing list questions
>>>                         well-labeled as Kafka was higher than 5%
>>>
>>>                         - committers could be more aggressive about
>>>                         soliciting and merging PRs
>>>                         to improve documentation.
>>>                         It's a lot easier to answer even
>>>                         poorly-asked questions with a link to
>>>                         relevant docs.
>>>
>>>                         On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen
>>>                         <so...@cloudera.com> wrote:
>>>                         > There's already reviews@ and issues@. dev@
>>>                         is for project development itself
>>>                         > and I think is OK. You're suggesting
>>>                         splitting up user@ and I sympathize
>>>                         > with the motivation. Experience tells me
>>>                         that we'll have a beginner@ that's
>>>                         > then totally ignored, and people will
>>>                         quickly learn to post to advanced@ to
>>>                         > get attention, and we'll be back where we
>>>                         started. Putting it in JIRA
>>>                         > doesn't help. I don't think this a problem
>>>                         that is merely down to lack of
>>>                         > process. It actually requires cultivating
>>>                         a culture change on the community
>>>                         > list.
>>>                         >
>>>                         > On Wed, Nov 2, 2016 at 12:11 PM Mendelson,
>>>                         Assaf <As...@rsa.com>
>>>                         > wrote:
>>>                         >>
>>>                         >> What I am suggesting is basically to fix
>>>                         that.
>>>                         >>
>>>                         >> For example, we might say that mailing
>>>                         list A is only for voting, mailing
>>>                         >> list B is only for PR and have something
>>>                         like stack overflow for developer
>>>                         >> questions (I would even go as far as to
>>>                         have beginner, intermediate and
>>>                         >> advanced mailing list for users and
>>>                         beginner/advanced for dev).
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >> This can easily be done using stack
>>>                         overflow tags, however, that would
>>>                         >> probably be harder to manage.
>>>                         >>
>>>                         >> Maybe using special jira tags and manage
>>>                         it in jira?
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >> Anyway as I said, the main issue is not
>>>                         user questions (except maybe
>>>                         >> advanced ones) but more for dev
>>>                         questions. It is so easy to get lost in the
>>>                         >> chatter that it makes it very hard for
>>>                         people to learn spark internals…
>>>                         >>
>>>                         >> Assaf.
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >> From: Sean Owen [mailto:sowen@cloudera.com]
>>>                         >> Sent: Wednesday, November 02, 2016 2:07 PM
>>>                         >> To: Mendelson, Assaf; dev@spark.apache.org
>>>                         >> Subject: Re: Handling questions in the
>>>                         mailing lists
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >> I think that unfortunately mailing lists
>>>                         don't scale well. This one has
>>>                         >> thousands of subscribers with different
>>>                         interests and levels of experience.
>>>                         >> For any given person, most messages will
>>>                         be irrelevant. I also find that a
>>>                         >> lot of questions on user@ are not
>>>                         well-asked, aren't an SSCCE
>>>                         >> (http://sscce.org/), not something most
>>>                         people are going to bother replying
>>>                         >> to even if they could answer. I almost
>>>                         entirely ignore user@ because there
>>>                         >> are higher-priority channels like PRs to
>>>                         deal with, that already have
>>>                         >> hundreds of messages per day. This is why
>>>                         little of it gets an answer -- too
>>>                         >> noisy.
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >> We have to have official mailing lists,
>>>                         in any event, to have some
>>>                         >> official channel for things like votes
>>>                         and announcements. It's not wrong to
>>>                         >> ask questions on user@ of course, but a
>>>                         lot of the questions I see could
>>>                         >> have been answered with research of
>>>                         existing docs or looking at the code. I
>>>                         >> think that given the scale of the list,
>>>                         it's not wrong to assert that this
>>>                         >> is sort of a prerequisite for asking
>>>                         thousands of people to answer one's
>>>                         >> question. But we can't enforce that.
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >> The situation will get better to the
>>>                         extent people ask better questions,
>>>                         >> help other people ask better questions,
>>>                         and answer good questions. I'd
>>>                         >> encourage anyone feeling this way to try
>>>                         to help along those dimensions.
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >> On Wed, Nov 2, 2016 at 11:32 AM
>>>                         assaf.mendelson <as...@rsa.com>
>>>                         >> wrote:
>>>                         >>
>>>                         >> Hi,
>>>                         >>
>>>                         >> I know this is a little off topic but I
>>>                         wanted to raise an issue about
>>>                         >> handling questions in the mailing list
>>>                         (this is true both for the user
>>>                         >> mailing list and the dev but since there
>>>                         are other options such as stack
>>>                         >> overflow for user questions, this is more
>>>                         problematic in dev).
>>>                         >>
>>>                         >> Let’s say I ask a question (as I recently
>>>                         did). Unfortunately this was
>>>                         >> during spark summit in Europe so probably
>>>                         people were busy. In any case no
>>>                         >> one answered.
>>>                         >>
>>>                         >> The problem is, that if no one answers
>>>                         very soon, the question will almost
>>>                         >> certainly remain unanswered because new
>>>                         messages will simply drown it.
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >> This is a common issue not just for
>>>                         questions but for any comment or idea
>>>                         >> which is not immediately picked up.
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >> I believe we should have a method of
>>>                         handling this.
>>>                         >>
>>>                         >> Generally, I would say these types of
>>>                         things belong in stack overflow,
>>>                         >> after all, the way it is built is perfect
>>>                         for this. More seasoned spark
>>>                         >> contributors and committers can
>>>                         periodically check out unanswered questions
>>>                         >> and answer them.
>>>                         >>
>>>                         >> The problem is that stack overflow (as
>>>                         well as other targets such as the
>>>                         >> databricks forums) tend to have a more
>>>                         user based orientation. This means
>>>                         >> that any spark internal question will
>>>                         almost certainly remain unanswered.
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >> I was wondering if we could come up with
>>>                         a solution for this.
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >> Assaf.
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >> ________________________________
>>>                         >>
>>>                         >> View this message in context: Handling
>>>                         questions in the mailing lists
>>>                         >> Sent from the Apache Spark Developers
>>>                         List mailing list archive at
>>>                         >> Nabble.com.
>>>
>>>                         ---------------------------------------------------------------------
>>>                         To unsubscribe e-mail:
>>>                         dev-unsubscribe@spark.apache.org
>>>
>>>
>>>
>>
>
>

-- 
Maciej Szymkiewicz


RE: Handling questions in the mailing lists

Posted by "assaf.mendelson" <as...@rsa.com>.
There are other options as well, for example hosting an AnswerHub instance (www.answerhub.com) or a similar separate Q&A service.
BTW, I believe the main issue is not how opinionated people are but who is answering questions.
Today there are already people asking (and getting answers) on SO, myself included. The problem is that many people do not go to SO.
The problem I see is how to “bump” unanswered questions up to someone more likely to be able to answer them. Simple questions can be answered by many people, including newbies who ran into the same issue themselves.
The main issue is that the more complex the question, the fewer people there are who can answer it, and those people’s bandwidth is already clogged by other questions.
We could, for example, try to create tags on SO for “basic”, “medium” and “advanced” questions, and provide guidelines: ask first under basic; if there is no answer after X days, add the medium tag, and so on. People who don’t follow the process would be downvoted. This would mean that committers, for example, could look only at the advanced tag and have a manageable number of questions they can help with, while others answer medium and basic.

I agree that some things are not a good fit for SO, basically anything that asks for an opinion. But most questions on the mailing list are either “how do I solve this bug” or “how do I do X”, and both of those are a good fit for SO.


Assaf.



From: rxin [via Apache Spark Developers List] [mailto:ml-node+s1001551n19757h56@n3.nabble.com]
Sent: Monday, November 07, 2016 8:33 AM
To: Mendelson, Assaf
Subject: Re: Handling questions in the mailing lists

This is an excellent point. If we do go ahead and feature SO as a way for users to ask questions more prominently, as someone who knows SO very well, would you be willing to help write a short guideline (ideally the shorter the better, which makes it hard) to direct what goes to user@ and what goes to SO?


On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz <[hidden email]> wrote:

Damn, I always thought that mailing list is only for nice and welcoming people and there is nothing to do for me here >:)

To be serious though, there are many questions on the users list which would fit just fine on SO but it is not true in general. There are dozens of questions which are too broad, opinion based, ask for external resources and so on. If you want to direct users to SO you have to help them decide if it is the right channel. Otherwise it will just create a really bad experience for both those seeking help and active answerers. The former will be downvoted and bashed, the latter will have to deal with handling all the junk, and the number of active Spark users with moderation privileges is really low (with only Massg and me being able to directly close duplicates).

Believe me, I've seen this before.
On 11/07/2016 05:08 AM, Reynold Xin wrote:
You have substantially underestimated how opinionated people can be on mailing lists too :)

On Sunday, November 6, 2016, Maciej Szymkiewicz <[hidden email]> wrote:

You have to remember that Stack Overflow crowd (like me) is highly opinionated, so many questions, which could be just fine on the mailing list, will be quickly downvoted and / or closed as off-topic. Just saying...

--

Best,

Maciej

On 11/07/2016 04:03 AM, Reynold Xin wrote:
OK I've checked on the ASF member list (which is private so there is no public archive).

It is not against any ASF rule to recommend StackOverflow as a place for users to ask questions. I don't think we can or should delete the existing user@spark list either, but we can certainly make SO more visible than it is.



On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <[hidden email]> wrote:
Actually after talking with more ASF members, I believe the only policy is that development decisions have to be made and announced on ASF properties (dev list or jira), but user questions don't have to.

I'm going to double check this. If it is true, I would actually recommend us moving entirely over the Q&A part of the user list to stackoverflow, or at least make that the recommended way rather than the existing user list which is not very scalable.


On Wednesday, November 2, 2016, Nicholas Chammas <[hidden email]> wrote:

We’ve discussed several times upgrading our communication tools, as far back as 2014 and maybe even before that too. The bottom line is that we can’t due to ASF rules requiring the use of ASF-managed mailing lists.

For some history, see this discussion:
·         https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E
·         https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E

(It’s ironic that it’s difficult to follow the past discussion on why we can’t change our official communication tools due to those very tools…)

Nick
​

On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <[hidden email]> wrote:
I feel Assaf's point is quite relevant if we want to move this project forward from the Spark user perspective (as I do). In fact, we're still using 20th century tools (mailing lists) with some add-ons (like Stack Overflow).

As usual, Sean and Cody's contributions are very much to the point.
I feel it is indeed a matter of culture (hard to enforce) and tools (much easier). Isn't it?

On 2 November 2016 at 16:36, Cody Koeninger <[hidden email]> wrote:
So concrete things people could do

- users could tag subject lines appropriately to the component they're
asking about

- contributors could monitor user@ for tags relating to components
they've worked on.
I'd be surprised if my miss rate for any mailing list questions
well-labeled as Kafka was higher than 5%

- committers could be more aggressive about soliciting and merging PRs
to improve documentation.
It's a lot easier to answer even poorly-asked questions with a link to
relevant docs.

On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <[hidden email]> wrote:
> There's already reviews@ and issues@. dev@ is for project development itself
> and I think is OK. You're suggesting splitting up user@ and I sympathize
> with the motivation. Experience tells me that we'll have a beginner@ that's
> then totally ignored, and people will quickly learn to post to advanced@ to
> get attention, and we'll be back where we started. Putting it in JIRA
> doesn't help. I don't think this a problem that is merely down to lack of
> process. It actually requires cultivating a culture change on the community
> list.
>
> On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <[hidden email]>
> wrote:
>>
>> What I am suggesting is basically to fix that.
>>
>> For example, we might say that mailing list A is only for voting, mailing
>> list B is only for PR and have something like stack overflow for developer
>> questions (I would even go as far as to have beginner, intermediate and
>> advanced mailing list for users and beginner/advanced for dev).
>>
>>
>>
>> This can easily be done using stack overflow tags, however, that would
>> probably be harder to manage.
>>
>> Maybe using special jira tags and manage it in jira?
>>
>>
>>
>> Anyway as I said, the main issue is not user questions (except maybe
>> advanced ones) but more for dev questions. It is so easy to get lost in the
>> chatter that it makes it very hard for people to learn spark internals…
>>
>> Assaf.
>>
>>
>>
>> From: Sean Owen [mailto:[hidden email]]
>> Sent: Wednesday, November 02, 2016 2:07 PM
>> To: Mendelson, Assaf; [hidden email]
>> Subject: Re: Handling questions in the mailing lists
>>
>>
>>
>> I think that unfortunately mailing lists don't scale well. This one has
>> thousands of subscribers with different interests and levels of experience.
>> For any given person, most messages will be irrelevant. I also find that a
>> lot of questions on user@ are not well-asked, aren't an SSCCE
>> (http://sscce.org/), not something most people are going to bother replying
>> to even if they could answer. I almost entirely ignore user@ because there
>> are higher-priority channels like PRs to deal with, that already have
>> hundreds of messages per day. This is why little of it gets an answer -- too
>> noisy.
>>
>>
>>
>> We have to have official mailing lists, in any event, to have some
>> official channel for things like votes and announcements. It's not wrong to
>> ask questions on user@ of course, but a lot of the questions I see could
>> have been answered with research of existing docs or looking at the code. I
>> think that given the scale of the list, it's not wrong to assert that this
>> is sort of a prerequisite for asking thousands of people to answer one's
>> question. But we can't enforce that.
>>
>>
>>
>> The situation will get better to the extent people ask better questions,
>> help other people ask better questions, and answer good questions. I'd
>> encourage anyone feeling this way to try to help along those dimensions.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <[hidden email]>
>> wrote:
>>
>> Hi,
>>
>> I know this is a little off topic but I wanted to raise an issue about
>> handling questions in the mailing list (this is true both for the user
>> mailing list and the dev but since there are other options such as stack
>> overflow for user questions, this is more problematic in dev).
>>
>> Let’s say I ask a question (as I recently did). Unfortunately this was
>> during spark summit in Europe so probably people were busy. In any case no
>> one answered.
>>
>> The problem is, that if no one answers very soon, the question will almost
>> certainly remain unanswered because new messages will simply drown it.
>>
>>
>>
>> This is a common issue not just for questions but for any comment or idea
>> which is not immediately picked up.
>>
>>
>>
>> I believe we should have a method of handling this.
>>
>> Generally, I would say these types of things belong in stack overflow,
>> after all, the way it is built is perfect for this. More seasoned spark
>> contributors and committers can periodically check out unanswered questions
>> and answer them.
>>
>> The problem is that stack overflow (as well as other targets such as the
>> databricks forums) tend to have a more user based orientation. This means
>> that any spark internal question will almost certainly remain unanswered.
>>
>>
>>
>> I was wondering if we could come up with a solution for this.
>>
>>
>>
>> Assaf.
>>
>>
>>
>>
>>
>> ________________________________
>>
>> View this message in context: Handling questions in the mailing lists
>> Sent from the Apache Spark Developers List mailing list archive at
>> Nabble.com.
---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]












Re: Handling questions in the mailing lists

Posted by Reynold Xin <rx...@databricks.com>.
This is an excellent point. If we do go ahead and feature SO as a way for
users to ask questions more prominently, as someone who knows SO very well,
would you be willing to help write a short guideline (ideally the shorter
the better, which makes it hard) to direct what goes to user@ and what goes
to SO?


On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz <ms...@gmail.com>
wrote:

> Damn, I always thought that mailing list is only for nice and welcoming
> people and there is nothing to do for me here >:)
>
> To be serious though, there are many questions on the users list which
> would fit just fine on SO but it is not true in general. There are dozens
> of questions which are too broad, opinion based, ask for external resources
> and so on. If you want to direct users to SO you have to help them
> decide if it is the right channel. Otherwise it will just create a really
> bad experience for both those seeking help and active answerers. The former
> will be downvoted and bashed, the latter will have to deal with handling all
> the junk, and the number of active Spark users with moderation privileges is
> really low (with only Massg and me being able to directly close duplicates).
>
> Believe me, I've seen this before.
> On 11/07/2016 05:08 AM, Reynold Xin wrote:
>
> You have substantially underestimated how opinionated people can be on
> mailing lists too :)
>
> On Sunday, November 6, 2016, Maciej Szymkiewicz <ms...@gmail.com>
> wrote:
>
>> You have to remember that Stack Overflow crowd (like me) is highly
>> opinionated, so many questions, which could be just fine on the mailing
>> list, will be quickly downvoted and / or closed as off-topic. Just
>> saying...
>>
>> --
>> Best,
>> Maciej
>>
>>
>> On 11/07/2016 04:03 AM, Reynold Xin wrote:
>>
>> OK I've checked on the ASF member list (which is private so there is no
>> public archive).
>>
>> It is not against any ASF rule to recommend StackOverflow as a place for
>> users to ask questions. I don't think we can or should delete the existing
>> user@spark list either, but we can certainly make SO more visible than
>> it is.
>>
>>
>>
>> On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <rx...@databricks.com> wrote:
>>
>>> Actually after talking with more ASF members, I believe the only policy
>>> is that development decisions have to be made and announced on ASF
>>> properties (dev list or jira), but user questions don't have to.
>>>
>>> I'm going to double check this. If it is true, I would actually
>>> recommend us moving entirely over the Q&A part of the user list to
>>> stackoverflow, or at least make that the recommended way rather than the
>>> existing user list which is not very scalable.
>>>
>>>
>>> On Wednesday, November 2, 2016, Nicholas Chammas <
>>> nicholas.chammas@gmail.com> wrote:
>>>
>>>> We’ve discussed several times upgrading our communication tools, as far
>>>> back as 2014 and maybe even before that too. The bottom line is that we
>>>> can’t due to ASF rules requiring the use of ASF-managed mailing lists.
>>>>
>>>> For some history, see this discussion:
>>>>
>>>>    - https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E
>>>>    - https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E
>>>>
>>>> (It’s ironic that it’s difficult to follow the past discussion on why
>>>> we can’t change our official communication tools due to those very tools…)
>>>>
>>>> Nick
>>>> ​
>>>>
>>>> On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <
>>>> ricardo.almeida@actnowib.com> wrote:
>>>>
>>>>> I feel Assaf's point is quite relevant if we want to move this project
>>>>> forward from the Spark user perspective (as I do). In fact, we're
>>>>> still using 20th century tools (mailing lists) with some add-ons (like
>>>>> Stack Overflow).
>>>>>
>>>>> As usual, Sean and Cody's contributions are very much to the point.
>>>>> I feel it is indeed a matter of culture (hard to enforce) and tools
>>>>> (much easier). Isn't it?
>>>>>
>>>>> On 2 November 2016 at 16:36, Cody Koeninger <co...@koeninger.org>
>>>>> wrote:
>>>>>
>>>>>> So concrete things people could do
>>>>>>
>>>>>> - users could tag subject lines appropriately to the component they're
>>>>>> asking about
>>>>>>
>>>>>> - contributors could monitor user@ for tags relating to components
>>>>>> they've worked on.
>>>>>> I'd be surprised if my miss rate for any mailing list questions
>>>>>> well-labeled as Kafka was higher than 5%
>>>>>>
>>>>>> - committers could be more aggressive about soliciting and merging PRs
>>>>>> to improve documentation.
>>>>>> It's a lot easier to answer even poorly-asked questions with a link to
>>>>>> relevant docs.
>>>>>>
>>>>>> On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <so...@cloudera.com> wrote:
>>>>>> > There's already reviews@ and issues@. dev@ is for project
>>>>>> development itself
>>>>>> > and I think is OK. You're suggesting splitting up user@ and I
>>>>>> sympathize
>>>>>> > with the motivation. Experience tells me that we'll have a beginner@
>>>>>> that's
>>>>>> > then totally ignored, and people will quickly learn to post to
>>>>>> advanced@ to
>>>>>> > get attention, and we'll be back where we started. Putting it in
>>>>>> JIRA
>>>>>> > doesn't help. I don't think this a problem that is merely down to
>>>>>> lack of
>>>>>> > process. It actually requires cultivating a culture change on the
>>>>>> community
>>>>>> > list.
>>>>>> >
>>>>>> > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <
>>>>>> Assaf.Mendelson@rsa.com>
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> What I am suggesting is basically to fix that.
>>>>>> >>
>>>>>> >> For example, we might say that mailing list A is only for voting,
>>>>>> mailing
>>>>>> >> list B is only for PR and have something like stack overflow for
>>>>>> developer
>>>>>> >> questions (I would even go as far as to have beginner,
>>>>>> intermediate and
>>>>>> >> advanced mailing list for users and beginner/advanced for dev).
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> This can easily be done using stack overflow tags, however, that
>>>>>> would
>>>>>> >> probably be harder to manage.
>>>>>> >>
>>>>>> >> Maybe using special jira tags and manage it in jira?
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> Anyway as I said, the main issue is not user questions (except
>>>>>> maybe
>>>>>> >> advanced ones) but more for dev questions. It is so easy to get
>>>>>> lost in the
>>>>>> >> chatter that it makes it very hard for people to learn spark
>>>>>> internals…
>>>>>> >>
>>>>>> >> Assaf.
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> From: Sean Owen [mailto:sowen@cloudera.com]
>>>>>> >> Sent: Wednesday, November 02, 2016 2:07 PM
>>>>>> >> To: Mendelson, Assaf; dev@spark.apache.org
>>>>>> >> Subject: Re: Handling questions in the mailing lists
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> I think that unfortunately mailing lists don't scale well. This
>>>>>> one has
>>>>>> >> thousands of subscribers with different interests and levels of
>>>>>> experience.
>>>>>> >> For any given person, most messages will be irrelevant. I also
>>>>>> find that a
>>>>>> >> lot of questions on user@ are not well-asked, aren't an SSCCE
>>>>>> >> (http://sscce.org/), not something most people are going to
>>>>>> bother replying
>>>>>> >> to even if they could answer. I almost entirely ignore user@
>>>>>> because there
>>>>>> >> are higher-priority channels like PRs to deal with, that already
>>>>>> have
>>>>>> >> hundreds of messages per day. This is why little of it gets an
>>>>>> answer -- too
>>>>>> >> noisy.
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> We have to have official mailing lists, in any event, to have some
>>>>>> >> official channel for things like votes and announcements. It's not
>>>>>> wrong to
>>>>>> >> ask questions on user@ of course, but a lot of the questions I
>>>>>> see could
>>>>>> >> have been answered with research of existing docs or looking at
>>>>>> the code. I
>>>>>> >> think that given the scale of the list, it's not wrong to assert
>>>>>> that this
>>>>>> >> is sort of a prerequisite for asking thousands of people to answer
>>>>>> one's
>>>>>> >> question. But we can't enforce that.
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> The situation will get better to the extent people ask better
>>>>>> questions,
>>>>>> >> help other people ask better questions, and answer good questions.
>>>>>> I'd
>>>>>> >> encourage anyone feeling this way to try to help along those
>>>>>> dimensions.
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <
>>>>>> assaf.mendelson@rsa.com>
>>>>>> >> wrote:
>>>>>> >>
>>>>>> >> Hi,
>>>>>> >>
>>>>>> >> I know this is a little off topic but I wanted to raise an issue
>>>>>> about
>>>>>> >> handling questions in the mailing list (this is true both for the
>>>>>> user
>>>>>> >> mailing list and the dev but since there are other options such as
>>>>>> stack
>>>>>> >> overflow for user questions, this is more problematic in dev).
>>>>>> >>
>>>>>> >> Let’s say I ask a question (as I recently did). Unfortunately this
>>>>>> was
>>>>>> >> during spark summit in Europe so probably people were busy. In any
>>>>>> case no
>>>>>> >> one answered.
>>>>>> >>
>>>>>> >> The problem is, that if no one answers very soon, the question
>>>>>> will almost
>>>>>> >> certainly remain unanswered because new messages will simply drown
>>>>>> it.
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> This is a common issue not just for questions but for any comment
>>>>>> or idea
>>>>>> >> which is not immediately picked up.
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> I believe we should have a method of handling this.
>>>>>> >>
>>>>>> >> Generally, I would say these types of things belong in stack
>>>>>> overflow,
>>>>>> >> after all, the way it is built is perfect for this. More seasoned
>>>>>> spark
>>>>>> >> contributors and committers can periodically check out unanswered
>>>>>> questions
>>>>>> >> and answer them.
>>>>>> >>
>>>>>> >> The problem is that stack overflow (as well as other targets such
>>>>>> as the
>>>>>> >> databricks forums) tend to have a more user based orientation.
>>>>>> This means
>>>>>> >> that any spark internal question will almost certainly remain
>>>>>> unanswered.
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> I was wondering if we could come up with a solution for this.
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> Assaf.
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> ________________________________
>>>>>> >>
>>>>>> >> View this message in context: Handling questions in the mailing
>>>>>> lists
>>>>>> >> Sent from the Apache Spark Developers List mailing list archive at
>>>>>> >> Nabble.com.
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>>>>
>>>>>>
>>>>>
>>
>>
>

Re: Handling questions in the mailing lists

Posted by Maciej Szymkiewicz <ms...@gmail.com>.
Damn, I always thought that mailing list is only for nice and welcoming
people and there is nothing to do for me here >:)

To be serious though, there are many questions on the users list which
would fit just fine on SO but it is not true in general. There are
dozens of questions which are too broad, opinion based, ask for external
resources and so on. If you want to direct users to SO you have to help
them decide if it is the right channel. Otherwise it will just create
a really bad experience for both those seeking help and active answerers.
The former will be downvoted and bashed, the latter will have to deal
with handling all the junk, and the number of active Spark users with
moderation privileges is really low (with only Massg and me being able
to directly close duplicates).

Believe me, I've seen this before.

On 11/07/2016 05:08 AM, Reynold Xin wrote:
> You have substantially underestimated how opinionated people can be on
> mailing lists too :)
>
> On Sunday, November 6, 2016, Maciej Szymkiewicz
> <mszymkiewicz@gmail.com> wrote:
>
>     You have to remember that Stack Overflow crowd (like me) is highly
>     opinionated, so many questions, which could be just fine on the
>     mailing list, will be quickly downvoted and / or closed as
>     off-topic. Just saying...
>
>     -- 
>     Best, 
>     Maciej
>
>
>     On 11/07/2016 04:03 AM, Reynold Xin wrote:
>>     OK I've checked on the ASF member list (which is private so there
>>     is no public archive).
>>
>>     It is not against any ASF rule to recommend StackOverflow as a
>>     place for users to ask questions. I don't think we can or should
>>     delete the existing user@spark list either, but we can certainly
>>     make SO more visible than it is.
>>
>>
>>
>>     On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <rxin@databricks.com> wrote:
>>
>>         Actually after talking with more ASF members, I believe the
>>         only policy is that development decisions have to be made and
>>         announced on ASF properties (dev list or jira), but user
>>         questions don't have to. 
>>
>>         I'm going to double check this. If it is true, I would
>>         actually recommend us moving entirely over the Q&A part of
>>         the user list to stackoverflow, or at least make that the
>>         recommended way rather than the existing user list which is
>>         not very scalable. 
>>
>>
>>         On Wednesday, November 2, 2016, Nicholas Chammas
>>         <nicholas.chammas@gmail.com> wrote:
>>
>>             We’ve discussed several times upgrading our communication
>>             tools, as far back as 2014 and maybe even before that
>>             too. The bottom line is that we can’t due to ASF rules
>>             requiring the use of ASF-managed mailing lists.
>>
>>             For some history, see this discussion:
>>
>>               * https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E
>>               * https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E
>>
>>             (It’s ironic that it’s difficult to follow the past
>>             discussion on why we can’t change our official
>>             communication tools due to those very tools…)
>>
>>             Nick
>>
>>             On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida
>>             <ri...@actnowib.com> wrote:
>>
>>                 I feel Assaf's point is quite relevant if we want to
>>                 move this project forward from the Spark user
>>                 perspective (as I do). In fact, we're still using
>>                 20th century tools (mailing lists) with some add-ons
>>                 (like Stack Overflow).
>>
>>                 As usual, Sean and Cody's contributions are very much to
>>                 the point.
>>                 I feel it is indeed a matter of culture (hard to
>>                 enforce) and tools (much easier). Isn't it?
>>
>>                 On 2 November 2016 at 16:36, Cody Koeninger
>>                 <co...@koeninger.org> wrote:
>>
>>                     So concrete things people could do
>>
>>                     - users could tag subject lines appropriately to
>>                     the component they're
>>                     asking about
>>
>>                     - contributors could monitor user@ for tags
>>                     relating to components
>>                     they've worked on.
>>                     I'd be surprised if my miss rate for any mailing
>>                     list questions
>>                     well-labeled as Kafka was higher than 5%
>>
>>                     - committers could be more aggressive about
>>                     soliciting and merging PRs
>>                     to improve documentation.
>>                     It's a lot easier to answer even poorly-asked
>>                     questions with a link to
>>                     relevant docs.
>>
>>                     On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen
>>                     <so...@cloudera.com> wrote:
>>                     > There's already reviews@ and issues@. dev@ is
>>                     for project development itself
>>                     > and I think is OK. You're suggesting splitting
>>                     up user@ and I sympathize
>>                     > with the motivation. Experience tells me that
>>                     we'll have a beginner@ that's
>>                     > then totally ignored, and people will quickly
>>                     learn to post to advanced@ to
>>                     > get attention, and we'll be back where we
>>                     started. Putting it in JIRA
>>                     > doesn't help. I don't think this a problem that
>>                     is merely down to lack of
>>                     > process. It actually requires cultivating a
>>                     culture change on the community
>>                     > list.
>>                     >
>>                     > On Wed, Nov 2, 2016 at 12:11 PM Mendelson,
>>                     Assaf <As...@rsa.com>
>>                     > wrote:
>>                     >>
>>                     >> What I am suggesting is basically to fix that.
>>                     >>
>>                     >> For example, we might say that mailing list A
>>                     is only for voting, mailing
>>                     >> list B is only for PR and have something like
>>                     stack overflow for developer
>>                     >> questions (I would even go as far as to have
>>                     beginner, intermediate and
>>                     >> advanced mailing list for users and
>>                     beginner/advanced for dev).
>>                     >>
>>                     >>
>>                     >>
>>                     >> This can easily be done using stack overflow
>>                     tags, however, that would
>>                     >> probably be harder to manage.
>>                     >>
>>                     >> Maybe using special jira tags and manage it in
>>                     jira?
>>                     >>
>>                     >>
>>                     >>
>>                     >> Anyway as I said, the main issue is not user
>>                     questions (except maybe
>>                     >> advanced ones) but more for dev questions. It
>>                     is so easy to get lost in the
>>                     >> chatter that it makes it very hard for people
>>                     to learn spark internals…
>>                     >>
>>                     >> Assaf.
>>                     >>
>>                     >>
>>                     >>
>>                     >> From: Sean Owen [mailto:sowen@cloudera.com]
>>                     >> Sent: Wednesday, November 02, 2016 2:07 PM
>>                     >> To: Mendelson, Assaf; dev@spark.apache.org
>>                     >> Subject: Re: Handling questions in the mailing
>>                     lists
>>                     >>
>>                     >>
>>                     >>
>>                     >> I think that unfortunately mailing lists don't
>>                     scale well. This one has
>>                     >> thousands of subscribers with different
>>                     interests and levels of experience.
>>                     >> For any given person, most messages will be
>>                     irrelevant. I also find that a
>>                     >> lot of questions on user@ are not well-asked,
>>                     aren't an SSCCE
>>                     >> (http://sscce.org/), not something most people
>>                     are going to bother replying
>>                     >> to even if they could answer. I almost
>>                     entirely ignore user@ because there
>>                     >> are higher-priority channels like PRs to deal
>>                     with, that already have
>>                     >> hundreds of messages per day. This is why
>>                     little of it gets an answer -- too
>>                     >> noisy.
>>                     >>
>>                     >>
>>                     >>
>>                     >> We have to have official mailing lists, in any
>>                     event, to have some
>>                     >> official channel for things like votes and
>>                     announcements. It's not wrong to
>>                     >> ask questions on user@ of course, but a lot of
>>                     the questions I see could
>>                     >> have been answered with research of existing
>>                     docs or looking at the code. I
>>                     >> think that given the scale of the list, it's
>>                     not wrong to assert that this
>>                     >> is sort of a prerequisite for asking thousands
>>                     of people to answer one's
>>                     >> question. But we can't enforce that.
>>                     >>
>>                     >>
>>                     >>
>>                     >> The situation will get better to the extent
>>                     people ask better questions,
>>                     >> help other people ask better questions, and
>>                     answer good questions. I'd
>>                     >> encourage anyone feeling this way to try to
>>                     help along those dimensions.
>>                     >>
>>                     >>
>>                     >>
>>                     >>
>>                     >>
>>                     >>
>>                     >>
>>                     >>
>>                     >>
>>                     >>
>>                     >>
>>                     >> On Wed, Nov 2, 2016 at 11:32 AM
>>                     assaf.mendelson <as...@rsa.com>
>>                     >> wrote:
>>                     >>
>>                     >> Hi,
>>                     >>
>>                     >> I know this is a little off topic but I wanted
>>                     to raise an issue about
>>                     >> handling questions in the mailing list (this
>>                     is true both for the user
>>                     >> mailing list and the dev but since there are
>>                     other options such as stack
>>                     >> overflow for user questions, this is more
>>                     problematic in dev).
>>                     >>
>>                     >> Let’s say I ask a question (as I recently
>>                     did). Unfortunately this was
>>                     >> during spark summit in Europe so probably
>>                     people were busy. In any case no
>>                     >> one answered.
>>                     >>
>>                     >> The problem is, that if no one answers very
>>                     soon, the question will almost
>>                     >> certainly remain unanswered because new
>>                     messages will simply drown it.
>>                     >>
>>                     >>
>>                     >>
>>                     >> This is a common issue not just for questions
>>                     but for any comment or idea
>>                     >> which is not immediately picked up.
>>                     >>
>>                     >>
>>                     >>
>>                     >> I believe we should have a method of handling
>>                     this.
>>                     >>
>>                     >> Generally, I would say these types of things
>>                     belong in stack overflow,
>>                     >> after all, the way it is built is perfect for
>>                     this. More seasoned spark
>>                     >> contributors and committers can periodically
>>                     check out unanswered questions
>>                     >> and answer them.
>>                     >>
>>                     >> The problem is that stack overflow (as well as
>>                     other targets such as the
>>                     >> databricks forums) tend to have a more user
>>                     based orientation. This means
>>                     >> that any spark internal question will almost
>>                     certainly remain unanswered.
>>                     >>
>>                     >>
>>                     >>
>>                     >> I was wondering if we could come up with a
>>                     solution for this.
>>                     >>
>>                     >>
>>                     >>
>>                     >> Assaf.
>>                     >>
>>                     >>
>>                     >>
>>                     >>
>>                     >>
>>                     >> ________________________________
>>                     >>
>>                     >> View this message in context: Handling
>>                     questions in the mailing lists
>>                     >> Sent from the Apache Spark Developers List
>>                     mailing list archive at
>>                     >> Nabble.com.
>>
>>                     ---------------------------------------------------------------------
>>                     To unsubscribe e-mail:
>>                     dev-unsubscribe@spark.apache.org
>>
>>
>>
>


Re: Handling questions in the mailing lists

Posted by Reynold Xin <rx...@databricks.com>.
You have substantially underestimated how opinionated people can be on
mailing lists too :)

On Sunday, November 6, 2016, Maciej Szymkiewicz <ms...@gmail.com>
wrote:

> You have to remember that Stack Overflow crowd (like me) is highly
> opinionated, so many questions, which could be just fine on the mailing
> list, will be quickly downvoted and / or closed as off-topic. Just
> saying...
>
> --
> Best,
> Maciej
>
>
> On 11/07/2016 04:03 AM, Reynold Xin wrote:
>
> OK I've checked on the ASF member list (which is private so there is no
> public archive).
>
> It is not against any ASF rule to recommend StackOverflow as a place for
> users to ask questions. I don't think we can or should delete the existing
> user@spark list either, but we can certainly make SO more visible than it
> is.
>
>
>
> On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <rxin@databricks.com> wrote:
>
>> Actually after talking with more ASF members, I believe the only policy
>> is that development decisions have to be made and announced on ASF
>> properties (dev list or jira), but user questions don't have to.
>>
>> I'm going to double check this. If it is true, I would actually recommend
>> us moving entirely over the Q&A part of the user list to stackoverflow, or
>> at least make that the recommended way rather than the existing user list
>> which is not very scalable.
>>
>>
>> On Wednesday, November 2, 2016, Nicholas Chammas <
>> nicholas.chammas@gmail.com> wrote:
>>
>>> We’ve discussed several times upgrading our communication tools, as far
>>> back as 2014 and maybe even before that too. The bottom line is that we
>>> can’t due to ASF rules requiring the use of ASF-managed mailing lists.
>>>
>>> For some history, see this discussion:
>>>
>>>    - https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E
>>>    - https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E
>>>
>>> (It’s ironic that it’s difficult to follow the past discussion on why we
>>> can’t change our official communication tools due to those very tools…)
>>>
>>> Nick
>>>
>>>
>>> On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <
>>> ricardo.almeida@actnowib.com> wrote:
>>>
>>>> I feel Assaf's point is quite relevant if we want to move this project
>>>> forward from the Spark user perspective (as I do). In fact, we're
>>>> still using 20th century tools (mailing lists) with some add-ons (like
>>>> Stack Overflow).
>>>>
>>>> As usual, Sean and Cody's contributions are very much to the point.
>>>> I feel it is indeed a matter of culture (hard to enforce) and tools
>>>> (much easier). Isn't it?
>>>>
>>>> On 2 November 2016 at 16:36, Cody Koeninger <co...@koeninger.org> wrote:
>>>>
>>>>> So concrete things people could do
>>>>>
>>>>> - users could tag subject lines appropriately to the component they're
>>>>> asking about
>>>>>
>>>>> - contributors could monitor user@ for tags relating to components
>>>>> they've worked on.
>>>>> I'd be surprised if my miss rate for any mailing list questions
>>>>> well-labeled as Kafka was higher than 5%
>>>>>
>>>>> - committers could be more aggressive about soliciting and merging PRs
>>>>> to improve documentation.
>>>>> It's a lot easier to answer even poorly-asked questions with a link to
>>>>> relevant docs.
>>>>>
>>>>> On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <so...@cloudera.com> wrote:
>>>>> > There's already reviews@ and issues@. dev@ is for project
>>>>> development itself
>>>>> > and I think is OK. You're suggesting splitting up user@ and I
>>>>> sympathize
>>>>> > with the motivation. Experience tells me that we'll have a beginner@
>>>>> that's
>>>>> > then totally ignored, and people will quickly learn to post to
>>>>> advanced@ to
>>>>> > get attention, and we'll be back where we started. Putting it in JIRA
>>>>> > doesn't help. I don't think this is a problem that is merely down to
>>>>> lack of
>>>>> > process. It actually requires cultivating a culture change on the
>>>>> community
>>>>> > list.
>>>>> >
>>>>> > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <
>>>>> Assaf.Mendelson@rsa.com>
>>>>> > wrote:
>>>>> >>
>>>>> >> What I am suggesting is basically to fix that.
>>>>> >>
>>>>> >> For example, we might say that mailing list A is only for voting,
>>>>> mailing
>>>>> >> list B is only for PR and have something like stack overflow for
>>>>> developer
>>>>> >> questions (I would even go as far as to have beginner, intermediate
>>>>> and
>>>>> >> advanced mailing list for users and beginner/advanced for dev).
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> This can easily be done using stack overflow tags, however, that
>>>>> would
>>>>> >> probably be harder to manage.
>>>>> >>
>>>>> >> Maybe using special jira tags and manage it in jira?
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> Anyway as I said, the main issue is not user questions (except maybe
>>>>> >> advanced ones) but more for dev questions. It is so easy to get
>>>>> lost in the
>>>>> >> chatter that it makes it very hard for people to learn spark
>>>>> internals…
>>>>> >>
>>>>> >> Assaf.
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> From: Sean Owen [mailto:sowen@cloudera.com]
>>>>> >> Sent: Wednesday, November 02, 2016 2:07 PM
>>>>> >> To: Mendelson, Assaf; dev@spark.apache.org
>>>>> >> Subject: Re: Handling questions in the mailing lists
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> I think that unfortunately mailing lists don't scale well. This one
>>>>> has
>>>>> >> thousands of subscribers with different interests and levels of
>>>>> experience.
>>>>> >> For any given person, most messages will be irrelevant. I also find
>>>>> that a
>>>>> >> lot of questions on user@ are not well-asked, aren't an SSCCE
>>>>> >> (http://sscce.org/), not something most people are going to bother
>>>>> replying
>>>>> >> to even if they could answer. I almost entirely ignore user@
>>>>> because there
>>>>> >> are higher-priority channels like PRs to deal with, that already
>>>>> have
>>>>> >> hundreds of messages per day. This is why little of it gets an
>>>>> answer -- too
>>>>> >> noisy.
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> We have to have official mailing lists, in any event, to have some
>>>>> >> official channel for things like votes and announcements. It's not
>>>>> wrong to
>>>>> >> ask questions on user@ of course, but a lot of the questions I see
>>>>> could
>>>>> >> have been answered with research of existing docs or looking at the
>>>>> code. I
>>>>> >> think that given the scale of the list, it's not wrong to assert
>>>>> that this
>>>>> >> is sort of a prerequisite for asking thousands of people to answer
>>>>> one's
>>>>> >> question. But we can't enforce that.
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> The situation will get better to the extent people ask better
>>>>> questions,
>>>>> >> help other people ask better questions, and answer good questions.
>>>>> I'd
>>>>> >> encourage anyone feeling this way to try to help along those
>>>>> dimensions.
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <
>>>>> assaf.mendelson@rsa.com>
>>>>> >> wrote:
>>>>> >>
>>>>> >> Hi,
>>>>> >>
>>>>> >> I know this is a little off topic but I wanted to raise an issue
>>>>> about
>>>>> >> handling questions in the mailing list (this is true both for the
>>>>> user
>>>>> >> mailing list and the dev but since there are other options such as
>>>>> stack
>>>>> >> overflow for user questions, this is more problematic in dev).
>>>>> >>
>>>>> >> Let’s say I ask a question (as I recently did). Unfortunately this
>>>>> was
>>>>> >> during spark summit in Europe so probably people were busy. In any
>>>>> case no
>>>>> >> one answered.
>>>>> >>
>>>>> >> The problem is, that if no one answers very soon, the question will
>>>>> almost
>>>>> >> certainly remain unanswered because new messages will simply drown
>>>>> it.
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> This is a common issue not just for questions but for any comment
>>>>> or idea
>>>>> >> which is not immediately picked up.
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> I believe we should have a method of handling this.
>>>>> >>
>>>>> >> Generally, I would say these types of things belong in stack
>>>>> overflow,
>>>>> >> after all, the way it is built is perfect for this. More seasoned
>>>>> spark
>>>>> >> contributors and committers can periodically check out unanswered
>>>>> questions
>>>>> >> and answer them.
>>>>> >>
>>>>> >> The problem is that stack overflow (as well as other targets such
>>>>> as the
>>>>> >> databricks forums) tend to have a more user based orientation. This
>>>>> means
>>>>> >> that any spark internal question will almost certainly remain
>>>>> unanswered.
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> I was wondering if we could come up with a solution for this.
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> Assaf.
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> ________________________________
>>>>> >>
>>>>> >> View this message in context: Handling questions in the mailing
>>>>> lists
>>>>> >> Sent from the Apache Spark Developers List mailing list archive at
>>>>> >> Nabble.com.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>>>
>>>>>
>>>>
>
>

Re: Handling questions in the mailing lists

Posted by Maciej Szymkiewicz <ms...@gmail.com>.
You have to remember that Stack Overflow crowd (like me) is highly
opinionated, so many questions, which could be just fine on the mailing
list, will be quickly downvoted and / or closed as off-topic. Just
saying...

-- 
Best, 
Maciej


On 11/07/2016 04:03 AM, Reynold Xin wrote:
> OK I've checked on the ASF member list (which is private so there is
> no public archive).
>
> It is not against any ASF rule to recommend StackOverflow as a place
> for users to ask questions. I don't think we can or should delete the
> existing user@spark list either, but we can certainly make SO more
> visible than it is.
>
>
>
> On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <rxin@databricks.com> wrote:
>
>     Actually after talking with more ASF members, I believe the only
>     policy is that development decisions have to be made and announced
>     on ASF properties (dev list or jira), but user questions don't
>     have to. 
>
>     I'm going to double check this. If it is true, I would actually
>     recommend us moving entirely over the Q&A part of the user list to
>     stackoverflow, or at least make that the recommended way rather
>     than the existing user list which is not very scalable. 
>
>
>     On Wednesday, November 2, 2016, Nicholas Chammas
>     <nicholas.chammas@gmail.com> wrote:
>
>         We’ve discussed several times upgrading our communication
>         tools, as far back as 2014 and maybe even before that too. The
>         bottom line is that we can’t due to ASF rules requiring the
>         use of ASF-managed mailing lists.
>
>         For some history, see this discussion:
>
>           * https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E
>           * https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E
>
>         (It’s ironic that it’s difficult to follow the past discussion
>         on why we can’t change our official communication tools due to
>         those very tools…)
>
>         Nick
>
>
>         On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida
>         <ri...@actnowib.com> wrote:
>
>             I feel Assaf's point is quite relevant if we want to move
>             this project forward from the Spark user perspective (as I
>             do). In fact, we're still using 20th century tools
>             (mailing lists) with some add-ons (like Stack Overflow).
>
>             As usual, Sean and Cody's contributions are very much to the
>             point.
>             I feel it is indeed a matter of culture (hard to
>             enforce) and tools (much easier). Isn't it?
>
>             On 2 November 2016 at 16:36, Cody Koeninger
>             <co...@koeninger.org> wrote:
>
>                 So concrete things people could do
>
>                 - users could tag subject lines appropriately to the
>                 component they're
>                 asking about
>
>                 - contributors could monitor user@ for tags relating
>                 to components
>                 they've worked on.
>                 I'd be surprised if my miss rate for any mailing list
>                 questions
>                 well-labeled as Kafka was higher than 5%
>
>                 - committers could be more aggressive about soliciting
>                 and merging PRs
>                 to improve documentation.
>                 It's a lot easier to answer even poorly-asked
>                 questions with a link to
>                 relevant docs.
>
>                 On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen
>                 <so...@cloudera.com> wrote:
>                 > There's already reviews@ and issues@. dev@ is for
>                 project development itself
>                 > and I think is OK. You're suggesting splitting up
>                 user@ and I sympathize
>                 > with the motivation. Experience tells me that we'll
>                 have a beginner@ that's
>                 > then totally ignored, and people will quickly learn
>                 to post to advanced@ to
>                 > get attention, and we'll be back where we started.
>                 Putting it in JIRA
>                 > doesn't help. I don't think this is a problem that is
>                 merely down to lack of
>                 > process. It actually requires cultivating a culture
>                 change on the community
>                 > list.
>                 >
>                 > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf
>                 <As...@rsa.com>
>                 > wrote:
>                 >>
>                 >> What I am suggesting is basically to fix that.
>                 >>
>                 >> For example, we might say that mailing list A is
>                 only for voting, mailing
>                 >> list B is only for PR and have something like stack
>                 overflow for developer
>                 >> questions (I would even go as far as to have
>                 beginner, intermediate and
>                 >> advanced mailing list for users and
>                 beginner/advanced for dev).
>                 >>
>                 >>
>                 >>
>                 >> This can easily be done using stack overflow tags,
>                 however, that would
>                 >> probably be harder to manage.
>                 >>
>                 >> Maybe using special jira tags and manage it in jira?
>                 >>
>                 >>
>                 >>
>                 >> Anyway as I said, the main issue is not user
>                 questions (except maybe
>                 >> advanced ones) but more for dev questions. It is so
>                 easy to get lost in the
>                 >> chatter that it makes it very hard for people to
>                 learn spark internals…
>                 >>
>                 >> Assaf.
>                 >>
>                 >>
>                 >>
>                 >> From: Sean Owen [mailto:sowen@cloudera.com]
>                 >> Sent: Wednesday, November 02, 2016 2:07 PM
>                 >> To: Mendelson, Assaf; dev@spark.apache.org
>                 >> Subject: Re: Handling questions in the mailing lists
>                 >>
>                 >>
>                 >>
>                 >> I think that unfortunately mailing lists don't
>                 scale well. This one has
>                 >> thousands of subscribers with different interests
>                 and levels of experience.
>                 >> For any given person, most messages will be
>                 irrelevant. I also find that a
>                 >> lot of questions on user@ are not well-asked,
>                 aren't an SSCCE
>                 >> (http://sscce.org/), not something most people are
>                 going to bother replying
>                 >> to even if they could answer. I almost entirely
>                 ignore user@ because there
>                 >> are higher-priority channels like PRs to deal with,
>                 that already have
>                 >> hundreds of messages per day. This is why little of
>                 it gets an answer -- too
>                 >> noisy.
>                 >>
>                 >>
>                 >>
>                 >> We have to have official mailing lists, in any
>                 event, to have some
>                 >> official channel for things like votes and
>                 announcements. It's not wrong to
>                 >> ask questions on user@ of course, but a lot of the
>                 questions I see could
>                 >> have been answered with research of existing docs
>                 or looking at the code. I
>                 >> think that given the scale of the list, it's not
>                 wrong to assert that this
>                 >> is sort of a prerequisite for asking thousands of
>                 people to answer one's
>                 >> question. But we can't enforce that.
>                 >>
>                 >>
>                 >>
>                 >> The situation will get better to the extent people
>                 ask better questions,
>                 >> help other people ask better questions, and answer
>                 good questions. I'd
>                 >> encourage anyone feeling this way to try to help
>                 along those dimensions.
>                 >>
>                 >>
>                 >>
>                 >>
>                 >>
>                 >>
>                 >>
>                 >>
>                 >>
>                 >>
>                 >>
>                 >> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson
>                 <as...@rsa.com>
>                 >> wrote:
>                 >>
>                 >> Hi,
>                 >>
>                 >> I know this is a little off topic but I wanted to
>                 raise an issue about
>                 >> handling questions in the mailing list (this is
>                 true both for the user
>                 >> mailing list and the dev but since there are other
>                 options such as stack
>                 >> overflow for user questions, this is more
>                 problematic in dev).
>                 >>
>                 >> Let’s say I ask a question (as I recently did).
>                 Unfortunately this was
>                 >> during spark summit in Europe so probably people
>                 were busy. In any case no
>                 >> one answered.
>                 >>
>                 >> The problem is, that if no one answers very soon,
>                 the question will almost
>                 >> certainly remain unanswered because new messages
>                 will simply drown it.
>                 >>
>                 >>
>                 >>
>                 >> This is a common issue not just for questions but
>                 for any comment or idea
>                 >> which is not immediately picked up.
>                 >>
>                 >>
>                 >>
>                 >> I believe we should have a method of handling this.
>                 >>
>                 >> Generally, I would say these types of things belong
>                 in stack overflow,
>                 >> after all, the way it is built is perfect for this.
>                 More seasoned spark
>                 >> contributors and committers can periodically check
>                 out unanswered questions
>                 >> and answer them.
>                 >>
>                 >> The problem is that stack overflow (as well as
>                 other targets such as the
>                 >> databricks forums) tend to have a more user based
>                 orientation. This means
>                 >> that any spark internal question will almost
>                 certainly remain unanswered.
>                 >>
>                 >>
>                 >>
>                 >> I was wondering if we could come up with a solution
>                 for this.
>                 >>
>                 >>
>                 >>
>                 >> Assaf.
>                 >>
>                 >>
>                 >>
>                 >>
>                 >>
>                 >> ________________________________
>                 >>
>                 >> View this message in context: Handling questions in
>                 the mailing lists
>                 >> Sent from the Apache Spark Developers List mailing
>                 list archive at
>                 >> Nabble.com.
>
>                 ---------------------------------------------------------------------
>                 To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>
>


Re: Handling questions in the mailing lists

Posted by Reynold Xin <rx...@databricks.com>.
OK I've checked on the ASF member list (which is private so there is no
public archive).

It is not against any ASF rule to recommend StackOverflow as a place for
users to ask questions. I don't think we can or should delete the existing
user@spark list either, but we can certainly make SO more visible than it
is.



On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <rx...@databricks.com> wrote:

> Actually after talking with more ASF members, I believe the only policy is
> that development decisions have to be made and announced on ASF properties
> (dev list or jira), but user questions don't have to.
>
> I'm going to double check this. If it is true, I would actually recommend
> us moving entirely over the Q&A part of the user list to stackoverflow, or
> at least make that the recommended way rather than the existing user list
> which is not very scalable.
>
>
> On Wednesday, November 2, 2016, Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
>> We’ve discussed several times upgrading our communication tools, as far
>> back as 2014 and maybe even before that too. The bottom line is that we
>> can’t due to ASF rules requiring the use of ASF-managed mailing lists.
>>
>> For some history, see this discussion:
>>
>>    - https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E
>>    - https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E
>>
>> (It’s ironic that it’s difficult to follow the past discussion on why we
>> can’t change our official communication tools due to those very tools…)
>>
>> Nick
>>
>>
>> On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <
>> ricardo.almeida@actnowib.com> wrote:
>>
>>> I feel Assaf's point is quite relevant if we want to move this project
>>> forward from the Spark user perspective (as I do). In fact, we're still
>>> using 20th century tools (mailing lists) with some add-ons (like Stack
>>> Overflow).
>>>
>>> As usual, Sean and Cody's contributions are very much to the point.
>>> I feel it is indeed a matter of culture (hard to enforce) and tools
>>> (much easier). Isn't it?
>>>
>>> On 2 November 2016 at 16:36, Cody Koeninger <co...@koeninger.org> wrote:
>>>
>>>> So concrete things people could do
>>>>
>>>> - users could tag subject lines appropriately to the component they're
>>>> asking about
>>>>
>>>> - contributors could monitor user@ for tags relating to components
>>>> they've worked on.
>>>> I'd be surprised if my miss rate for any mailing list questions
>>>> well-labeled as Kafka was higher than 5%
>>>>
>>>> - committers could be more aggressive about soliciting and merging PRs
>>>> to improve documentation.
>>>> It's a lot easier to answer even poorly-asked questions with a link to
>>>> relevant docs.
>>>>
>>>> On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <so...@cloudera.com> wrote:
>>>> > There's already reviews@ and issues@. dev@ is for project
>>>> development itself
>>>> > and I think is OK. You're suggesting splitting up user@ and I
>>>> sympathize
>>>> > with the motivation. Experience tells me that we'll have a beginner@
>>>> that's
>>>> > then totally ignored, and people will quickly learn to post to
>>>> advanced@ to
>>>> > get attention, and we'll be back where we started. Putting it in JIRA
>>>> > doesn't help. I don't think this is a problem that is merely down to
>>>> lack of
>>>> > process. It actually requires cultivating a culture change on the
>>>> community
>>>> > list.
>>>> >
>>>> > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <
>>>> Assaf.Mendelson@rsa.com>
>>>> > wrote:
>>>> >>
>>>> >> What I am suggesting is basically to fix that.
>>>> >>
>>>> >> For example, we might say that mailing list A is only for voting,
>>>> mailing
>>>> >> list B is only for PR and have something like stack overflow for
>>>> developer
>>>> >> questions (I would even go as far as to have beginner, intermediate
>>>> and
>>>> >> advanced mailing list for users and beginner/advanced for dev).
>>>> >>
>>>> >>
>>>> >>
>>>> >> This can easily be done using stack overflow tags, however, that
>>>> would
>>>> >> probably be harder to manage.
>>>> >>
>>>> >> Maybe using special jira tags and manage it in jira?
>>>> >>
>>>> >>
>>>> >>
>>>> >> Anyway as I said, the main issue is not user questions (except maybe
>>>> >> advanced ones) but more for dev questions. It is so easy to get lost
>>>> in the
>>>> >> chatter that it makes it very hard for people to learn spark
>>>> internals…
>>>> >>
>>>> >> Assaf.
>>>> >>
>>>> >>
>>>> >>
>>>> >> From: Sean Owen [mailto:sowen@cloudera.com]
>>>> >> Sent: Wednesday, November 02, 2016 2:07 PM
>>>> >> To: Mendelson, Assaf; dev@spark.apache.org
>>>> >> Subject: Re: Handling questions in the mailing lists
>>>> >>
>>>> >>
>>>> >>
>>>> >> I think that unfortunately mailing lists don't scale well. This one
>>>> has
>>>> >> thousands of subscribers with different interests and levels of
>>>> experience.
>>>> >> For any given person, most messages will be irrelevant. I also find
>>>> that a
>>>> >> lot of questions on user@ are not well-asked, aren't an SSCCE
>>>> >> (http://sscce.org/), not something most people are going to bother
>>>> replying
>>>> >> to even if they could answer. I almost entirely ignore user@
>>>> because there
>>>> >> are higher-priority channels like PRs to deal with, that already have
>>>> >> hundreds of messages per day. This is why little of it gets an
>>>> answer -- too
>>>> >> noisy.
>>>> >>
>>>> >>
>>>> >>
>>>> >> We have to have official mailing lists, in any event, to have some
>>>> >> official channel for things like votes and announcements. It's not
>>>> wrong to
>>>> >> ask questions on user@ of course, but a lot of the questions I see
>>>> could
>>>> >> have been answered with research of existing docs or looking at the
>>>> code. I
>>>> >> think that given the scale of the list, it's not wrong to assert
>>>> that this
>>>> >> is sort of a prerequisite for asking thousands of people to answer
>>>> one's
>>>> >> question. But we can't enforce that.
>>>> >>
>>>> >>
>>>> >>
>>>> >> The situation will get better to the extent people ask better
>>>> questions,
>>>> >> help other people ask better questions, and answer good questions.
>>>> I'd
>>>> >> encourage anyone feeling this way to try to help along those
>>>> dimensions.
>>>> >>
>>>> >> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <
>>>> assaf.mendelson@rsa.com>
>>>> >> wrote:
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >> I know this is a little off topic but I wanted to raise an issue
>>>> about
>>>> >> handling questions in the mailing list (this is true both for the
>>>> user
>>>> >> mailing list and the dev but since there are other options such as
>>>> stack
>>>> >> overflow for user questions, this is more problematic in dev).
>>>> >>
>>>> >> Let’s say I ask a question (as I recently did). Unfortunately this
>>>> was
>>>> >> during spark summit in Europe so probably people were busy. In any
>>>> case no
>>>> >> one answered.
>>>> >>
>>>> >> The problem is, that if no one answers very soon, the question will
>>>> almost
>>>> >> certainly remain unanswered because new messages will simply drown
>>>> it.
>>>> >>
>>>> >>
>>>> >>
>>>> >> This is a common issue not just for questions but for any comment or
>>>> idea
>>>> >> which is not immediately picked up.
>>>> >>
>>>> >>
>>>> >>
>>>> >> I believe we should have a method of handling this.
>>>> >>
>>>> >> Generally, I would say these types of things belong in stack
>>>> overflow,
>>>> >> after all, the way it is built is perfect for this. More seasoned
>>>> spark
>>>> >> contributors and committers can periodically check out unanswered
>>>> questions
>>>> >> and answer them.
>>>> >>
>>>> >> The problem is that stack overflow (as well as other targets such as
>>>> the
>>>> >> databricks forums) tend to have a more user based orientation. This
>>>> means
>>>> >> that any spark internal question will almost certainly remain
>>>> unanswered.
>>>> >>
>>>> >>
>>>> >>
>>>> >> I was wondering if we could come up with a solution for this.
>>>> >>
>>>> >>
>>>> >>
>>>> >> Assaf.
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> ________________________________
>>>> >>
>>>> >> View this message in context: Handling questions in the mailing lists
>>>> >> Sent from the Apache Spark Developers List mailing list archive at
>>>> >> Nabble.com.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>>
>>>>
>>>

Re: Handling questions in the mailing lists

Posted by Reynold Xin <rx...@databricks.com>.
Actually after talking with more ASF members, I believe the only policy is
that development decisions have to be made and announced on ASF properties
(dev list or jira), but user questions don't have to.

I'm going to double check this. If it is true, I would actually recommend
that we move the Q&A part of the user list entirely over to Stack Overflow,
or at least make that the recommended channel rather than the existing user
list, which is not very scalable.

On Wednesday, November 2, 2016, Nicholas Chammas <ni...@gmail.com>
wrote:

> We’ve discussed several times upgrading our communication tools, as far
> back as 2014 and maybe even before that too. The bottom line is that we
> can’t due to ASF rules requiring the use of ASF-managed mailing lists.
>
> For some history, see this discussion:
>
>    - https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E
>    - https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E
>
> (It’s ironic that it’s difficult to follow the past discussion on why we
> can’t change our official communication tools due to those very tools…)
>
> Nick
>
> On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <
> ricardo.almeida@actnowib.com> wrote:
>
>> I feel Assaf's point is quite relevant if we want to move this project
>> forward from the Spark user perspective (as I do). In fact, we're still
>> using 20th century tools (mailing lists) with some add-ons (like Stack
>> Overflow).
>>
>> As usual, Sean and Cody's contributions are very much to the point.
>> I feel it is indeed a matter of culture (hard to enforce) and tools
>> (much easier). Isn't it?
>>
>> On 2 November 2016 at 16:36, Cody Koeninger <cody@koeninger.org> wrote:
>>
>>> So concrete things people could do
>>>
>>> - users could tag subject lines appropriately to the component they're
>>> asking about
>>>
>>> - contributors could monitor user@ for tags relating to components
>>> they've worked on.
>>> I'd be surprised if my miss rate for any mailing list questions
>>> well-labeled as Kafka was higher than 5%
>>>
>>> - committers could be more aggressive about soliciting and merging PRs
>>> to improve documentation.
>>> It's a lot easier to answer even poorly-asked questions with a link to
>>> relevant docs.
>>>
>>> On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <sowen@cloudera.com> wrote:
>>> > There's already reviews@ and issues@. dev@ is for project development
>>> itself
>>> > and I think is OK. You're suggesting splitting up user@ and I
>>> sympathize
>>> > with the motivation. Experience tells me that we'll have a beginner@
>>> that's
>>> > then totally ignored, and people will quickly learn to post to
>>> advanced@ to
>>> > get attention, and we'll be back where we started. Putting it in JIRA
>>> > doesn't help. I don't think this is a problem that is merely down to lack
>>> of
>>> > process. It actually requires cultivating a culture change on the
>>> community
>>> > list.
>>> >
>>> > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <Assaf.Mendelson@rsa.com>
>>> > wrote:
>>> >>
>>> >> What I am suggesting is basically to fix that.
>>> >>
>>> >> For example, we might say that mailing list A is only for voting,
>>> mailing
>>> >> list B is only for PR and have something like stack overflow for
>>> developer
>>> >> questions (I would even go as far as to have beginner, intermediate
>>> and
>>> >> advanced mailing list for users and beginner/advanced for dev).
>>> >>
>>> >>
>>> >>
>>> >> This can easily be done using stack overflow tags, however, that would
>>> >> probably be harder to manage.
>>> >>
>>> >> Maybe using special jira tags and manage it in jira?
>>> >>
>>> >>
>>> >>
>>> >> Anyway as I said, the main issue is not user questions (except maybe
>>> >> advanced ones) but more for dev questions. It is so easy to get lost
>>> in the
>>> >> chatter that it makes it very hard for people to learn spark
>>> internals…
>>> >>
>>> >> Assaf.
>>> >>
>>> >>
>>> >>
>>> >> From: Sean Owen [mailto:sowen@cloudera.com]
>>> >> Sent: Wednesday, November 02, 2016 2:07 PM
>>> >> To: Mendelson, Assaf; dev@spark.apache.org
>>> >> Subject: Re: Handling questions in the mailing lists
>>> >>
>>> >>
>>> >>
>>> >> I think that unfortunately mailing lists don't scale well. This one
>>> has
>>> >> thousands of subscribers with different interests and levels of
>>> experience.
>>> >> For any given person, most messages will be irrelevant. I also find
>>> that a
>>> >> lot of questions on user@ are not well-asked, aren't an SSCCE
>>> >> (http://sscce.org/), not something most people are going to bother
>>> replying
>>> >> to even if they could answer. I almost entirely ignore user@ because
>>> there
>>> >> are higher-priority channels like PRs to deal with, that already have
>>> >> hundreds of messages per day. This is why little of it gets an answer
>>> -- too
>>> >> noisy.
>>> >>
>>> >>
>>> >>
>>> >> We have to have official mailing lists, in any event, to have some
>>> >> official channel for things like votes and announcements. It's not
>>> wrong to
>>> >> ask questions on user@ of course, but a lot of the questions I see
>>> could
>>> >> have been answered with research of existing docs or looking at the
>>> code. I
>>> >> think that given the scale of the list, it's not wrong to assert that
>>> this
>>> >> is sort of a prerequisite for asking thousands of people to answer
>>> one's
>>> >> question. But we can't enforce that.
>>> >>
>>> >>
>>> >>
>>> >> The situation will get better to the extent people ask better
>>> questions,
>>> >> help other people ask better questions, and answer good questions. I'd
>>> >> encourage anyone feeling this way to try to help along those
>>> dimensions.
>>> >>
>>> >> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <assaf.mendelson@rsa.com>
>>> >> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I know this is a little off topic but I wanted to raise an issue about
>>> >> handling questions in the mailing list (this is true both for the user
>>> >> mailing list and the dev but since there are other options such as
>>> stack
>>> >> overflow for user questions, this is more problematic in dev).
>>> >>
>>> >> Let’s say I ask a question (as I recently did). Unfortunately this was
>>> >> during spark summit in Europe so probably people were busy. In any
>>> case no
>>> >> one answered.
>>> >>
>>> >> The problem is, that if no one answers very soon, the question will
>>> almost
>>> >> certainly remain unanswered because new messages will simply drown it.
>>> >>
>>> >>
>>> >>
>>> >> This is a common issue not just for questions but for any comment or
>>> idea
>>> >> which is not immediately picked up.
>>> >>
>>> >>
>>> >>
>>> >> I believe we should have a method of handling this.
>>> >>
>>> >> Generally, I would say these types of things belong in stack overflow,
>>> >> after all, the way it is built is perfect for this. More seasoned
>>> spark
>>> >> contributors and committers can periodically check out unanswered
>>> questions
>>> >> and answer them.
>>> >>
>>> >> The problem is that stack overflow (as well as other targets such as
>>> the
>>> >> databricks forums) tend to have a more user based orientation. This
>>> means
>>> >> that any spark internal question will almost certainly remain
>>> unanswered.
>>> >>
>>> >>
>>> >>
>>> >> I was wondering if we could come up with a solution for this.
>>> >>
>>> >>
>>> >>
>>> >> Assaf.
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> ________________________________
>>> >>
>>> >> View this message in context: Handling questions in the mailing lists
>>> >> Sent from the Apache Spark Developers List mailing list archive at
>>> >> Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>
>>>
>>

Re: Handling questions in the mailing lists

Posted by Nicholas Chammas <ni...@gmail.com>.
We’ve discussed several times upgrading our communication tools, as far
back as 2014 and maybe even before that too. The bottom line is that we
can’t due to ASF rules requiring the use of ASF-managed mailing lists.

For some history, see this discussion:

   - https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E
   - https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E

(It’s ironic that it’s difficult to follow the past discussion on why we
can’t change our official communication tools due to those very tools…)

Nick

On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <
ricardo.almeida@actnowib.com> wrote:

> I feel Assaf's point is quite relevant if we want to move this project
> forward from the Spark user perspective (as I do). In fact, we're still
> using 20th century tools (mailing lists) with some add-ons (like Stack
> Overflow).
>
> As usual, Sean and Cody's contributions are very much to the point.
> I feel it is indeed a matter of culture (hard to enforce) and tools
> (much easier). Isn't it?
>
> On 2 November 2016 at 16:36, Cody Koeninger <co...@koeninger.org> wrote:
>
> So concrete things people could do
>
> - users could tag subject lines appropriately to the component they're
> asking about
>
> - contributors could monitor user@ for tags relating to components
> they've worked on.
> I'd be surprised if my miss rate for any mailing list questions
> well-labeled as Kafka was higher than 5%
>
> - committers could be more aggressive about soliciting and merging PRs
> to improve documentation.
> It's a lot easier to answer even poorly-asked questions with a link to
> relevant docs.
>
> On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <so...@cloudera.com> wrote:
> > There's already reviews@ and issues@. dev@ is for project development
> itself
> > and I think is OK. You're suggesting splitting up user@ and I sympathize
> > with the motivation. Experience tells me that we'll have a beginner@
> that's
> > then totally ignored, and people will quickly learn to post to advanced@
> to
> > get attention, and we'll be back where we started. Putting it in JIRA
> > doesn't help. I don't think this is a problem that is merely down to lack of
> > process. It actually requires cultivating a culture change on the
> community
> > list.
> >
> > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <
> Assaf.Mendelson@rsa.com>
> > wrote:
> >>
> >> What I am suggesting is basically to fix that.
> >>
> >> For example, we might say that mailing list A is only for voting,
> mailing
> >> list B is only for PR and have something like stack overflow for
> developer
> >> questions (I would even go as far as to have beginner, intermediate and
> >> advanced mailing list for users and beginner/advanced for dev).
> >>
> >>
> >>
> >> This can easily be done using stack overflow tags, however, that would
> >> probably be harder to manage.
> >>
> >> Maybe using special jira tags and manage it in jira?
> >>
> >>
> >>
> >> Anyway as I said, the main issue is not user questions (except maybe
> >> advanced ones) but more for dev questions. It is so easy to get lost in
> the
> >> chatter that it makes it very hard for people to learn spark internals…
> >>
> >> Assaf.
> >>
> >>
> >>
> >> From: Sean Owen [mailto:sowen@cloudera.com]
> >> Sent: Wednesday, November 02, 2016 2:07 PM
> >> To: Mendelson, Assaf; dev@spark.apache.org
> >> Subject: Re: Handling questions in the mailing lists
> >>
> >>
> >>
> >> I think that unfortunately mailing lists don't scale well. This one has
> >> thousands of subscribers with different interests and levels of
> experience.
> >> For any given person, most messages will be irrelevant. I also find
> that a
> >> lot of questions on user@ are not well-asked, aren't an SSCCE
> >> (http://sscce.org/), not something most people are going to bother
> replying
> >> to even if they could answer. I almost entirely ignore user@ because
> there
> >> are higher-priority channels like PRs to deal with, that already have
> >> hundreds of messages per day. This is why little of it gets an answer
> -- too
> >> noisy.
> >>
> >>
> >>
> >> We have to have official mailing lists, in any event, to have some
> >> official channel for things like votes and announcements. It's not
> wrong to
> >> ask questions on user@ of course, but a lot of the questions I see
> could
> >> have been answered with research of existing docs or looking at the
> code. I
> >> think that given the scale of the list, it's not wrong to assert that
> this
> >> is sort of a prerequisite for asking thousands of people to answer one's
> >> question. But we can't enforce that.
> >>
> >>
> >>
> >> The situation will get better to the extent people ask better questions,
> >> help other people ask better questions, and answer good questions. I'd
> >> encourage anyone feeling this way to try to help along those dimensions.
> >>
> >> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <
> assaf.mendelson@rsa.com>
> >> wrote:
> >>
> >> Hi,
> >>
> >> I know this is a little off topic but I wanted to raise an issue about
> >> handling questions in the mailing list (this is true both for the user
> >> mailing list and the dev but since there are other options such as stack
> >> overflow for user questions, this is more problematic in dev).
> >>
> >> Let’s say I ask a question (as I recently did). Unfortunately this was
> >> during spark summit in Europe so probably people were busy. In any case
> no
> >> one answered.
> >>
> >> The problem is, that if no one answers very soon, the question will
> almost
> >> certainly remain unanswered because new messages will simply drown it.
> >>
> >>
> >>
> >> This is a common issue not just for questions but for any comment or
> idea
> >> which is not immediately picked up.
> >>
> >>
> >>
> >> I believe we should have a method of handling this.
> >>
> >> Generally, I would say these types of things belong in stack overflow,
> >> after all, the way it is built is perfect for this. More seasoned spark
> >> contributors and committers can periodically check out unanswered
> questions
> >> and answer them.
> >>
> >> The problem is that stack overflow (as well as other targets such as the
> >> databricks forums) tend to have a more user based orientation. This
> means
> >> that any spark internal question will almost certainly remain
> unanswered.
> >>
> >>
> >>
> >> I was wondering if we could come up with a solution for this.
> >>
> >>
> >>
> >> Assaf.
> >>
> >>
> >>
> >>
> >>
> >> ________________________________
> >>
> >> View this message in context: Handling questions in the mailing lists
> >> Sent from the Apache Spark Developers List mailing list archive at
> >> Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>
>

Re: Handling questions in the mailing lists

Posted by Ricardo Almeida <ri...@actnowib.com>.
I feel Assaf's point is quite relevant if we want to move this project
forward from the Spark user perspective (as I do). In fact, we're still
using 20th century tools (mailing lists) with some add-ons (like Stack
Overflow).

As usual, Sean and Cody's contributions are very much to the point.
I feel it is indeed a matter of culture (hard to enforce) and tools
(much easier). Isn't it?

On 2 November 2016 at 16:36, Cody Koeninger <co...@koeninger.org> wrote:

> So concrete things people could do
>
> - users could tag subject lines appropriately to the component they're
> asking about
>
> - contributors could monitor user@ for tags relating to components
> they've worked on.
> I'd be surprised if my miss rate for any mailing list questions
> well-labeled as Kafka was higher than 5%
>
> - committers could be more aggressive about soliciting and merging PRs
> to improve documentation.
> It's a lot easier to answer even poorly-asked questions with a link to
> relevant docs.
>
> On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <so...@cloudera.com> wrote:
> > There's already reviews@ and issues@. dev@ is for project development
> itself
> > and I think is OK. You're suggesting splitting up user@ and I sympathize
> > with the motivation. Experience tells me that we'll have a beginner@
> that's
> > then totally ignored, and people will quickly learn to post to advanced@
> to
> > get attention, and we'll be back where we started. Putting it in JIRA
> > doesn't help. I don't think this is a problem that is merely down to lack of
> > process. It actually requires cultivating a culture change on the
> community
> > list.
> >
> > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <
> Assaf.Mendelson@rsa.com>
> > wrote:
> >>
> >> What I am suggesting is basically to fix that.
> >>
> >> For example, we might say that mailing list A is only for voting,
> mailing
> >> list B is only for PR and have something like stack overflow for
> developer
> >> questions (I would even go as far as to have beginner, intermediate and
> >> advanced mailing list for users and beginner/advanced for dev).
> >>
> >>
> >>
> >> This can easily be done using stack overflow tags, however, that would
> >> probably be harder to manage.
> >>
> >> Maybe using special jira tags and manage it in jira?
> >>
> >>
> >>
> >> Anyway as I said, the main issue is not user questions (except maybe
> >> advanced ones) but more for dev questions. It is so easy to get lost in
> the
> >> chatter that it makes it very hard for people to learn spark internals…
> >>
> >> Assaf.
> >>
> >>
> >>
> >> From: Sean Owen [mailto:sowen@cloudera.com]
> >> Sent: Wednesday, November 02, 2016 2:07 PM
> >> To: Mendelson, Assaf; dev@spark.apache.org
> >> Subject: Re: Handling questions in the mailing lists
> >>
> >>
> >>
> >> I think that unfortunately mailing lists don't scale well. This one has
> >> thousands of subscribers with different interests and levels of
> experience.
> >> For any given person, most messages will be irrelevant. I also find
> that a
> >> lot of questions on user@ are not well-asked, aren't an SSCCE
> >> (http://sscce.org/), not something most people are going to bother
> replying
> >> to even if they could answer. I almost entirely ignore user@ because
> there
> >> are higher-priority channels like PRs to deal with, that already have
> >> hundreds of messages per day. This is why little of it gets an answer
> -- too
> >> noisy.
> >>
> >>
> >>
> >> We have to have official mailing lists, in any event, to have some
> >> official channel for things like votes and announcements. It's not
> wrong to
> >> ask questions on user@ of course, but a lot of the questions I see
> could
> >> have been answered with research of existing docs or looking at the
> code. I
> >> think that given the scale of the list, it's not wrong to assert that
> this
> >> is sort of a prerequisite for asking thousands of people to answer one's
> >> question. But we can't enforce that.
> >>
> >>
> >>
> >> The situation will get better to the extent people ask better questions,
> >> help other people ask better questions, and answer good questions. I'd
> >> encourage anyone feeling this way to try to help along those dimensions.
> >>
> >> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <
> assaf.mendelson@rsa.com>
> >> wrote:
> >>
> >> Hi,
> >>
> >> I know this is a little off topic but I wanted to raise an issue about
> >> handling questions in the mailing list (this is true both for the user
> >> mailing list and the dev but since there are other options such as stack
> >> overflow for user questions, this is more problematic in dev).
> >>
> >> Let’s say I ask a question (as I recently did). Unfortunately this was
> >> during spark summit in Europe so probably people were busy. In any case
> no
> >> one answered.
> >>
> >> The problem is, that if no one answers very soon, the question will
> almost
> >> certainly remain unanswered because new messages will simply drown it.
> >>
> >>
> >>
> >> This is a common issue not just for questions but for any comment or
> idea
> >> which is not immediately picked up.
> >>
> >>
> >>
> >> I believe we should have a method of handling this.
> >>
> >> Generally, I would say these types of things belong in stack overflow,
> >> after all, the way it is built is perfect for this. More seasoned spark
> >> contributors and committers can periodically check out unanswered
> questions
> >> and answer them.
> >>
> >> The problem is that stack overflow (as well as other targets such as the
> >> databricks forums) tend to have a more user based orientation. This
> means
> >> that any spark internal question will almost certainly remain
> unanswered.
> >>
> >>
> >>
> >> I was wondering if we could come up with a solution for this.
> >>
> >>
> >>
> >> Assaf.
> >>
> >>
> >>
> >>
> >>
> >> ________________________________
> >>
> >> View this message in context: Handling questions in the mailing lists
> >> Sent from the Apache Spark Developers List mailing list archive at
> >> Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>

Re: Handling questions in the mailing lists

Posted by Cody Koeninger <co...@koeninger.org>.
So, concrete things people could do:

- users could tag subject lines appropriately to the component they're
asking about

- contributors could monitor user@ for tags relating to components
they've worked on.
I'd be surprised if my miss rate for any mailing list questions
well-labeled as Kafka was higher than 5%

- committers could be more aggressive about soliciting and merging PRs
to improve documentation.
It's a lot easier to answer even poorly-asked questions with a link to
relevant docs.

On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <so...@cloudera.com> wrote:
> There's already reviews@ and issues@. dev@ is for project development itself
> and I think is OK. You're suggesting splitting up user@ and I sympathize
> with the motivation. Experience tells me that we'll have a beginner@ that's
> then totally ignored, and people will quickly learn to post to advanced@ to
> get attention, and we'll be back where we started. Putting it in JIRA
> doesn't help. I don't think this is a problem that is merely down to lack of
> process. It actually requires cultivating a culture change on the community
> list.
>
> On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <As...@rsa.com>
> wrote:
>>
>> What I am suggesting is basically to fix that.
>>
>> For example, we might say that mailing list A is only for voting, mailing
>> list B is only for PR and have something like stack overflow for developer
>> questions (I would even go as far as to have beginner, intermediate and
>> advanced mailing list for users and beginner/advanced for dev).
>>
>>
>>
>> This can easily be done using stack overflow tags, however, that would
>> probably be harder to manage.
>>
>> Maybe using special jira tags and manage it in jira?
>>
>>
>>
>> Anyway as I said, the main issue is not user questions (except maybe
>> advanced ones) but more for dev questions. It is so easy to get lost in the
>> chatter that it makes it very hard for people to learn spark internals…
>>
>> Assaf.
>>
>>
>>
>> From: Sean Owen [mailto:sowen@cloudera.com]
>> Sent: Wednesday, November 02, 2016 2:07 PM
>> To: Mendelson, Assaf; dev@spark.apache.org
>> Subject: Re: Handling questions in the mailing lists
>>
>>
>>
>> I think that unfortunately mailing lists don't scale well. This one has
>> thousands of subscribers with different interests and levels of experience.
>> For any given person, most messages will be irrelevant. I also find that a
>> lot of questions on user@ are not well-asked, aren't an SSCCE
>> (http://sscce.org/), not something most people are going to bother replying
>> to even if they could answer. I almost entirely ignore user@ because there
>> are higher-priority channels like PRs to deal with, that already have
>> hundreds of messages per day. This is why little of it gets an answer -- too
>> noisy.
>>
>>
>>
>> We have to have official mailing lists, in any event, to have some
>> official channel for things like votes and announcements. It's not wrong to
>> ask questions on user@ of course, but a lot of the questions I see could
>> have been answered with research of existing docs or looking at the code. I
>> think that given the scale of the list, it's not wrong to assert that this
>> is sort of a prerequisite for asking thousands of people to answer one's
>> question. But we can't enforce that.
>>
>>
>>
>> The situation will get better to the extent people ask better questions,
>> help other people ask better questions, and answer good questions. I'd
>> encourage anyone feeling this way to try to help along those dimensions.
>>
>> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <as...@rsa.com>
>> wrote:
>>
>> Hi,
>>
>> I know this is a little off topic but I wanted to raise an issue about
>> handling questions in the mailing list (this is true both for the user
>> mailing list and the dev but since there are other options such as stack
>> overflow for user questions, this is more problematic in dev).
>>
>> Let’s say I ask a question (as I recently did). Unfortunately this was
>> during spark summit in Europe so probably people were busy. In any case no
>> one answered.
>>
>> The problem is, that if no one answers very soon, the question will almost
>> certainly remain unanswered because new messages will simply drown it.
>>
>>
>>
>> This is a common issue not just for questions but for any comment or idea
>> which is not immediately picked up.
>>
>>
>>
>> I believe we should have a method of handling this.
>>
>> Generally, I would say these types of things belong in stack overflow,
>> after all, the way it is built is perfect for this. More seasoned spark
>> contributors and committers can periodically check out unanswered questions
>> and answer them.
>>
>> The problem is that stack overflow (as well as other targets such as the
>> databricks forums) tend to have a more user based orientation. This means
>> that any spark internal question will almost certainly remain unanswered.
>>
>>
>>
>> I was wondering if we could come up with a solution for this.
>>
>>
>>
>> Assaf.

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Handling questions in the mailing lists

Posted by Sean Owen <so...@cloudera.com>.
There's already reviews@ and issues@. dev@ is for project development
itself and I think is OK. You're suggesting splitting up user@ and I
sympathize with the motivation. Experience tells me that we'll have a
beginner@ that's then totally ignored, and people will quickly learn to
post to advanced@ to get attention, and we'll be back where we started.
Putting it in JIRA doesn't help. I don't think this is a problem that is
merely down to lack of process. It actually requires cultivating a culture
change on the community list.

On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <As...@rsa.com>
wrote:

> What I am suggesting is basically to fix that.
>
> For example, we might say that mailing list A is only for voting, mailing
> list B is only for PR and have something like stack overflow for developer
> questions (I would even go as far as to have beginner, intermediate and
> advanced mailing list for users and beginner/advanced for dev).
>
>
>
> This can easily be done using stack overflow tags, however, that would
> probably be harder to manage.
>
> Maybe using special jira tags and manage it in jira?
>
>
>
> Anyway as I said, the main issue is not user questions (except maybe
> advanced ones) but more for dev questions. It is so easy to get lost in the
> chatter that it makes it very hard for people to learn spark internals…
>
> Assaf.
>
>
>
> *From:* Sean Owen [mailto:sowen@cloudera.com]
> *Sent:* Wednesday, November 02, 2016 2:07 PM
> *To:* Mendelson, Assaf; dev@spark.apache.org
> *Subject:* Re: Handling questions in the mailing lists
>
>
>
> I think that unfortunately mailing lists don't scale well. This one has
> thousands of subscribers with different interests and levels of experience.
> For any given person, most messages will be irrelevant. I also find that a
> lot of questions on user@ are not well-asked, aren't an SSCCE (
> http://sscce.org/), not something most people are going to bother
> replying to even if they could answer. I almost entirely ignore user@
> because there are higher-priority channels like PRs to deal with, that
> already have hundreds of messages per day. This is why little of it gets an
> answer -- too noisy.
>
>
>
> We have to have official mailing lists, in any event, to have some
> official channel for things like votes and announcements. It's not wrong to
> ask questions on user@ of course, but a lot of the questions I see could
> have been answered with research of existing docs or looking at the code. I
> think that given the scale of the list, it's not wrong to assert that this
> is sort of a prerequisite for asking thousands of people to answer one's
> question. But we can't enforce that.
>
>
>
> The situation will get better to the extent people ask better questions,
> help other people ask better questions, and answer good questions. I'd
> encourage anyone feeling this way to try to help along those dimensions.
>
> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <as...@rsa.com>
> wrote:
>
> Hi,
>
> I know this is a little off topic but I wanted to raise an issue about
> handling questions in the mailing list (this is true both for the user
> mailing list and the dev but since there are other options such as stack
> overflow for user questions, this is more problematic in dev).
>
> Let’s say I ask a question (as I recently did). Unfortunately this was
> during spark summit in Europe so probably people were busy. In any case no
> one answered.
>
> The problem is, that if no one answers very soon, the question will almost
> certainly remain unanswered because new messages will simply drown it.
>
>
>
> This is a common issue not just for questions but for any comment or idea
> which is not immediately picked up.
>
>
>
> I believe we should have a method of handling this.
>
> Generally, I would say these types of things belong in stack overflow,
> after all, the way it is built is perfect for this. More seasoned spark
> contributors and committers can periodically check out unanswered questions
> and answer them.
>
> The problem is that stack overflow (as well as other targets such as the
> databricks forums) tend to have a more user based orientation. This means
> that any spark internal question will almost certainly remain unanswered.
>
>
>
> I was wondering if we could come up with a solution for this.
>
>
>
> Assaf.

RE: Handling questions in the mailing lists

Posted by "Mendelson, Assaf" <As...@rsa.com>.
What I am suggesting is basically to fix that.
For example, we might say that mailing list A is only for voting and mailing list B is only for PRs, and have something like Stack Overflow for developer questions (I would even go as far as to have beginner, intermediate and advanced mailing lists for users and beginner/advanced for dev).

This could easily be done using Stack Overflow tags; however, that would probably be harder to manage.
Maybe use special JIRA tags and manage it in JIRA?

Anyway, as I said, the main issue is not user questions (except maybe advanced ones) but dev questions. It is so easy to get lost in the chatter that it is very hard for people to learn Spark internals…
Assaf.

From: Sean Owen [mailto:sowen@cloudera.com]
Sent: Wednesday, November 02, 2016 2:07 PM
To: Mendelson, Assaf; dev@spark.apache.org
Subject: Re: Handling questions in the mailing lists

I think that unfortunately mailing lists don't scale well. This one has thousands of subscribers with different interests and levels of experience. For any given person, most messages will be irrelevant. I also find that a lot of questions on user@ are not well-asked, aren't an SSCCE (http://sscce.org/), not something most people are going to bother replying to even if they could answer. I almost entirely ignore user@ because there are higher-priority channels like PRs to deal with, that already have hundreds of messages per day. This is why little of it gets an answer -- too noisy.

We have to have official mailing lists, in any event, to have some official channel for things like votes and announcements. It's not wrong to ask questions on user@ of course, but a lot of the questions I see could have been answered with research of existing docs or looking at the code. I think that given the scale of the list, it's not wrong to assert that this is sort of a prerequisite for asking thousands of people to answer one's question. But we can't enforce that.

The situation will get better to the extent people ask better questions, help other people ask better questions, and answer good questions. I'd encourage anyone feeling this way to try to help along those dimensions.

On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <as...@rsa.com> wrote:
Hi,
I know this is a little off topic but I wanted to raise an issue about handling questions in the mailing list (this is true both for the user mailing list and the dev but since there are other options such as stack overflow for user questions, this is more problematic in dev).
Let’s say I ask a question (as I recently did). Unfortunately this was during spark summit in Europe so probably people were busy. In any case no one answered.
The problem is, that if no one answers very soon, the question will almost certainly remain unanswered because new messages will simply drown it.

This is a common issue not just for questions but for any comment or idea which is not immediately picked up.

I believe we should have a method of handling this.
Generally, I would say these types of things belong in stack overflow, after all, the way it is built is perfect for this. More seasoned spark contributors and committers can periodically check out unanswered questions and answer them.
The problem is that stack overflow (as well as other targets such as the databricks forums) tend to have a more user based orientation. This means that any spark internal question will almost certainly remain unanswered.

I was wondering if we could come up with a solution for this.

Assaf.



Re: Handling questions in the mailing lists

Posted by Sean Owen <so...@cloudera.com>.
I think that unfortunately mailing lists don't scale well. This one has
thousands of subscribers with different interests and levels of experience.
For any given person, most messages will be irrelevant. I also find that a
lot of questions on user@ are not well-asked, aren't an SSCCE
(http://sscce.org/), and aren't something most people are going to bother
replying to even if they could answer. I almost entirely ignore user@ because
there are higher-priority channels like PRs to deal with, which already have
hundreds of messages per day. This is why little of it gets an answer --
too noisy.

We have to have official mailing lists, in any event, to have some official
channel for things like votes and announcements. It's not wrong to ask
questions on user@ of course, but a lot of the questions I see could have
been answered with research of existing docs or looking at the code. I
think that given the scale of the list, it's not wrong to assert that this
is sort of a prerequisite for asking thousands of people to answer one's
question. But we can't enforce that.

The situation will get better to the extent people ask better questions,
help other people ask better questions, and answer good questions. I'd
encourage anyone feeling this way to try to help along those dimensions.

On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <as...@rsa.com>
wrote:

> Hi,
>
> I know this is a little off topic but I wanted to raise an issue about
> handling questions in the mailing list (this is true both for the user
> mailing list and the dev but since there are other options such as stack
> overflow for user questions, this is more problematic in dev).
>
> Let’s say I ask a question (as I recently did). Unfortunately this was
> during spark summit in Europe so probably people were busy. In any case no
> one answered.
>
> The problem is, that if no one answers very soon, the question will almost
> certainly remain unanswered because new messages will simply drown it.
>
>
>
> This is a common issue not just for questions but for any comment or idea
> which is not immediately picked up.
>
>
>
> I believe we should have a method of handling this.
>
> Generally, I would say these types of things belong in stack overflow,
> after all, the way it is built is perfect for this. More seasoned spark
> contributors and committers can periodically check out unanswered questions
> and answer them.
>
> The problem is that stack overflow (as well as other targets such as the
> databricks forums) tend to have a more user based orientation. This means
> that any spark internal question will almost certainly remain unanswered.
>
>
>
> I was wondering if we could come up with a solution for this.
>
>
>
> Assaf.