Posted to dev@spark.apache.org by Nick Pentreath <ni...@gmail.com> on 2016/04/14 14:28:34 UTC

Organizing Spark ML example packages

Hey Spark devs

I noticed that we now have a large number of examples for ML & MLlib in the
examples project - 57 for ML and 67 for MLlib to be precise. This is bound
to get larger as we add features (though I know there are some PRs to clean
up duplicated examples).

What do you think about organizing them into packages to match the use case
and the structure of the code base? e.g.

org.apache.spark.examples.ml.recommendation

org.apache.spark.examples.ml.feature

and so on...

Is it worth doing? The doc pages that use include_example would need updating,
and the package passed to the run-example script would need to change
slightly. Did I miss any potential issues?

N
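
The renaming proposed above can be sketched in a few lines of Python. This is
only an illustration: the two class names and their categories are assumed for
the example, the helper names are hypothetical, and the sketch assumes that the
run-example script prepends org.apache.spark.examples. to the class name it is
given.

```python
# Hypothetical mapping from an example's class name to its proposed sub-package.
# The entries are illustrative assumptions, not a list taken from the thread.
EXAMPLE_CATEGORIES = {
    "ALSExample": "recommendation",
    "TokenizerExample": "feature",
}

BASE_PACKAGE = "org.apache.spark.examples.ml"
EXAMPLES_PREFIX = "org.apache.spark.examples."


def relocated_class(simple_name: str) -> str:
    """Return the fully qualified name an example would have after the move."""
    category = EXAMPLE_CATEGORIES.get(simple_name)
    if category is None:
        # Examples without a mapping keep their current flat location.
        return f"{BASE_PACKAGE}.{simple_name}"
    return f"{BASE_PACKAGE}.{category}.{simple_name}"


def run_example_argument(simple_name: str) -> str:
    """Return the class argument to pass to run-example after the move.

    Assumes the script adds the org.apache.spark.examples. prefix itself,
    so only the remainder of the qualified name is passed.
    """
    return relocated_class(simple_name)[len(EXAMPLES_PREFIX):]


if __name__ == "__main__":
    for name in EXAMPLE_CATEGORIES:
        print(name, "->", relocated_class(name), "| run-example", run_example_argument(name))
```

Under these assumptions, an invocation that currently passes ml.ALSExample
would pass ml.recommendation.ALSExample instead, and the include_example
references in the docs would need the same extra path segment.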

Re: Organizing Spark ML example packages

Posted by Stephen Boesch <ja...@gmail.com>.

Yes: will you have cycles to do it?

2016-09-12 9:09 GMT-07:00 Nick Pentreath <ni...@gmail.com>:

> Never actually got around to doing this - do folks still think it
> worthwhile?
>
> On Thu, 21 Apr 2016 at 00:10 Joseph Bradley <jo...@databricks.com> wrote:
>
>> Sounds good to me.  I'd request we be strict during this process about
>> requiring *no* changes to the example itself, which will make review easier.
>>
>> On Tue, Apr 19, 2016 at 11:12 AM, Bryan Cutler <cu...@gmail.com> wrote:
>>
>>> +1, adding some organization would make it easier for people to find a
>>> specific example
>>>
>>> On Mon, Apr 18, 2016 at 11:52 PM, Yanbo Liang <yb...@gmail.com>
>>> wrote:
>>>
>>>> This sounds good to me, and it will make the ML examples neater.
>>>>
>>>> 2016-04-14 5:28 GMT-07:00 Nick Pentreath <ni...@gmail.com>:
>>>>
>>>>> Hey Spark devs
>>>>>
>>>>> I noticed that we now have a large number of examples for ML & MLlib
>>>>> in the examples project - 57 for ML and 67 for MLlib to be precise. This is
>>>>> bound to get larger as we add features (though I know there are some PRs to
>>>>> clean up duplicated examples).
>>>>>
>>>>> What do you think about organizing them into packages to match the use
>>>>> case and the structure of the code base? e.g.
>>>>>
>>>>> org.apache.spark.examples.ml.recommendation
>>>>>
>>>>> org.apache.spark.examples.ml.feature
>>>>>
>>>>> and so on...
>>>>>
>>>>> Is it worth doing? The doc pages that use include_example would need
>>>>> updating, and the package passed to the run-example script would need to
>>>>> change slightly. Did I miss any potential issues?
>>>>>
>>>>> N
>>>>>
>>>>
>>>>
>>>
>>

Re: Organizing Spark ML example packages

Posted by Nick Pentreath <ni...@gmail.com>.

Never actually got around to doing this - do folks still think it
worthwhile?

Re: Organizing Spark ML example packages

Posted by Joseph Bradley <jo...@databricks.com>.

Sounds good to me.  I'd request we be strict during this process about
requiring *no* changes to the example itself, which will make review easier.
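
A minimal sketch of how the "no changes" rule could be checked mechanically for
a moved Scala or Java example: compare the old and new source and accept the
move only if the package declaration is the sole line that differs. The helper
name differs_only_in_package is hypothetical and not part of any Spark tooling.

```python
import re

# Matches a Scala or Java package declaration, with or without a trailing semicolon.
PACKAGE_LINE = re.compile(r"^\s*package\s+[\w.]+;?\s*$")


def differs_only_in_package(old_source: str, new_source: str) -> bool:
    """Return True if the two sources are identical apart from package lines."""
    old_lines = old_source.splitlines()
    new_lines = new_source.splitlines()
    if len(old_lines) != len(new_lines):
        # Added or removed lines mean the example body was touched.
        return False
    for old, new in zip(old_lines, new_lines):
        if old == new:
            continue
        # A differing line is acceptable only if both sides are package declarations.
        if not (PACKAGE_LINE.match(old) and PACKAGE_LINE.match(new)):
            return False
    return True


if __name__ == "__main__":
    before = "package org.apache.spark.examples.ml\n\nobject ALSExample {}\n"
    after = "package org.apache.spark.examples.ml.recommendation\n\nobject ALSExample {}\n"
    print(differs_only_in_package(before, after))
```

A check like this would not cover the Python examples, which have no package
declaration, and it treats any reordering of lines as a change.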

Re: Organizing Spark ML example packages

Posted by Bryan Cutler <cu...@gmail.com>.

+1, adding some organization would make it easier for people to find a
specific example

Re: Organizing Spark ML example packages

Posted by Yanbo Liang <yb...@gmail.com>.

This sounds good to me, and it will make the ML examples neater.
