
Posted to issues@spark.apache.org by "Saikat Kanjilal (JIRA)" <ji...@apache.org> on 2016/04/02 04:05:25 UTC

[jira] [Comment Edited] (SPARK-14302) Python examples code merge and clean up

    [ https://issues.apache.org/jira/browse/SPARK-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222649#comment-15222649 ] 

Saikat Kanjilal edited comment on SPARK-14302 at 4/2/16 2:04 AM:
-----------------------------------------------------------------

Next question: which of the directories should contain the merged code? For example, I am looking at bisecting_k_means. The code in the two directories is very similar, but one version trains the model before throwing test data at the model. My recommendation would be to merge this code into one directory (either ml or mllib). So, in general, when my patch merges code, which of the directories should I put the result in?


was (Author: kanjilal):
Next question: which of the directories should contain the merged code? For example, I am looking at bisecting_k_means. The code in the two directories is very similar, but one version trains the model before throwing test data. My recommendation would be to merge this code into one directory (either ml or mllib). So, in general, when my patch merges code, which of the directories should I put the result in?

> Python examples code merge and clean up
> ---------------------------------------
>
>                 Key: SPARK-14302
>                 URL: https://issues.apache.org/jira/browse/SPARK-14302
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Examples
>            Reporter: Xusen Yin
>            Priority: Minor
>              Labels: starter
>
> Duplicated code that I found in python/examples/mllib and python/examples/ml:
> * python/ml
> ** None
> * Unsure duplications, double check
> ** dataframe_example.py
> ** kmeans_example.py
> ** simple_params_example.py
> ** simple_text_classification_pipeline.py
> * python/mllib
> ** gaussian_mixture_model.py
> ** kmeans.py
> ** logistic_regression.py
> * Unsure duplications, double check
> ** correlations.py
> ** random_rdd_generation.py
> ** sampled_rdds.py
> ** word2vec.py
> When merging and cleaning up this code, be sure not to disturb the existing example on and off blocks.
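
One way to guard against disturbing the on and off blocks during a merge is to check each example file before and after the change. The sketch below is only an illustration and is not part of the issue: it assumes the blocks are delimited by comment lines containing `$example on$` and `$example off$`, and the function name `example_blocks` is hypothetical.

```python
import re

# Assumption: on/off blocks are marked by comment lines such as
# "# $example on$" and "# $example off$". Adjust the pattern if the
# markers in the files being merged look different.
MARKER = re.compile(r"\$example (on|off)\$")


def example_blocks(source):
    """Return (start_line, end_line) pairs for each on/off block, 1-indexed.

    Raises ValueError if a marker is unmatched or a block is nested.
    """
    blocks = []
    open_line = None  # line number of the currently open "on" marker
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = MARKER.search(line)
        if match is None:
            continue
        if match.group(1) == "on":
            if open_line is not None:
                raise ValueError("line %d: 'on' marker inside an open block" % lineno)
            open_line = lineno
        else:
            if open_line is None:
                raise ValueError("line %d: 'off' marker without a matching 'on'" % lineno)
            blocks.append((open_line, lineno))
            open_line = None
    if open_line is not None:
        raise ValueError("line %d: 'on' marker never closed" % open_line)
    return blocks
```

Running this on a file before and after a merge and comparing the number of blocks returned gives a quick signal that no marker was dropped or left unmatched. It does not check that the code inside each block is unchanged.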



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
