Posted to commits@spark.apache.org by me...@apache.org on 2014/07/14 04:28:04 UTC

[12/12] git commit: SPARK-2363. Clean MLlib's sample data files

SPARK-2363. Clean MLlib's sample data files

(Just made a PR for this; mengxr reported the following:)

MLlib has sample data under several folders:
1) data/mllib
2) data/
3) mllib/data/*
Per previous discussion with Matei Zaharia, we want to put them under `data/mllib` and clean outdated files.

Author: Sean Owen <so...@cloudera.com>

Closes #1394 from srowen/SPARK-2363 and squashes the following commits:

54313dd [Sean Owen] Move ML example data from /mllib/data/ and /data/ into /data/mllib/


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/635888cb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/635888cb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/635888cb

Branch: refs/heads/master
Commit: 635888cbed0e3f4127252fb84db449f0cc9ed659
Parents: 4c8be64
Author: Sean Owen <so...@cloudera.com>
Authored: Sun Jul 13 19:27:43 2014 -0700
Committer: Xiangrui Meng <me...@databricks.com>
Committed: Sun Jul 13 19:27:43 2014 -0700

----------------------------------------------------------------------
 data/kmeans_data.txt                   |    6 -
 data/lr_data.txt                       | 1000 ---------------------------
 data/mllib/als/test.data               |   16 +
 data/mllib/kmeans_data.txt             |    6 +
 data/mllib/lr-data/random.data         | 1000 +++++++++++++++++++++++++++
 data/mllib/lr_data.txt                 | 1000 +++++++++++++++++++++++++++
 data/mllib/pagerank_data.txt           |    6 +
 data/mllib/ridge-data/lpsa.data        |   67 ++
 data/mllib/sample_libsvm_data.txt      |  100 +++
 data/mllib/sample_naive_bayes_data.txt |    6 +
 data/mllib/sample_svm_data.txt         |  322 +++++++++
 data/mllib/sample_tree_data.csv        |  569 +++++++++++++++
 data/pagerank_data.txt                 |    6 -
 docs/bagel-programming-guide.md        |    2 +-
 docs/mllib-basics.md                   |    6 +-
 docs/mllib-clustering.md               |    4 +-
 docs/mllib-collaborative-filtering.md  |    4 +-
 docs/mllib-decision-tree.md            |    4 +-
 docs/mllib-linear-methods.md           |    8 +-
 docs/mllib-naive-bayes.md              |    2 +-
 docs/mllib-optimization.md             |    2 +-
 mllib/data/als/test.data               |   16 -
 mllib/data/lr-data/random.data         | 1000 ---------------------------
 mllib/data/ridge-data/lpsa.data        |   67 --
 mllib/data/sample_libsvm_data.txt      |  100 ---
 mllib/data/sample_naive_bayes_data.txt |    6 -
 mllib/data/sample_svm_data.txt         |  322 ---------
 mllib/data/sample_tree_data.csv        |  569 ---------------
 28 files changed, 3108 insertions(+), 3108 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/635888cb/data/kmeans_data.txt
----------------------------------------------------------------------
diff --git a/data/kmeans_data.txt b/data/kmeans_data.txt
deleted file mode 100644
index 338664f..0000000
--- a/data/kmeans_data.txt
+++ /dev/null
@@ -1,6 +0,0 @@
-0.0 0.0 0.0
-0.1 0.1 0.1
-0.2 0.2 0.2
-9.0 9.0 9.0
-9.1 9.1 9.1
-9.2 9.2 9.2