Posted to commits@systemds.apache.org by ja...@apache.org on 2021/06/09 05:06:16 UTC
[systemds-website] branch master updated: [SYSTEMDS-2995] Provide
datasets via our Apache SystemDS homepage
This is an automated email from the ASF dual-hosted git repository.
janardhan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/systemds-website.git
The following commit(s) were added to refs/heads/master by this push:
new 839d229 [SYSTEMDS-2995] Provide datasets via our Apache SystemDS homepage
839d229 is described below
commit 839d229efa08ff703db5d3b6fd0a05ced9e93e8c
Author: Janardhan Pulivarthi <j1...@protonmail.com>
AuthorDate: Wed Jun 9 10:35:53 2021 +0530
[SYSTEMDS-2995] Provide datasets via our Apache SystemDS homepage
- Add datasets with Jekyll collections framework
- Add a layout for dataset pages, so that each dataset gets its own
generated page that is easy to edit in Markdown
- Upload the datasets fetched from http://yann.lecun.com/exdb/mnist/
with the `wget` command. They were not downloaded manually via a browser,
to avoid any on-the-fly processing.
- Add a dataset overview page
Closes #86
---
_config.yml | 8 ++++++
_src/_datasets/mnist.md | 29 +++++++++++++++++++++
_src/_layouts/datasets.html | 19 ++++++++++++++
.../datasets/mnist/t10k-images-idx3-ubyte.gz | Bin 0 -> 1648877 bytes
.../datasets/mnist/t10k-labels-idx1-ubyte.gz | Bin 0 -> 4542 bytes
.../datasets/mnist/train-images-idx3-ubyte.gz | Bin 0 -> 9912422 bytes
.../datasets/mnist/train-labels-idx1-ubyte.gz | Bin 0 -> 28881 bytes
_src/assets/img/datasets.png | Bin 0 -> 935840 bytes
_src/datasets.html | 17 ++++++++++++
9 files changed, 73 insertions(+)
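
The byte counts in the diffstat above can be cross-checked against the `Download Size` stated in `mnist.md`. The following is a minimal sketch, not part of the commit; the names `MNIST_FILES`, `total_bytes`, and `format_mib` are hypothetical and chosen only for illustration.

```python
# Sizes in bytes, copied from the diffstat of this commit.
MNIST_FILES = {
    "train-images-idx3-ubyte.gz": 9912422,
    "train-labels-idx1-ubyte.gz": 28881,
    "t10k-images-idx3-ubyte.gz": 1648877,
    "t10k-labels-idx1-ubyte.gz": 4542,
}


def total_bytes(sizes):
    """Sum the per-file sizes (hypothetical helper)."""
    return sum(sizes.values())


def format_mib(num_bytes):
    """Format a byte count in MiB (1 MiB = 2**20 bytes), two decimals."""
    return f"{num_bytes / 2**20:.2f} MiB"


if __name__ == "__main__":
    # Prints the combined size of the four archives.
    print(format_mib(total_bytes(MNIST_FILES)))  # prints "11.06 MiB"
```

The four archives add up to 11594722 bytes, which matches the `11.06 MiB` figure on the dataset page.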
diff --git a/_config.yml b/_config.yml
index 8448d03..19b9d19 100644
--- a/_config.yml
+++ b/_config.yml
@@ -26,3 +26,11 @@ source: _src
exclude:
- _sass
- _scripts
+
+# https://jekyllrb.com/docs/collections/
+collections:
+ datasets:
+ output: true
+
+
+
diff --git a/_src/_datasets/mnist.md b/_src/_datasets/mnist.md
new file mode 100644
index 0000000..2f04c16
--- /dev/null
+++ b/_src/_datasets/mnist.md
@@ -0,0 +1,29 @@
+---
+layout: datasets
+title: MNIST Dataset
+description: The MNIST database of handwritten digits.
+link: mnist
+---
+
+The MNIST database of handwritten digits, available from this page,
+has a training set of 60,000 examples, and a test set of 10,000 examples.
+It is a subset of a larger set available from NIST. The digits have been
+size-normalized and centered in a fixed-size image.
+
+Home Page: [http://yann.lecun.com/exdb/mnist/](http://yann.lecun.com/exdb/mnist/)
+
+Download Size: `11.06 MiB`
+
+### Files
+
+1. [`train-images-idx3-ubyte.gz`: training set images (9912422 bytes)](https://systemds.apache.org/assets/datasets/mnist/train-images-idx3-ubyte.gz)
+2. [`train-labels-idx1-ubyte.gz`: training set labels (28881 bytes)](https://systemds.apache.org/assets/datasets/mnist/train-labels-idx1-ubyte.gz)
+3. [`t10k-images-idx3-ubyte.gz`: test set images (1648877 bytes)](https://systemds.apache.org/assets/datasets/mnist/t10k-images-idx3-ubyte.gz)
+4. [`t10k-labels-idx1-ubyte.gz`: test set labels (4542 bytes)](https://systemds.apache.org/assets/datasets/mnist/t10k-labels-idx1-ubyte.gz)
+
+### Split:
+
+| Split | Examples |
+| --- | --- |
+| `train` | 60,000 |
+| `test` | 10,000 |
diff --git a/_src/_layouts/datasets.html b/_src/_layouts/datasets.html
new file mode 100644
index 0000000..a0b41ed
--- /dev/null
+++ b/_src/_layouts/datasets.html
@@ -0,0 +1,19 @@
+---
+layout: default
+---
+<!-- https://jekyllrb.com/docs/step-by-step/09-collections/#configuration -->
+<!-- <h1>{{ page.title }}</h1> -->
+<section class="full-stripe full-stripe--alternate">
+ <div class="ml-container ml-container--vertically-centered ">
+ <div class="col col-6 content-group content-group--more-padding">
+ <img src="/assets/img/datasets.png" alt="Datasets for SystemDS">
+ </div>
+ <div class="col col-6 content-group content-group--more-padding button-group">
+ <h2>{{ page.title }}</h2>
+ <h4>{{ page.description }}</h4>
+
+ {{content}}
+
+ </div>
+ </div>
+</section>
diff --git a/_src/assets/datasets/mnist/t10k-images-idx3-ubyte.gz b/_src/assets/datasets/mnist/t10k-images-idx3-ubyte.gz
new file mode 100644
index 0000000..5ace8ea
Binary files /dev/null and b/_src/assets/datasets/mnist/t10k-images-idx3-ubyte.gz differ
diff --git a/_src/assets/datasets/mnist/t10k-labels-idx1-ubyte.gz b/_src/assets/datasets/mnist/t10k-labels-idx1-ubyte.gz
new file mode 100644
index 0000000..a7e1415
Binary files /dev/null and b/_src/assets/datasets/mnist/t10k-labels-idx1-ubyte.gz differ
diff --git a/_src/assets/datasets/mnist/train-images-idx3-ubyte.gz b/_src/assets/datasets/mnist/train-images-idx3-ubyte.gz
new file mode 100644
index 0000000..b50e4b6
Binary files /dev/null and b/_src/assets/datasets/mnist/train-images-idx3-ubyte.gz differ
diff --git a/_src/assets/datasets/mnist/train-labels-idx1-ubyte.gz b/_src/assets/datasets/mnist/train-labels-idx1-ubyte.gz
new file mode 100644
index 0000000..707a576
Binary files /dev/null and b/_src/assets/datasets/mnist/train-labels-idx1-ubyte.gz differ
diff --git a/_src/assets/img/datasets.png b/_src/assets/img/datasets.png
new file mode 100644
index 0000000..4c5969b
Binary files /dev/null and b/_src/assets/img/datasets.png differ
diff --git a/_src/datasets.html b/_src/datasets.html
new file mode 100644
index 0000000..131d7b3
--- /dev/null
+++ b/_src/datasets.html
@@ -0,0 +1,17 @@
+---
+layout: datasets
+title: Dataset catalog
+description: The datasets used for working with SystemDS
+---
+
+<ul>
+ {% for dataset in site.datasets %}
+ <li>
+ <section id="datasets-list">
+ <h4><a href="https://systemds.apache.org/datasets/{{ dataset.link }}">{{ dataset.title }}</a></h4>
+ <p>{{ dataset.description }}</p>
+ </section>
+
+ </li>
+ {% endfor %}
+</ul>
\ No newline at end of file
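
The commit message notes that the archives were fetched with `wget` rather than a browser, to avoid any on-the-fly processing. One way to spot a file that was altered in transit (for example, transparently decompressed) is to compare its size with the byte count listed in `mnist.md` and to check for the gzip magic bytes. This is a rough sketch under those assumptions; the function name `looks_like_untouched_gzip` is hypothetical and not part of the repository.

```python
import os

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_untouched_gzip(path, expected_size):
    """Return True if the file at `path` has `expected_size` bytes
    and begins with the gzip magic bytes (hypothetical helper)."""
    if os.path.getsize(path) != expected_size:
        return False
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC
```

For instance, `looks_like_untouched_gzip("t10k-labels-idx1-ubyte.gz", 4542)` would be expected to return `True` for a local copy that matches the uploaded file. The check is a quick plausibility test only; it does not replace a checksum comparison.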