Posted to commits@systemds.apache.org by ja...@apache.org on 2021/06/09 05:06:16 UTC

[systemds-website] branch master updated: [SYSTEMDS-2995] Provide datasets via our Apache SystemDS homepage

This is an automated email from the ASF dual-hosted git repository.

janardhan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/systemds-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 839d229  [SYSTEMDS-2995] Provide datasets via our Apache SystemDS homepage
839d229 is described below

commit 839d229efa08ff703db5d3b6fd0a05ced9e93e8c
Author: Janardhan Pulivarthi <j1...@protonmail.com>
AuthorDate: Wed Jun 9 10:35:53 2021 +0530

    [SYSTEMDS-2995] Provide datasets via our Apache SystemDS homepage
    
    - Add datasets with the Jekyll collections framework
    - Add a layout for dataset pages so that one page is generated per
      dataset and each can be edited easily in markdown
    - Upload the datasets from http://yann.lecun.com/exdb/mnist/,
      fetched with the `wget` command. They were not downloaded manually
      via a browser, to avoid any on-the-fly processing.
    
    - Add a dataset overview page
    
    Closes #86
---
 _config.yml                                        |   8 ++++++
 _src/_datasets/mnist.md                            |  29 +++++++++++++++++++++
 _src/_layouts/datasets.html                        |  19 ++++++++++++++
 .../datasets/mnist/t10k-images-idx3-ubyte.gz       | Bin 0 -> 1648877 bytes
 .../datasets/mnist/t10k-labels-idx1-ubyte.gz       | Bin 0 -> 4542 bytes
 .../datasets/mnist/train-images-idx3-ubyte.gz      | Bin 0 -> 9912422 bytes
 .../datasets/mnist/train-labels-idx1-ubyte.gz      | Bin 0 -> 28881 bytes
 _src/assets/img/datasets.png                       | Bin 0 -> 935840 bytes
 _src/datasets.html                                 |  17 ++++++++++++
 9 files changed, 73 insertions(+)
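
As a quick sanity check, the four archive sizes in the summary above can be added up and compared with the `11.06 MiB` download size stated in `mnist.md`. The following is a minimal Python sketch that uses only the byte counts listed above; the names `SIZES` and `total_mib` are illustrative and not part of the commit.

```python
# Byte counts copied from the diffstat above (illustrative check only).
SIZES = {
    "train-images-idx3-ubyte.gz": 9912422,
    "train-labels-idx1-ubyte.gz": 28881,
    "t10k-images-idx3-ubyte.gz": 1648877,
    "t10k-labels-idx1-ubyte.gz": 4542,
}


def total_mib(sizes):
    """Return the combined size in MiB (1 MiB = 1024 * 1024 bytes)."""
    return sum(sizes.values()) / (1024 * 1024)


if __name__ == "__main__":
    # Print the total rounded to two decimals, as on the dataset page.
    print(f"{total_mib(SIZES):.2f} MiB")
```

The four files total 11,594,722 bytes, which rounds to 11.06 MiB and matches the value on the dataset page.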

diff --git a/_config.yml b/_config.yml
index 8448d03..19b9d19 100644
--- a/_config.yml
+++ b/_config.yml
@@ -26,3 +26,11 @@ source: _src
 exclude:
   - _sass
   - _scripts
+
+# https://jekyllrb.com/docs/collections/
+collections:
+  datasets:
+    output: true
+
+
+
diff --git a/_src/_datasets/mnist.md b/_src/_datasets/mnist.md
new file mode 100644
index 0000000..2f04c16
--- /dev/null
+++ b/_src/_datasets/mnist.md
@@ -0,0 +1,29 @@
+---
+layout: datasets
+title: MNIST Dataset
+description: The MNIST database of handwritten digits.
+link: mnist
+---
+
+The MNIST database of handwritten digits, available from this page,
+has a training set of 60,000 examples, and a test set of 10,000 examples.
+It is a subset of a larger set available from NIST. The digits have been
+size-normalized and centered in a fixed-size image.
+
+Home Page: [http://yann.lecun.com/exdb/mnist/](http://yann.lecun.com/exdb/mnist/)
+
+Download Size: `11.06 MiB`
+
+### Files
+
+1. [`train-images-idx3-ubyte.gz`:  training set images (9912422 bytes)](https://systemds.apache.org/assets/datasets/mnist/train-images-idx3-ubyte.gz)
+2. [`train-labels-idx1-ubyte.gz`:  training set labels (28881 bytes)](https://systemds.apache.org/assets/datasets/mnist/train-labels-idx1-ubyte.gz)
+3. [`t10k-images-idx3-ubyte.gz`:   test set images (1648877 bytes)](https://systemds.apache.org/assets/datasets/mnist/t10k-images-idx3-ubyte.gz)
+4. [`t10k-labels-idx1-ubyte.gz`:   test set labels (4542 bytes)](https://systemds.apache.org/assets/datasets/mnist/t10k-labels-idx1-ubyte.gz)
+
+### Split:
+
+| Split | Examples |
+| --- | --- |
+| `train` | 60,000 |
+| `test` | 10,000 |
diff --git a/_src/_layouts/datasets.html b/_src/_layouts/datasets.html
new file mode 100644
index 0000000..a0b41ed
--- /dev/null
+++ b/_src/_layouts/datasets.html
@@ -0,0 +1,19 @@
+---
+layout: default
+---
+<!-- https://jekyllrb.com/docs/step-by-step/09-collections/#configuration -->
+<!-- <h1>{{ page.title }}</h1> -->
+<section class="full-stripe full-stripe--alternate">
+  <div class="ml-container ml-container--vertically-centered ">
+    <div class="col col-6 content-group content-group--more-padding">
+      <img src="/assets/img/datasets.png" alt="Datasets for SystemDS">
+    </div>
+    <div class="col col-6 content-group content-group--more-padding button-group">
+      <h2>{{ page.title }}</h2>
+      <h4>{{ page.description }}</h4>
+
+      {{content}}
+
+    </div>
+  </div>
+</section>
diff --git a/_src/assets/datasets/mnist/t10k-images-idx3-ubyte.gz b/_src/assets/datasets/mnist/t10k-images-idx3-ubyte.gz
new file mode 100644
index 0000000..5ace8ea
Binary files /dev/null and b/_src/assets/datasets/mnist/t10k-images-idx3-ubyte.gz differ
diff --git a/_src/assets/datasets/mnist/t10k-labels-idx1-ubyte.gz b/_src/assets/datasets/mnist/t10k-labels-idx1-ubyte.gz
new file mode 100644
index 0000000..a7e1415
Binary files /dev/null and b/_src/assets/datasets/mnist/t10k-labels-idx1-ubyte.gz differ
diff --git a/_src/assets/datasets/mnist/train-images-idx3-ubyte.gz b/_src/assets/datasets/mnist/train-images-idx3-ubyte.gz
new file mode 100644
index 0000000..b50e4b6
Binary files /dev/null and b/_src/assets/datasets/mnist/train-images-idx3-ubyte.gz differ
diff --git a/_src/assets/datasets/mnist/train-labels-idx1-ubyte.gz b/_src/assets/datasets/mnist/train-labels-idx1-ubyte.gz
new file mode 100644
index 0000000..707a576
Binary files /dev/null and b/_src/assets/datasets/mnist/train-labels-idx1-ubyte.gz differ
diff --git a/_src/assets/img/datasets.png b/_src/assets/img/datasets.png
new file mode 100644
index 0000000..4c5969b
Binary files /dev/null and b/_src/assets/img/datasets.png differ
diff --git a/_src/datasets.html b/_src/datasets.html
new file mode 100644
index 0000000..131d7b3
--- /dev/null
+++ b/_src/datasets.html
@@ -0,0 +1,17 @@
+---
+layout: datasets
+title: Dataset catalog
+description: The datasets used for working with SystemDS
+---
+
+<ul>
+  {% for dataset in site.datasets %}
+    <li>
+      <section id="datasets-list">
+        <h4><a href="https://systemds.apache.org/datasets/{{ dataset.link }}">{{ dataset.title }}</a></h4>
+        <p>{{ dataset.description }}</p>
+      </section>
+
+    </li>
+  {% endfor %}
+</ul>
\ No newline at end of file
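
For readers who fetch the archives linked from `mnist.md`, the following is a minimal Python sketch for inspecting a downloaded file. It assumes the IDX layout described on the MNIST home page (two zero bytes, a data type code, a dimension count, then one big-endian 32-bit size per dimension). The function name `read_idx_header` is hypothetical and not part of this commit or of SystemDS.

```python
import gzip
import struct


def read_idx_header(source):
    """Return (data_type_code, dims) from a gzipped IDX file.

    `source` may be a file path or a binary file object.
    Assumes the IDX layout described on the MNIST home page.
    """
    with gzip.open(source, "rb") as f:
        # Magic number: two zero bytes, data type code, number of dimensions.
        zero, dtype, ndim = struct.unpack(">HBB", f.read(4))
        if zero != 0:
            raise ValueError("not an IDX file: first two bytes must be zero")
        # One big-endian unsigned 32-bit size per dimension.
        dims = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
    return dtype, dims
```

Calling `read_idx_header("train-labels-idx1-ubyte.gz")` on a local copy should report a single dimension whose size equals the number of training examples given in the split table.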